Google launches ‘implicit caching’ to make accessing its latest AI models cheaper

By admin | May 8, 2025

Google is rolling out a feature in its Gemini API that the company claims will make its latest AI models cheaper for third-party developers.

Google calls the feature “implicit caching” and says it can deliver 75% savings on “repetitive context” passed to models via the Gemini API. It supports Google’s Gemini 2.5 Pro and 2.5 Flash models.

That’s likely to be welcome news to developers as the cost of using frontier models continues to grow.

We just shipped implicit caching in the Gemini API, automatically enabling a 75% cost savings with the Gemini 2.5 models when your request hits a cache 🚢

We also lowered the min token required to hit caches to 1K on 2.5 Flash and 2K on 2.5 Pro!

— Logan Kilpatrick (@OfficialLoganK) May 8, 2025

Caching, a widely adopted practice in the AI industry, reuses frequently accessed or pre-computed data from models to cut down on computing requirements and cost. For example, caches can store answers to questions users often ask of a model, eliminating the need for the model to re-create answers to the same request.
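
As a rough illustration of that idea, the Python sketch below memoizes a stand-in model call so that a repeated question is answered from a local cache instead of being recomputed; call_model here is a hypothetical placeholder, not part of the Gemini API.

```python
from functools import lru_cache

def call_model(question: str) -> str:
    # Placeholder for an expensive model call (not a real Gemini API method).
    return f"model response to: {question}"

@lru_cache(maxsize=1024)
def answer(question: str) -> str:
    # call_model runs only on a cache miss; repeats of the same question
    # are served from the cache instead of re-invoking the model.
    return call_model(question)

print(answer("What is prompt caching?"))  # computed by the model stub
print(answer("What is prompt caching?"))  # returned from the cache
```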

Google previously offered model prompt caching, but only explicit prompt caching, meaning devs had to define their highest-frequency prompts. While cost savings were supposed to be guaranteed, explicit prompt caching typically involved a lot of manual work.
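
For a sense of what that manual workflow looked like, here is a purely illustrative Python sketch in which the developer registers a high-frequency context block up front and references it on every later request; create_cache and generate are hypothetical stand-ins, not real Gemini SDK calls.

```python
# Maps a cache name to the registered context (simulating server-side storage).
_caches: dict[str, str] = {}

def create_cache(name: str, content: str) -> str:
    # The manual step: the developer decides up front which repetitive
    # context is worth caching and registers it explicitly.
    _caches[name] = content
    return name

def generate(cache_id: str, prompt: str) -> str:
    # Simulates the service prepending the cached context to each request.
    full_prompt = _caches[cache_id] + "\n" + prompt
    return f"(model would see {len(full_prompt)} chars) answer to: {prompt}"

cache_id = create_cache("support-context", "You are a support assistant for ExampleCo. Policies: ...")
print(generate(cache_id, "How do I reset my password?"))
```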

Some developers weren’t pleased with how Google’s explicit caching implementation worked for Gemini 2.5 Pro, which they said could cause surprisingly large API bills. Complaints reached a fever pitch in the past week, prompting the Gemini team to apologize and pledge to make changes.

In contrast to explicit caching, implicit caching is automatic. Enabled by default for Gemini 2.5 models, it passes on cost savings if a Gemini API request to a model hits a cache.

“[W]hen you send a request to one of the Gemini 2.5 models, if the request shares a common prefix as one of previous requests, then it’s eligible for a cache hit,” explained Google in a blog post. “We will dynamically pass cost savings back to you.”
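
The sketch below illustrates that prefix rule in plain Python: it simply counts how many leading tokens a new request shares with an earlier one. It is a simplified model of the idea, not Google's actual implementation, and the whitespace-split "tokens" are a stand-in for the API's real tokenizer.

```python
def shared_prefix_len(prev_tokens: list[str], new_tokens: list[str]) -> int:
    # Counts how many leading tokens the two requests have in common.
    n = 0
    for a, b in zip(prev_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

previous = "system instructions ... long shared document ... question one".split()
current = "system instructions ... long shared document ... question two".split()

print(shared_prefix_len(previous, current), "leading tokens match the earlier request")
```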

The minimum prompt token count for implicit caching is 1,024 for 2.5 Flash and 2,048 for 2.5 Pro, according to Google’s developer documentation, which is not a terribly big amount, meaning it shouldn’t take much to trigger these automatic savings. Tokens are the raw bits of data models work with, with a thousand tokens equivalent to about 750 words.
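
As a back-of-the-envelope check, the snippet below applies those documented minimums together with the roughly 750-words-per-1,000-tokens rule of thumb; real eligibility depends on the API's own token counts, so the word-based estimate is only an approximation.

```python
MIN_TOKENS = {"gemini-2.5-flash": 1024, "gemini-2.5-pro": 2048}

def roughly_eligible(prompt: str, model: str) -> bool:
    # ~1,000 tokens per 750 words, i.e. roughly 1.33 tokens per word.
    approx_tokens = len(prompt.split()) * 1000 / 750
    return approx_tokens >= MIN_TOKENS[model]

prompt = "word " * 900  # about 900 words, i.e. roughly 1,200 tokens
print(roughly_eligible(prompt, "gemini-2.5-flash"))  # True  (above the 1,024 minimum)
print(roughly_eligible(prompt, "gemini-2.5-pro"))    # False (below the 2,048 minimum)
```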

Given that Google's previous claims of cost savings from caching fell short, there are some buyer-beware areas in this new feature. For one, Google recommends that developers keep repetitive context at the beginning of requests to increase the chances of implicit cache hits. Context that might change from request to request should be appended at the end, the company says.
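
A minimal sketch of that recommended request layout, assuming a hypothetical contract-Q&A prompt: the large, unchanging context goes first and the per-request question goes last, so consecutive requests share a long common prefix.

```python
STABLE_CONTEXT = (
    "System instructions: answer questions about the attached contract.\n"
    "Contract text: ...\n"  # large block that is identical across requests
)

def build_prompt(user_question: str) -> str:
    # Shared, repetitive context first; the part that changes per request last.
    # Consecutive requests then share a long common prefix, which is what
    # makes them candidates for an implicit cache hit.
    return STABLE_CONTEXT + "User question: " + user_question + "\n"

print(build_prompt("What is the termination clause?"))
print(build_prompt("Who are the contracting parties?"))
```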

For another, Google didn’t offer any third-party verification that the new implicit caching system would deliver the promised automatic savings. So we’ll have to see what early adopters say.



