Anthropic quietly added prompt caching in August 2024. It can cut your costs by 90%, but most developers don't know it exists.
In August 2024, Anthropic added a feature that can reduce your API costs by 90% for certain use cases.
It's called Prompt Caching, and almost nobody uses it.
Why? Because Anthropic barely marketed it, the documentation is buried, and most developers don't understand when it applies.
Let me fix that.
Prompt caching lets you reuse the unchanged start of your prompt across multiple requests without paying full price for those tokens again.
Think of it like this: if you're sending the same system prompt, documentation, or context repeatedly, you pay full price for it (plus a 25% cache-write premium) only on the first request. Subsequent requests that arrive while the cache is still alive read the cached version at a 90% discount.
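Normal pricing (Claude 3.5 Sonnet):
Input: $3.00 per million tokens
Output: $15.00 per million tokens
Cached input pricing:
Cache write: $3.75 per million tokens (25% premium over normal input)
Cache read: $0.30 per million tokens (90% discount on normal input)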
If you're making multiple requests with the same context, the savings are massive.
Let's say you're building a code review bot. Each review includes:
Without caching (100 reviews):
With caching (100 reviews):
Savings: 64% for just 100 requests.
At scale (10,000 reviews), you save $85 per 10K requests.
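The arithmetic behind a comparison like this is easy to check yourself. Here is a minimal sketch, not the exact calculation used above: the prices are assumed from Claude 3.5 Sonnet list pricing (verify them against the current pricing page), the token counts in the example workload are hypothetical, and it assumes every request lands while the cache is still alive.

```typescript
// Prices in USD per million tokens. Assumed values based on Claude 3.5 Sonnet
// list pricing; check the current pricing page before relying on them.
interface Pricing {
  input: number;      // regular input tokens
  cacheWrite: number; // tokens written to the cache (first request)
  cacheRead: number;  // tokens read back from the cache (later requests)
  output: number;     // generated tokens
}

const SONNET_3_5: Pricing = { input: 3.0, cacheWrite: 3.75, cacheRead: 0.3, output: 15.0 };

interface Workload {
  requests: number;     // requests made while the cache stays alive
  sharedTokens: number; // identical prefix sent with every request
  uniqueTokens: number; // per-request input that changes every time
  outputTokens: number; // tokens generated per request
}

const perMillion = (tokens: number, price: number): number => (tokens / 1_000_000) * price;

function costWithoutCaching(w: Workload, p: Pricing): number {
  const perRequest =
    perMillion(w.sharedTokens + w.uniqueTokens, p.input) + perMillion(w.outputTokens, p.output);
  return perRequest * w.requests;
}

function costWithCaching(w: Workload, p: Pricing): number {
  if (w.requests === 0) return 0;
  // The first request writes the shared prefix to the cache at a premium.
  const firstShared = perMillion(w.sharedTokens, p.cacheWrite);
  // Every later request reads the prefix at the discounted rate.
  const laterShared = perMillion(w.sharedTokens, p.cacheRead) * (w.requests - 1);
  // Unique input and output are billed normally on every request.
  const rest =
    (perMillion(w.uniqueTokens, p.input) + perMillion(w.outputTokens, p.output)) * w.requests;
  return firstShared + laterShared + rest;
}

// Hypothetical workload: these token counts are illustrative only.
const exampleWorkload: Workload = {
  requests: 100,
  sharedTokens: 10_000,
  uniqueTokens: 500,
  outputTokens: 300,
};

const uncachedCost = costWithoutCaching(exampleWorkload, SONNET_3_5);
const cachedCost = costWithCaching(exampleWorkload, SONNET_3_5);
const savedPercent = 100 * (1 - cachedCost / uncachedCost);
console.log(
  `without caching: $${uncachedCost.toFixed(2)}, with caching: $${cachedCost.toFixed(2)}, saved: ${savedPercent.toFixed(0)}%`
);
```

Plug in your own token counts to see where you land. The bigger the shared prefix is relative to the unique input and the output, the closer the saving gets to the 90% ceiling.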
Perfect for:
1. Code Review Bots
2. Customer Support
3. Document Analysis
4. RAG Applications
For a complete breakdown of RAG costs and optimization strategies, see our RAG pipeline cost guide.
5. Multi-turn Conversations
Anthropic's caching requires manual configuration:
For most teams, this complexity means they never implement it—leaving massive savings on the table.
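To make "manual configuration" concrete, here is a rough sketch of a Messages API request body with a cache marker. The `cache_control: { type: "ephemeral" }` field on a content block is how Anthropic marks the end of a cacheable prefix. The helper name `buildReviewRequest`, the trimmed-down types, and the placeholder strings are assumptions made for this illustration. The sketch only builds the payload and does not send it.

```typescript
// A simplified subset of the Messages API request body, covering only the
// fields this sketch uses. These are not the full SDK types.
interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

interface MessagesRequest {
  model: string;
  max_tokens: number;
  system: TextBlock[];
  messages: { role: "user" | "assistant"; content: string }[];
}

// buildReviewRequest is a hypothetical helper. Stable context goes first and
// carries the cache marker; the per-request code goes last and is not cached.
function buildReviewRequest(guidelines: string, codeToReview: string): MessagesRequest {
  return {
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 1024,
    system: [
      { type: "text", text: "You are a code review assistant." },
      // Everything up to and including this block is eligible for caching.
      { type: "text", text: guidelines, cache_control: { type: "ephemeral" } },
    ],
    messages: [{ role: "user", content: `Review this code:\n${codeToReview}` }],
  };
}

const reviewRequest = buildReviewRequest(
  "...long, stable review guidelines...",
  "function add(a, b) { return a + b }"
);
console.log(JSON.stringify(reviewRequest, null, 2));
```

A body shaped like this can be passed to the official SDK's `messages.create` call. Keep in mind that very short prefixes are not cached at all (the documented minimum for Claude 3.5 Sonnet is 1,024 tokens), so the placeholder guidelines string here would need to be much longer in practice.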
CostLens handles prompt caching automatically:
import { CostLens } from 'costlens';

const client = new CostLens({
  apiKey: process.env.COSTLENS_API_KEY,
  enableCache: true,
});

// CostLens automatically identifies and caches repetitive content
// (await assumes chat() returns a promise)
const resp = await client.chat({
  messages: [
    { role: 'system', content: 'You are a code review assistant...' },
    { role: 'user', content: 'Review this code: ...' }
  ],
});
No manual cache markers. No configuration. Just automatic savings.
Caches last for 5 minutes of inactivity.
If you make another request within 5 minutes, the cache is reused. After 5 minutes, it expires and you pay the write cost again.
For high-traffic applications, this means near-constant cache hits.
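To see how the 5-minute window plays out for your own traffic, you can replay request timestamps through a tiny simulation. This is a sketch under stated assumptions: `simulateCache` is a hypothetical helper, it models a single cached prefix, and it treats a gap of exactly 5 minutes as expired to stay on the conservative side.

```typescript
const CACHE_TTL_MS = 5 * 60 * 1000;

interface CacheStats {
  writes: number; // requests that had to (re)create the cache
  hits: number;   // requests served from a live cache
}

// requestTimesMs: request timestamps in milliseconds, in ascending order.
function simulateCache(requestTimesMs: number[], ttlMs: number = CACHE_TTL_MS): CacheStats {
  let writes = 0;
  let hits = 0;
  let lastUse: number | null = null;

  for (const t of requestTimesMs) {
    if (lastUse !== null && t - lastUse < ttlMs) {
      hits++;   // cache still alive
    } else {
      writes++; // first request, or the cache expired in the gap
    }
    lastUse = t; // both a hit and a write restart the inactivity timer
  }
  return { writes, hits };
}

// Requests at minutes 0, 1, 4, 8, 14 and 15.
const minutes = [0, 1, 4, 8, 14, 15];
const stats = simulateCache(minutes.map((m) => m * 60 * 1000));
console.log(stats);
```

In this trace, only the 6-minute gap between minutes 8 and 14 forces a second cache write. Every other request arrives within 5 minutes of the previous one and keeps the cache warm.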
SaaS Company (Document Analysis):
Code Review Platform:
Customer Support Bot:
1. Identifying What to Cache
Most developers cache the wrong content—either too much (paying premiums for no benefit) or too little (missing savings).
2. Cache Ordering Requirements
Anthropic caches a prompt prefix, so cached content has to come first, ahead of anything that changes between requests. Get the order wrong, and every request misses the cache without raising an error.
3. Managing the 5-Minute Window
Cache expires after 5 minutes of inactivity. Batch jobs need careful timing to maximize hits.
4. Tracking ROI
Without proper monitoring, you don't know if caching is actually saving money.
CostLens solves all of these automatically with intelligent caching algorithms that learn from your usage patterns.
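For the first challenge, a quick break-even check shows why caching the wrong content costs money. The sketch below works in relative terms (1.0 is the normal input price for a block). It assumes a 1.25x write multiplier, a 0.1x read multiplier, and a 1,024-token minimum cacheable length for Claude 3.5 Sonnet. `relativeSaving` and `shouldCache` are hypothetical helper names.

```typescript
const WRITE_MULTIPLIER = 1.25;     // assumed: cache write costs 25% more than normal input
const READ_MULTIPLIER = 0.1;       // assumed: cache read costs 10% of normal input
const MIN_CACHEABLE_TOKENS = 1024; // assumed minimum for Claude 3.5 Sonnet

interface PromptBlock {
  label: string;
  tokens: number;
  expectedReuses: number; // expected requests that reuse the block before the cache expires
}

// Fraction of the block's input cost saved by caching (negative means caching costs more).
function relativeSaving(expectedReuses: number): number {
  const uncached = 1 + expectedReuses;
  const cached = WRITE_MULTIPLIER + READ_MULTIPLIER * expectedReuses;
  return 1 - cached / uncached;
}

function shouldCache(block: PromptBlock): boolean {
  return block.tokens >= MIN_CACHEABLE_TOKENS && relativeSaving(block.expectedReuses) > 0;
}

const blocks: PromptBlock[] = [
  { label: "style guide", tokens: 8000, expectedReuses: 99 },
  { label: "one-off attachment", tokens: 8000, expectedReuses: 0 },
  { label: "short system prompt", tokens: 200, expectedReuses: 99 },
];

for (const b of blocks) {
  console.log(`${b.label}: cache=${shouldCache(b)}, saving=${(relativeSaving(b.expectedReuses) * 100).toFixed(1)}%`);
}
```

Under these assumptions a block pays for itself after a single reuse inside the window. A block that is never reused costs 25% more than sending it uncached, which is the "too much" failure mode.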
Manually tracking cache performance requires:
CostLens does this automatically. The dashboard shows:
You see exactly how much you're saving without writing monitoring code.
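If you do want to track this by hand, the raw numbers come back with every API response. The `usage` object reports `input_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens`. Here is a minimal sketch of turning those into a hit rate and a savings figure. The `summarizeCacheUsage` helper, the token-weighted hit-rate definition, and the default $3.00 per million input price are assumptions for illustration, and output tokens are left out because caching does not change their price.

```typescript
// Subset of the usage object returned with each Messages API response.
interface Usage {
  input_tokens: number; // input tokens that were neither read from nor written to the cache
  output_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
}

interface CacheReport {
  hitRate: number;       // share of cacheable tokens that were read rather than written
  inputCostUsd: number;  // what the input actually cost
  uncachedInputCostUsd: number; // what it would have cost with no caching
  savedUsd: number;
}

function summarizeCacheUsage(usages: Usage[], inputPricePerMTok: number = 3.0): CacheReport {
  let plain = 0;
  let written = 0;
  let read = 0;
  for (const u of usages) {
    plain += u.input_tokens;
    written += u.cache_creation_input_tokens ?? 0;
    read += u.cache_read_input_tokens ?? 0;
  }
  const pricePerToken = inputPricePerMTok / 1_000_000;
  // Assumed multipliers: 1.25x for cache writes, 0.1x for cache reads.
  const inputCostUsd = (plain + written * 1.25 + read * 0.1) * pricePerToken;
  const uncachedInputCostUsd = (plain + written + read) * pricePerToken;
  const cacheable = written + read;
  return {
    hitRate: cacheable === 0 ? 0 : read / cacheable,
    inputCostUsd,
    uncachedInputCostUsd,
    savedUsd: uncachedInputCostUsd - inputCostUsd,
  };
}

// Three hypothetical responses: one cache write followed by two cache reads.
const report = summarizeCacheUsage([
  { input_tokens: 500, output_tokens: 300, cache_creation_input_tokens: 10_000, cache_read_input_tokens: 0 },
  { input_tokens: 500, output_tokens: 300, cache_creation_input_tokens: 0, cache_read_input_tokens: 10_000 },
  { input_tokens: 500, output_tokens: 300, cache_creation_input_tokens: 0, cache_read_input_tokens: 10_000 },
]);
console.log(report);
```

A hit rate that stays low, or a `savedUsd` that goes negative, usually means the cache is expiring between requests or the prefix is changing more often than you think.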
If you're using Claude for:
And you're NOT using prompt caching, you're overpaying by 60-90%.
It takes 5 minutes to implement and can save thousands monthly.
Start tracking your cache hit rates today. You'll wonder why you waited.