GPT-4.1 Nano is $0.10 per 1M input tokens. Prompt caching cuts cached input costs by 50% or more. The Batch API halves the cost of async work. Here's the exact playbook to cut your OpenAI bill by 40-70%.

Your OpenAI bill is higher than it needs to be. Not because the pricing is unfair, but because most teams use expensive models for tasks that cheap models handle fine. Here's how to fix that.
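OpenAI's lineup now spans a 25x price range, from GPT-4.1 Nano to GPT-5.4. Most teams default to GPT-4.1 ($2/$8 per 1M input/output tokens) for everything, even when GPT-4.1 Nano ($0.10/$0.40) could handle a large share of their requests (60% in the example below).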
Current OpenAI model pricing (June 2026):
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Typical use case |
|---|---|---|---|
| GPT-4.1 Nano | $0.10 | $0.40 | Classification, extraction, simple Q&A |
| GPT-4.1 mini | $0.40 | $1.60 | Summarization, moderate reasoning |
| GPT-4.1 | $2.00 | $8.00 | Code gen, complex instructions |
| GPT-5.4 | $2.50 | $10.00 | Flagship reasoning |
The rule: Start with the cheapest model. Only upgrade when it fails your quality bar.
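The rule can be sketched as a cascade: call the cheapest model first and escalate only when its answer fails a check. This is a minimal sketch under assumptions: `callModel` stands in for your own wrapper around the chat completions call and `passesQualityBar` for a task-specific validator. Both are hypothetical names, not part of the OpenAI SDK.

```typescript
// Models ordered from cheapest to most expensive (OpenAI API model IDs).
const MODEL_LADDER = ['gpt-4.1-nano', 'gpt-4.1-mini', 'gpt-4.1'] as const;
type Model = (typeof MODEL_LADDER)[number];

// Hypothetical signatures: supply your own API wrapper and quality check.
type CallModel = (model: Model, prompt: string) => Promise<string>;
type QualityCheck = (output: string) => boolean;

interface CascadeResult {
  model: Model;     // model whose answer was accepted
  output: string;   // the accepted answer
  attempts: number; // how many models were called
}

// Try each model in order and return the first output that passes the check.
// If none pass, return the most expensive model's answer.
async function cascade(
  prompt: string,
  callModel: CallModel,
  passesQualityBar: QualityCheck,
): Promise<CascadeResult> {
  let attempts = 0;
  let last: CascadeResult | undefined;
  for (const model of MODEL_LADDER) {
    attempts++;
    const output = await callModel(model, prompt);
    last = { model, output, attempts };
    if (passesQualityBar(output)) return last;
  }
  // MODEL_LADDER is non-empty, so `last` is always set here.
  return last!;
}
```

A rejected attempt still costs tokens and latency, so a cascade pays off only when the cheap model passes most of the time. For traffic you already know is simple, routing it straight to Nano avoids the extra call.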
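Real impact (assuming every request uses the same token mix on every model, which works out to $0.01 per request on GPT-4.1):
Before: 100K requests/month on GPT-4.1
Cost: $1,000/month
After: 60K on Nano, 30K on mini, 10K on GPT-4.1
Nano costs 1/20 and mini 1/5 of GPT-4.1 on both input and output, so that's $0.0005 and $0.002 per request.
Cost: $30 + $60 + $100 = $190/month
Savings: 81%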
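A small calculator makes the routing arithmetic easy to check against your own traffic. It assumes, as the example does, that each request has the same token mix on every model; the 1,000 input / 1,000 output tokens per request are a hypothetical profile, chosen because it costs $0.01 per request on GPT-4.1. Prices are the per-1M-token figures from the table.

```typescript
// Per-1M-token prices (USD) from the June 2026 pricing table.
interface Price {
  input: number;
  output: number;
}

const PRICES: Record<string, Price> = {
  'gpt-4.1-nano': { input: 0.1, output: 0.4 },
  'gpt-4.1-mini': { input: 0.4, output: 1.6 },
  'gpt-4.1': { input: 2.0, output: 8.0 },
};

// Cost in USD of `requests` calls, each using the given input/output token counts.
function monthlyCost(model: string, requests: number, inputTokens: number, outputTokens: number): number {
  const p = PRICES[model];
  if (!p) throw new Error(`unknown model: ${model}`);
  const perRequest = (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
  return requests * perRequest;
}

// Hypothetical token profile: 1,000 input + 1,000 output tokens per request.
const IN = 1_000;
const OUT = 1_000;

const before = monthlyCost('gpt-4.1', 100_000, IN, OUT);
const after =
  monthlyCost('gpt-4.1-nano', 60_000, IN, OUT) +
  monthlyCost('gpt-4.1-mini', 30_000, IN, OUT) +
  monthlyCost('gpt-4.1', 10_000, IN, OUT);

console.log(before.toFixed(2), after.toFixed(2), (1 - after / before).toFixed(2)); // 1000.00 190.00 0.81
```

Because Nano and mini are priced at fixed fractions of GPT-4.1 on both input and output, the 81% figure holds for any token profile; only the dollar amounts change.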
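OpenAI discounts input tokens that match a recently cached prompt prefix (50% or more off the normal input price, depending on the model). If every request starts with the same 2,000-token system prompt and you send 100K requests/month, that's 200M input tokens a month, and every one that misses the cache is billed at the full input rate.
How it works: caching is automatic for prompts of 1,024 tokens or more; no code changes are needed. A later request that starts with exactly the same prefix gets the discount on the cached portion of its input. Cached prefixes are evicted after a period of inactivity, so hit rates depend on steady traffic.
What to do: put static content (system prompt, instructions, few-shot examples) at the start of the prompt and per-request content (the user's message) at the end, and keep the static prefix identical across requests.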
Savings: 20-40% on input tokens for apps with consistent system prompts.
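Why order matters: cache hits require an exact prefix match, so anything that changes per request must come after everything that doesn't. The sketch below is hypothetical: `buildMessages`, `buildMessagesTimestampFirst`, and `sharedPrefixLength` are illustrative helpers, and comparing serialized characters is only a rough stand-in for the token-level matching OpenAI performs.

```typescript
interface ChatMessage {
  role: 'system' | 'user';
  content: string;
}

// Static content that is identical on every request (hypothetical example text).
const SYSTEM_PROMPT = 'You are a support assistant for Acme. Follow these policies: ...';

// Cache-friendly: static system prompt first, per-request input last.
function buildMessages(userInput: string): ChatMessage[] {
  return [
    { role: 'system', content: SYSTEM_PROMPT },
    { role: 'user', content: userInput },
  ];
}

// Cache-hostile: a per-request value (a timestamp) at the very start of the prompt.
function buildMessagesTimestampFirst(userInput: string, now: string): ChatMessage[] {
  return [
    { role: 'system', content: `Current time: ${now}\n${SYSTEM_PROMPT}` },
    { role: 'user', content: userInput },
  ];
}

// Length of the common prefix of two serialized prompts, in characters
// (a rough proxy for the number of tokens that could be served from cache).
function sharedPrefixLength(a: ChatMessage[], b: ChatMessage[]): number {
  const sa = JSON.stringify(a);
  const sb = JSON.stringify(b);
  let i = 0;
  while (i < sa.length && i < sb.length && sa[i] === sb[i]) i++;
  return i;
}
```

With the static-first layout, two different user messages share the entire system prompt; with the timestamp first, they diverge within a few dozen characters and nothing after that point can be cached. In a real app the static prefix also has to reach 1,024 tokens before caching applies at all.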
If you don't need real-time responses, the Batch API gives a flat 50% discount. Results return within 24 hours.
Good for offline and bulk work where no user is waiting on the response.
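```typescript
import fs from 'fs';
import OpenAI from 'openai';
const openai = new OpenAI();

// Upload the JSONL input file (one request per line); 'requests.jsonl' is a placeholder name
const file = await openai.files.create({
  file: fs.createReadStream('requests.jsonl'),
  purpose: 'batch',
});

// Create the batch job against the uploaded file
const batch = await openai.batches.create({
  input_file_id: file.id,
  endpoint: '/v1/chat/completions',
  completion_window: '24h',
});
// 50% cheaper than real-time. Same models, same quality.
```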
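The batch input file is JSONL: one request per line, each with a `custom_id` (used to match results back to inputs), `method`, `url`, and the usual request `body`. A minimal sketch of generating that file's contents for a bulk classification job; `toBatchJsonl` and the sample prompts are illustrative, and you would write the returned string to the file you upload with `purpose: 'batch'`.

```typescript
interface BatchRequestLine {
  custom_id: string;
  method: 'POST';
  url: '/v1/chat/completions';
  body: {
    model: string;
    messages: { role: 'system' | 'user'; content: string }[];
  };
}

// Build one JSONL line per input; custom_id must be unique within the file.
function toBatchJsonl(inputs: string[], model: string, instructions: string): string {
  return inputs
    .map((text, i): BatchRequestLine => ({
      custom_id: `request-${i}`,
      method: 'POST',
      url: '/v1/chat/completions',
      body: {
        model,
        messages: [
          { role: 'system', content: instructions },
          { role: 'user', content: text },
        ],
      },
    }))
    .map((line) => JSON.stringify(line))
    .join('\n');
}
```

Results come back in a separate output file keyed by the same `custom_id`, and output order is not guaranteed to match input order, so that ID is what ties each answer to its request.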
Savings: 50% on anything that doesn't need instant responses.
Every token costs money. Most prompts contain filler that doesn't improve output quality.
Before (~80 tokens):
```text
You are a helpful, professional assistant specializing in customer support.
Your goal is to provide clear, concise, and accurate responses to customer
inquiries. Please analyze the following customer message and determine the
appropriate category from this list: billing, technical, account, general.
Be thorough in your analysis and provide your answer in a clear format.
Customer message: "I can't log in to my account"
```
After (~40 tokens):
```text
Classify this customer message into one category: billing, technical, account, general.
Reply with ONLY the category name.
Message: "I can't log in to my account"
```
Same result with roughly half the input tokens. And with a simpler prompt, you can often use GPT-4.1 Nano instead of GPT-4.1.
Savings: 10-30% on input tokens + enables cheaper model selection.
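A side benefit of "Reply with ONLY the category name" is that the output becomes trivial to validate, which is what makes trying a cheap model low-risk. A minimal sketch with hypothetical helpers: `classificationPrompt` rebuilds the short prompt, and `parseCategory` normalizes the reply and returns `null` for anything off-list, which you could treat as the signal to retry on a larger model.

```typescript
const CATEGORIES = ['billing', 'technical', 'account', 'general'] as const;
type Category = (typeof CATEGORIES)[number];

// Rebuild the short classification prompt for a given message.
function classificationPrompt(message: string): string {
  return [
    `Classify this customer message into one category: ${CATEGORIES.join(', ')}.`,
    'Reply with ONLY the category name.',
    `Message: "${message}"`,
  ].join('\n');
}

// Normalize a reply like "Account." or " technical\n"; reject anything off-list.
function parseCategory(reply: string): Category | null {
  const cleaned = reply.trim().toLowerCase().replace(/[.!"']+$/, '');
  return (CATEGORIES as readonly string[]).indexOf(cleaned) !== -1 ? (cleaned as Category) : null;
}
```

Counting how often `parseCategory` returns `null` on a sample of real traffic is also a cheap way to measure whether Nano meets your quality bar for this task.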
You can't optimize what you don't measure. Most teams have no idea which features, endpoints, or prompts cost the most.
```bash
npm install costlens
```
```typescript
import { CostLens } from 'costlens';
import OpenAI from 'openai';

const costlens = new CostLens({ apiKey: 'cl_...' });
const openai = costlens.wrapOpenAI(new OpenAI());

// Every call now tracked with cost breakdown
const resp = await openai.chat.completions.create({
  model: 'gpt-4.1',
  messages: [{ role: 'user', content: '...' }],
});
// Terminal: ✓ gpt-4.1 | 340 tokens | $0.0034 | total today: $12.45
```
Once you see where the money goes, the optimization becomes obvious. We've seen teams realize 70% of their spend comes from 2-3 prompts that could use a cheaper model.
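To see the mechanics behind this kind of breakdown, a per-feature cost report only needs the token counts from each response's `usage` object plus a price table. This is a hypothetical sketch, not CostLens's implementation: the `feature` tag is something you would attach to each call yourself, and cached-input discounts are ignored for simplicity.

```typescript
// Per-1M-token prices (USD) from the pricing table.
const PRICE_PER_1M: Record<string, { input: number; output: number }> = {
  'gpt-4.1-nano': { input: 0.1, output: 0.4 },
  'gpt-4.1-mini': { input: 0.4, output: 1.6 },
  'gpt-4.1': { input: 2.0, output: 8.0 },
};

// One record per API call: a feature tag you attach yourself, the model,
// and the token counts reported in the response's `usage` object.
interface UsageRecord {
  feature: string;
  model: string;
  prompt_tokens: number;
  completion_tokens: number;
}

// Cost of a single call in USD.
function callCost(r: UsageRecord): number {
  const p = PRICE_PER_1M[r.model];
  if (!p) throw new Error(`no price for ${r.model}`);
  return (r.prompt_tokens * p.input + r.completion_tokens * p.output) / 1_000_000;
}

// Total cost per feature, most expensive first.
function costByFeature(records: UsageRecord[]): [string, number][] {
  const totals: Record<string, number> = {};
  for (const r of records) {
    totals[r.feature] = (totals[r.feature] || 0) + callCost(r);
  }
  return Object.keys(totals)
    .map((feature): [string, number] => [feature, totals[feature]])
    .sort((a, b) => b[1] - a[1]);
}
```

The top few rows of this report are the candidates for a cheaper model or the Batch API.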
These strategies compound:
| Strategy | Savings | Effort |
|---|---|---|
| Model routing | 40-80% | Medium (test cheaper models) |
| Prompt caching | 20-40% | Low (restructure prompt order) |
| Batch API | 50% | Low (move async work to batch) |
| Prompt shortening | 10-30% | Low (rewrite verbose prompts) |
| Cost tracking | Enables all above | 5 min (npm install costlens) |
Realistic combined: a 40-70% reduction with no loss in quality, provided each change is checked against your quality bar.
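"Compound" means the savings multiply rather than add: each strategy cuts a share of whatever spend is left after the others. A minimal sketch of that arithmetic, under the simplifying and optimistic assumption that every strategy applies to the whole remaining bill. In practice batch and caching touch only part of your traffic, which helps explain why the realistic combined range is lower than this upper bound.

```typescript
// Combined savings when each strategy removes a fraction of the remaining spend:
// remaining = product of (1 - s_i), so combined = 1 - product of (1 - s_i).
function combinedSavings(savings: number[]): number {
  const remaining = savings.reduce((acc, s) => acc * (1 - s), 1);
  return 1 - remaining;
}

// Two 50% cuts leave 25% of the bill: 75% saved, not 100%.
console.log(combinedSavings([0.5, 0.5])); // 0.75
```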
1. Install CostLens and wrap your OpenAI client:

   ```bash
   npm install costlens
   ```

   ```typescript
   import { CostLens } from 'costlens';
   import OpenAI from 'openai';

   const costlens = new CostLens({ apiKey: 'cl_...' });
   const openai = costlens.wrapOpenAI(new OpenAI());
   ```

2. Run for a day. Look at which calls cost the most.
3. Switch the expensive-but-simple calls to GPT-4.1 Nano or mini.
4. Enable batch for anything that doesn't need real-time.
That's it. In our experience, most teams see 40%+ savings within the first week.
Pricing from OpenAI API. June 2026.