Complete reference for the CostLens SDK - Save 50-80% on AI costs automatically.
💰 Money-Saving Features
Redis caching, quality monitoring, AI optimization, and real-time savings tracking
⚠️ Server-Side Only
Caching and optimization require server-side environment (Node.js, Next.js API routes). Browser usage works but skips these features for security.
npm install costlens
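As noted above, caching and optimization need a server-side runtime, so a natural home for the client is an API route. Here is a minimal quick-start sketch of a Next.js App Router handler (the route path, request body shape, and option choices are illustrative):
// app/api/chat/route.ts - runs on the server, so caching and optimization apply
import OpenAI from 'openai';
import CostLens from 'costlens';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const costlens = new CostLens({
  apiKey: process.env.COSTLENS_API_KEY!,
  enableCache: true,
  autoOptimize: true,
});
const tracked = costlens.wrapOpenAI(openai);
export async function POST(req: Request) {
  const { prompt } = await req.json();
  const res = await tracked.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: prompt }],
  });
  return Response.json(res.choices[0].message);
}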
new CostLens(config: CostLensConfig)
| Name | Type | Required | Description |
|---|---|---|---|
| apiKey | string | Yes | Your CostLens API key |
| autoOptimize | boolean | No | 💰 Auto-optimize prompts (50-80% token reduction) |
| smartRouting | boolean | No | 💰 Route to cheapest model (20x savings) |
| enableCache | boolean | No | 💰 Cache responses (80% savings on repeats) |
| costLimit | number | No | 💰 Max cost per request (prevents overruns) |
| autoFallback | boolean | No | Auto-fallback on rate limits |
| maxRetries | number | No | Max retry attempts (default: 3) |
| baseUrl | string | No | Custom base URL (default: https://api.costlens.dev) |
| routingPolicy | function | No | 🆕 Custom routing decisions |
| qualityValidator | function | No | 🆕 Custom quality scoring |
| requestId | string | No | 🆕 Request tracking ID |
| correlationId | string | No | 🆕 Correlation tracking ID |
const costlens = new CostLens({
apiKey: 'cl_your_api_key_here',
autoOptimize: true, // 💰 Save 50-80% on tokens
smartRouting: true, // 💰 Route to cheapest model
enableCache: true, // 💰 Cache repeated queries
costLimit: 0.10, // 💰 Max $0.10 per request
autoFallback: true, // Auto-retry on failures
maxRetries: 3, // Retry up to 3 times
// 🆕 New SDK Features
routingPolicy: (requestedModel, messages) => {
// Custom routing logic
if (messages.length > 10) return 'gpt-4o-mini';
return requestedModel;
},
qualityValidator: (responseText, messagesJson) => {
// Custom quality scoring (1-5)
return responseText.length > 100 ? 5 : 3;
},
requestId: 'req_' + Date.now(),
correlationId: 'session_abc123'
});
Wrap your provider clients so CostLens can route, cache and track usage automatically.
import OpenAI from 'openai';
import CostLens from 'costlens';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const costlens = new CostLens({ apiKey: process.env.COSTLENS_API_KEY, smartRouting: true });
const tracked = costlens.wrapOpenAI(openai);
const res = await tracked.chat.completions.create({
model: 'gpt-4o-mini',
messages: [{ role: 'user', content: 'Hello' }]
});
import Anthropic from '@anthropic-ai/sdk';
import CostLens from 'costlens';
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const costlens = new CostLens({ apiKey: process.env.COSTLENS_API_KEY, smartRouting: true });
const trackedClaude = costlens.wrapAnthropic(anthropic);
const res = await trackedClaude.messages.create({
  model: 'claude-3-haiku-20240307', // Anthropic expects a dated model ID
  max_tokens: 1024, // required by the Anthropic Messages API
  messages: [{ role: 'user', content: 'Hello' }]
});
Record the failure for visibility, then either rethrow so the caller can handle it, or return a helpful fallback message (see the sketch after this example).
const start = Date.now();
try {
  const result = await tracked.chat.completions.create(params);
  await costlens.trackOpenAI(params, result, Date.now() - start, 'prompt-42');
  return result;
} catch (err) {
  await costlens.trackError('openai', params.model as string, JSON.stringify(params.messages), err as Error, Date.now() - start);
  throw err; // surface to caller
}
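For the fallback path, here is a sketch that swallows the error and returns friendly text instead; the chatOrFallback helper and the fallback wording are illustrative, not part of the SDK:
const FALLBACK = 'Sorry, something went wrong. Please try again.';
async function chatOrFallback(params: OpenAI.Chat.ChatCompletionCreateParamsNonStreaming): Promise<string> {
  const start = Date.now();
  try {
    const result = await tracked.chat.completions.create(params);
    return result.choices[0].message.content ?? FALLBACK;
  } catch (err) {
    // Still record the failure so it shows up in CostLens analytics
    await costlens.trackError('openai', params.model, JSON.stringify(params.messages), err as Error, Date.now() - start);
    return FALLBACK; // helpful message instead of a crash
  }
}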
Track an OpenAI API call.
trackOpenAI(
params: OpenAI.Chat.ChatCompletionCreateParams,
result: OpenAI.Chat.ChatCompletion,
latency: number,
promptId?: string
): Promise<void>
params - The parameters passed to OpenAI
result - The response from OpenAI
latency - Time taken in milliseconds
promptId - Optional tag to group related prompts
const start = Date.now();
const result = await openai.chat.completions.create(params);
await costlens.trackOpenAI(
params,
result,
Date.now() - start,
'my-prompt-v1' // optional
);
Track an Anthropic (Claude) API call.
trackAnthropic(
params: Anthropic.MessageCreateParams,
result: Anthropic.Message,
latency: number,
promptId?: string
): Promise<void>
params - The parameters passed to Anthropic
result - The response from Anthropic
latency - Time taken in milliseconds
promptId - Optional tag to group related prompts
const start = Date.now();
const result = await anthropic.messages.create(params);
await costlens.trackAnthropic(
params,
result,
Date.now() - start,
'my-prompt-v1' // optional
);
Track a Google Gemini API call.
trackGemini(
params: any,
result: any,
latency: number,
promptId?: string
): Promise<void>
params - The parameters passed to Gemini
result - The response from Gemini
latency - Time taken in milliseconds
promptId - Optional tag to group related prompts
const start = Date.now();
const result = await model.generateContent(params);
await costlens.trackGemini(
{ model: 'gemini-pro', ...params },
result.response,
Date.now() - start,
'my-prompt-v1' // optional
);
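The snippet above assumes a model handle from Google's SDK; for example, with the @google/generative-ai package:
import { GoogleGenerativeAI } from '@google/generative-ai';
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: 'gemini-pro' });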
Track an xAI Grok API call.
trackGrok(
params: any,
result: any,
latency: number,
promptId?: string
): Promise<void>
params - The parameters passed to Grok
result - The response from Grok
latency - Time taken in milliseconds
promptId - Optional tag to group related prompts
const start = Date.now();
const result = await grok.chat.completions.create(params);
await costlens.trackGrok(
params,
result,
Date.now() - start,
'my-prompt-v1' // optional
);
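The grok client above is assumed to be an OpenAI-compatible client pointed at xAI's endpoint, for example:
import OpenAI from 'openai';
const grok = new OpenAI({ apiKey: process.env.XAI_API_KEY, baseURL: 'https://api.x.ai/v1' });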
Track a failed API call.
trackError(
provider: string,
model: string,
input: string,
error: Error,
latency: number
): Promise<void>
provider - The provider (openai, anthropic, gemini, grok)
model - The model that was attempted
input - The input that was sent
error - The error object
latency - Time taken in milliseconds before the failure
const start = Date.now();
try {
  const result = await openai.chat.completions.create(params);
  await costlens.trackOpenAI(params, result, Date.now() - start);
} catch (error) {
  await costlens.trackError(
    'openai',
    params.model,
    JSON.stringify(params.messages),
    error,
    Date.now() - start
  );
  throw error;
}
Override default routing decisions with custom logic based on request context.
const costlens = new CostLens({
apiKey: process.env.COSTLENS_API_KEY,
routingPolicy: (requestedModel, messages) => {
// Route complex queries to better models
const complexity = messages.reduce((acc, msg) => acc + msg.content.length, 0);
if (complexity > 1000) {
return 'gpt-4o'; // Use premium model for complex tasks
}
if (requestedModel === 'gpt-4' && complexity < 100) {
return 'gpt-4o-mini'; // Downgrade simple tasks
}
return requestedModel; // Keep original choice
}
});
Implement custom quality scoring to improve routing decisions over time.
const costlens = new CostLens({
apiKey: process.env.COSTLENS_API_KEY,
qualityValidator: (responseText, messagesJson) => {
const messages = JSON.parse(messagesJson);
// Score based on response completeness
let score = 3; // baseline
if (responseText.length > 200) score += 1;
if (responseText.includes('```')) score += 1; // code examples
if (messages.some(m => m.content.includes('?')) &&
responseText.includes('?')) score -= 1; // answered with question
return Math.max(1, Math.min(5, score)); // clamp 1-5
}
});
Track related requests across your application with correlation IDs.
// Track user session
const sessionId = 'session_' + userId;
const requestId = 'req_' + Date.now();
const costlens = new CostLens({
apiKey: process.env.COSTLENS_API_KEY,
requestId: requestId,
correlationId: sessionId
});
const tracked = costlens.wrapOpenAI(openai);
// All requests will be tagged with these IDs
await tracked.chat.completions.create({
model: 'gpt-4o-mini',
messages: [{ role: 'user', content: 'Hello' }]
});
// Query analytics by correlation ID
const analytics = await fetch(`/api/analytics?correlationId=${sessionId}`);
Automatically cache responses to save money on repeated requests. Achieves 60-80% hit rates in production.
const costlens = new CostLens({
apiKey: process.env.COSTLENS_API_KEY,
enableCache: true, // Enable Redis caching
});
const tracked = costlens.wrapOpenAI(openai);
// First call - cache miss, costs $0.05
await tracked.chat.completions.create({
model: 'gpt-4',
messages: [{ role: 'user', content: 'What is 2+2?' }],
});
// Second call - cache hit, costs $0.00!
await tracked.chat.completions.create({
model: 'gpt-4',
messages: [{ role: 'user', content: 'What is 2+2?' }],
});
Smart routing disables itself automatically if response quality drops below 3.5/5 stars.
// SDK checks quality status before routing
const tracked = costlens.wrapOpenAI(openai);
// If quality is good: GPT-4 → GPT-3.5 (saves money)
// If quality dropped: Uses GPT-4 (protects quality)
await tracked.chat.completions.create({
model: 'gpt-4',
messages: [{ role: 'user', content: 'Complex task...' }],
});
// Submit feedback to improve routing
await fetch('/api/quality/feedback', {
method: 'POST',
body: JSON.stringify({
runId: 'run_123',
rating: 5, // 1-5 stars
feedback: 'Great response!'
})
});
Automatically compress prompts by 30-50% while preserving meaning.
const costlens = new CostLens({
apiKey: process.env.COSTLENS_API_KEY,
autoOptimize: true, // Enable AI compression
});
const tracked = costlens.wrapOpenAI(openai);
// Original: 200 tokens
// Optimized: 100 tokens (50% reduction)
// Savings: $0.009 per request
await tracked.chat.completions.create({
model: 'gpt-4',
messages: [{
role: 'user',
content: 'Please kindly help me understand what the weather will be like tomorrow in San Francisco, California, USA'
}],
});
// Compressed to: "Weather forecast for San Francisco tomorrow?"
Track exactly how much money you're saving with baseline cost comparison.
// View savings in dashboard
const response = await fetch('/api/savings?period=today');
const savings = await response.json();
console.log(`Saved today: $${savings.totalSaved}`);
console.log(`Breakdown:`);
console.log(` Smart Routing: $${savings.smartRouting}`);
console.log(` Caching: $${savings.caching}`);
console.log(` Optimization: $${savings.optimization}`);
console.log(`Savings Rate: ${savings.savingsRate}%`);
// Example output:
// Saved today: $45.67
// Breakdown:
// Smart Routing: $30.00
// Caching: $12.50
// Optimization: $3.17
// Savings Rate: 68%
interface CostLensConfig {
  apiKey: string;
  baseUrl?: string;
  autoOptimize?: boolean;
  smartRouting?: boolean;
  enableCache?: boolean;
  costLimit?: number;
  autoFallback?: boolean;
  maxRetries?: number;
  routingPolicy?: (requestedModel: string, messages: Array<{ role: string; content: string }>) => string;
  qualityValidator?: (responseText: string, messagesJson: string) => number;
  requestId?: string;
  correlationId?: string;
}
# .env
COSTLENS_API_KEY=cl_your_api_key_here
OPENAI_API_KEY=sk-your_openai_key
ANTHROPIC_API_KEY=sk-ant-your_anthropic_key
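If you load these with dotenv (one common option; any environment loader works), import it before constructing the client:
import 'dotenv/config'; // populates process.env from .env
import CostLens from 'costlens';
const costlens = new CostLens({ apiKey: process.env.COSTLENS_API_KEY! });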