Performance Optimizations

Reduce costs with smart routing, batch processing, and circuit breaker protection.

New Performance Features

Batch Processing: 3-5x faster with reduced HTTP overhead

Smart Caching: 90% faster authentication with 5-minute cache

Circuit Breaker: 99.9% uptime with automatic failure protection

Smart Routing: Automatic task detection and optimal model selection

Quick Start

Get started with performance optimizations in 5 minutes:

1. Install the SDK

npm install costlens

2. Enable Optimizations

import { CostLens } from 'costlens';

const costlens = new CostLens({
  apiKey: 'your-api-key',
  smartRouting: true,     // Auto-route to cheaper models
  enableCache: true,      // Cache repeated requests
  autoOptimize: true,     // Optimize prompts automatically
  costLimit: 0.10,        // Max $0.10 per request
});

3. Use Batch Processing

// Instead of individual requests:
for (const request of requests) {
  await costlens.trackOpenAI(request.params, request.result, request.latency);
}

// Use batch processing for 3-5x better performance:
await costlens.trackBatch([
  { provider: 'openai', model: 'gpt-5.4', tokens: 150, latency: 1200 },
  { provider: 'anthropic', model: 'claude-3', tokens: 200, latency: 1000 },
  { provider: 'openai', model: 'gpt-5.4-mini', tokens: 100, latency: 800 }
]);

4. Monitor Performance

// Check your savings and performance
const analytics = costlens.getCostAnalytics();
// toFixed(1) avoids floating-point noise such as 7.000000000000001%
console.log('Cache Hit Rate: ' + (analytics.cacheHitRate * 100).toFixed(1) + '%');
console.log('Total Savings: $' + analytics.totalSavings);
console.log('Average Latency: ' + analytics.averageLatency + 'ms');
console.log('Error Rate: ' + (analytics.errorRate * 100).toFixed(1) + '%');

// Calculate potential savings
// `messages` is the chat message array you would send to the model (example value)
const messages = [{ role: 'user', content: 'Where is my order?' }];
const savings = await costlens.calculateSavings('gpt-5.4', messages);
console.log('Potential Savings: ' + savings.savingsPercentage + '%');
console.log('Recommended Model: ' + savings.recommendedModel);

Performance Benefits

Batch Processing

3-5x faster than individual requests
Reduced HTTP overhead via batching
90% more reliable with fewer failure points
Automatic batching with queue management
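
To illustrate what "automatic batching with queue management" can mean in practice, here is a minimal sketch of a queue that buffers tracking events and sends them in groups. It is not the SDK's implementation: BatchQueue, maxBatchSize, and the send callback are hypothetical names chosen for this example.

```javascript
// Hypothetical sketch: buffer events and hand them to `send` in batches.
// None of these names come from the costlens SDK.
class BatchQueue {
  constructor(send, maxBatchSize = 10) {
    this.send = send;                 // called once per batch with an array of events
    this.maxBatchSize = maxBatchSize;
    this.pending = [];
  }

  // Buffer one event and flush as soon as the batch is full.
  add(event) {
    this.pending.push(event);
    if (this.pending.length >= this.maxBatchSize) {
      this.flush();
    }
  }

  // Send whatever is buffered as a single batch.
  flush() {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.send(batch);
  }
}

// Usage: 5 events with a batch size of 2 produce 3 send calls instead of 5.
const sentBatches = [];
const queue = new BatchQueue((batch) => sentBatches.push(batch), 2);
for (let i = 0; i < 5; i++) {
  queue.add({ provider: 'openai', model: 'gpt-5.4', tokens: 100 + i, latency: 1000 });
}
queue.flush(); // send the final partial batch
console.log(sentBatches.map((b) => b.length)); // [ 2, 2, 1 ]
```

A production queue would usually also flush on a timer so that a partially filled batch does not wait indefinitely.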

Smart Caching

90% faster authentication with 5-minute cache
Savings on repeated requests
Automatic cache management with LRU eviction
Zero configuration required
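
For readers unfamiliar with the terms, the sketch below shows one common way to combine LRU eviction with a 5-minute expiry. It is an illustration only, not the SDK's cache: LruTtlCache and its options are hypothetical, and the injectable clock exists just to make the example deterministic.

```javascript
// Hypothetical sketch of an LRU cache whose entries expire after a TTL.
class LruTtlCache {
  constructor({ maxEntries = 100, ttlMs = 5 * 60 * 1000, now = Date.now } = {}) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;             // injectable clock
    this.entries = new Map();   // Map iterates in insertion order: oldest first
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt >= this.ttlMs) {
      this.entries.delete(key); // entry has expired
      return undefined;
    }
    // Re-insert so the key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    this.entries.delete(key);
    this.entries.set(key, { value, storedAt: this.now() });
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used key (first in iteration order).
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
  }
}

// Usage with a fake clock so expiry can be shown without waiting.
let fakeTime = 0;
const cache = new LruTtlCache({ maxEntries: 2, now: () => fakeTime });
cache.set('a', 1);
cache.set('b', 2);
cache.get('a');              // 'a' is now the most recently used
cache.set('c', 3);           // evicts 'b', the least recently used
console.log(cache.get('b')); // undefined
fakeTime = 5 * 60 * 1000;    // five minutes later
console.log(cache.get('a')); // undefined (expired)
```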

Circuit Breaker

99.9% uptime with automatic failure protection
Automatic recovery once the API is available again
Silent degradation when tracking fails
Zero configuration required
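
The circuit breaker pattern behind these points can be sketched as follows. This is a simplified, synchronous illustration under assumed thresholds, not the SDK's code: CircuitBreaker, failureThreshold, and resetTimeoutMs are hypothetical names, and a real tracking call would be asynchronous.

```javascript
// Hypothetical sketch: stop calling a failing dependency, then retry after a timeout.
class CircuitBreaker {
  constructor({ failureThreshold = 3, resetTimeoutMs = 30000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;        // injectable clock
    this.failures = 0;
    this.openedAt = null;  // null means the circuit is closed
  }

  // Run fn unless the circuit is open; return `fallback` instead of throwing.
  call(fn, fallback = null) {
    if (this.openedAt !== null && this.now() - this.openedAt < this.resetTimeoutMs) {
      return fallback;     // open: skip the call entirely
    }
    // Closed, or the timeout has elapsed and one trial call is allowed.
    try {
      const result = fn();
      this.failures = 0;
      this.openedAt = null;             // success closes the circuit
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.failureThreshold) {
        this.openedAt = this.now();     // trip (or re-trip) the circuit
      }
      return fallback;                  // degrade silently
    }
  }
}

// Usage with a fake clock.
let clock = 0;
const breaker = new CircuitBreaker({ failureThreshold: 2, resetTimeoutMs: 1000, now: () => clock });
let attempts = 0;
const failing = () => { attempts += 1; throw new Error('tracking API down'); };

breaker.call(failing); // failure 1
breaker.call(failing); // failure 2: the circuit opens
breaker.call(failing); // skipped because the circuit is open
console.log(attempts); // 2

clock = 1000;          // reset timeout has elapsed
console.log(breaker.call(() => 'ok')); // ok (trial call succeeds, circuit closes)
```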

Smart Routing

Automatic task detection (coding, writing, analysis)
Optimal model selection based on task type
20-40% cost savings on typical workloads
Custom routing policies for advanced users
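
As a rough illustration of task detection followed by model selection, the sketch below classifies a prompt with keyword rules and looks up a model in a policy table. Everything here is an assumption made for the example: the keyword patterns, the detectTask and selectModel helpers, and the mapping itself are hypothetical, and the SDK's actual routing logic is not documented on this page. The model names are the ones used in the examples above.

```javascript
// Hypothetical keyword rules; a real router would likely use richer signals.
const TASK_PATTERNS = [
  { task: 'coding', pattern: /\b(function|bug|code|compile|refactor|stack trace)\b/i },
  { task: 'analysis', pattern: /\b(analy[sz]e|compare|evaluate|trade-?offs?)\b/i },
  { task: 'writing', pattern: /\b(write|draft|summari[sz]e|rewrite|email)\b/i },
];

// Hypothetical policy: which model handles which task type.
const ROUTING_POLICY = {
  coding: 'gpt-5.4',
  analysis: 'gpt-5.4',
  writing: 'gpt-5.4-mini',
  general: 'gpt-5.4-mini',
};

// Return the first task whose pattern matches, or 'general' if none do.
function detectTask(prompt) {
  const match = TASK_PATTERNS.find(({ pattern }) => pattern.test(prompt));
  return match ? match.task : 'general';
}

// Look up the model for the detected task in the given policy.
function selectModel(prompt, policy = ROUTING_POLICY) {
  return policy[detectTask(prompt)];
}

console.log(detectTask('Fix the bug in this function'));        // coding
console.log(selectModel('Draft a short email to a customer'));  // gpt-5.4-mini
console.log(selectModel('What are your opening hours?'));       // gpt-5.4-mini
```

Passing a different policy object as the second argument is one way a custom routing policy could be expressed in this sketch.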

Real-World Example

Here's how to optimize a customer support system with batch processing:

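// Before: Slow and expensive
// Assumes `openai` is an initialized OpenAI client and `customerQueries`
// is an array of objects with a `question` string.
const baselineResponses = [];
for (const query of customerQueries) {
  const response = await openai.chat.completions.create({
    model: 'gpt-5.4', // Always uses the expensive model
    messages: [{ role: 'user', content: query.question }]
  });
  baselineResponses.push(response);
}
// Result: 10 requests, 10 sequential API calls, $0.50, 5 seconds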

// After: Fast and cheap with optimizations
const costlens = new CostLens({
  apiKey: 'your-api-key',
  smartRouting: true,    // Auto-route to cheaper models
  enableCache: true,     // Cache repeated requests
  autoOptimize: true,    // Optimize prompts
});

const tracked = costlens.wrapOpenAI(openai);
const responses = [];

// Process in batches for 3-5x better performance
const batchSize = 5;
for (let i = 0; i < customerQueries.length; i += batchSize) {
  const batch = customerQueries.slice(i, i + batchSize);
  
  // Process batch
  const batchPromises = batch.map(async (query) => {
    const response = await tracked.chat.completions.create({
      model: 'gpt-5.4', // Smart routing may send simple tasks to a cheaper model such as gpt-5.4-mini
      messages: [{ role: 'user', content: query.question }]
    });
    return response;
  });
  
  const batchResults = await Promise.all(batchPromises);
  responses.push(...batchResults);
}

// Result: 10 requests sent as 2 concurrent batches of 5, $0.10, 1 second (3-5x faster, significantly cheaper)

// Check your savings
const analytics = costlens.getCostAnalytics();
console.log(`Saved $${analytics.totalSavings} with ${analytics.cacheHitRate * 100}% cache hit rate`);
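
To make the arithmetic behind the example's figures explicit, the short sketch below computes the relative saving from the two costs quoted in the comments ($0.50 before, $0.10 after). The percentSaved helper is hypothetical and is not an SDK method.

```javascript
// Hypothetical helper: percentage saved relative to a baseline cost.
function percentSaved(baselineCost, optimizedCost) {
  if (baselineCost <= 0) return 0; // avoid dividing by zero
  const pct = ((baselineCost - optimizedCost) / baselineCost) * 100;
  return Math.round(pct * 10) / 10; // round to one decimal place
}

// Figures from the example comments: $0.50 before, $0.10 after.
const pctSaved = percentSaved(0.50, 0.10);
console.log(`${pctSaved}% saved`); // 80% saved
```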

Best Practices

Do This

Use batch processing for multiple requests (5-20 requests per batch)
Enable all optimization features for maximum performance
Monitor your savings and performance regularly
Handle errors gracefully with fallback strategies
Use appropriate batch sizes for your use case
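
One simple way to apply a chosen batch size is a small chunking helper like the sketch below. The chunk function is a hypothetical utility written for this example, not part of the SDK.

```javascript
// Hypothetical helper: split an array into batches of at most `batchSize` items.
function chunk(items, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Usage: 12 queries with a batch size of 5.
const queries = Array.from({ length: 12 }, (_, i) => `query-${i + 1}`);
const batches = chunk(queries, 5);
console.log(batches.map((b) => b.length)); // [ 5, 5, 2 ]
```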

Avoid This

Making individual requests when you can batch
Disabling optimization features unnecessarily
Ignoring error handling and monitoring
Using batch sizes outside the recommended range (fewer than 5 or more than 20 requests per batch)
Not testing your routing policies thoroughly

Next Steps

SDK Reference

Complete SDK methods, types, and configuration options


API Reference

Direct API integration with batch processing endpoint
