Performance Optimizations

Reduce costs with smart routing, batch processing, and circuit breaker protection.

New Performance Features

Batch Processing: 3-5x faster with reduced HTTP overhead

Smart Caching: 90% faster authentication with 5-minute cache

Circuit Breaker: 99.9% uptime with automatic failure protection

Smart Routing: Automatic task detection and optimal model selection

Quick Start

Get started with performance optimizations in 5 minutes:

1. Install the SDK

npm install costlens

2. Enable Optimizations

import { CostLens } from 'costlens';

const costlens = new CostLens({
  apiKey: 'your-api-key',
  smartRouting: true,     // Auto-route to cheaper models
  enableCache: true,      // Cache repeated requests
  autoOptimize: true,     // Optimize prompts automatically
  costLimit: 0.10,        // Max $0.10 per request
});
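This page does not describe how the SDK enforces `costLimit`. As a rough mental model only, a per-request cap can be thought of as a pre-flight check on an estimated cost. The sketch below assumes cost is token count times a per-1,000-token price; `estimateCost`, `withinCostLimit`, and the prices are hypothetical and not part of the CostLens SDK.

```javascript
// Hypothetical helper: estimate request cost in USD from token counts and
// caller-supplied prices (USD per 1,000 tokens).
function estimateCost({ inputTokens, outputTokens }, { inputPer1K, outputPer1K }) {
  return (inputTokens / 1000) * inputPer1K + (outputTokens / 1000) * outputPer1K;
}

// Returns true when the estimated cost does not exceed the limit.
function withinCostLimit(usage, prices, costLimit) {
  return estimateCost(usage, prices) <= costLimit;
}

// Placeholder prices for illustration only; they are not real provider rates.
const placeholderPrices = { inputPer1K: 0.01, outputPer1K: 0.03 };
const usage = { inputTokens: 2000, outputTokens: 1000 };

console.log(estimateCost(usage, placeholderPrices).toFixed(2)); // "0.05"
console.log(withinCostLimit(usage, placeholderPrices, 0.10));   // true
```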

3. Use Batch Processing

// Instead of individual requests:
for (const request of requests) {
  await costlens.trackOpenAI(request.params, request.result, request.latency);
}

// Use batch processing for 3-5x better performance:
await costlens.trackBatch([
  { provider: 'openai', model: 'gpt-5.4', tokens: 150, latency: 1200 },
  { provider: 'anthropic', model: 'claude-3', tokens: 200, latency: 1000 },
  { provider: 'openai', model: 'gpt-5.4-mini', tokens: 100, latency: 800 }
]);
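If you already hold a long list of tracking events, one way to feed `trackBatch` is to split the list into fixed-size groups first. The `chunk` helper below is a hypothetical utility written for this page, not an SDK function.

```javascript
// Split a list of items into batches of at most `size` elements.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

console.log(chunk([1, 2, 3, 4, 5], 2)); // [ [ 1, 2 ], [ 3, 4 ], [ 5 ] ]
```

Each resulting batch could then be passed to `costlens.trackBatch(batch)` in a loop.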

4. Monitor Performance

// Check your savings and performance
const analytics = costlens.getCostAnalytics();
// toFixed avoids floating-point noise such as 56.99999999999999%
console.log('Cache Hit Rate:', (analytics.cacheHitRate * 100).toFixed(1) + '%');
console.log('Total Savings: $' + analytics.totalSavings);
console.log('Average Latency:', analytics.averageLatency + 'ms');
console.log('Error Rate:', (analytics.errorRate * 100).toFixed(1) + '%');

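// Calculate potential savings for a given conversation
const messages = [{ role: 'user', content: 'Summarize this support ticket.' }];
const savings = await costlens.calculateSavings('gpt-5.4', messages);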
console.log('Potential Savings: ' + savings.savingsPercentage + '%');
console.log('Recommended Model: ' + savings.recommendedModel);
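How `savingsPercentage` is computed is not documented on this page. A reasonable reading, stated here as an assumption, is the relative difference between the current cost and the cost on the recommended model. The helper below is a hypothetical illustration of that arithmetic.

```javascript
// Assumed formula: percentage saved when moving from currentCost to optimizedCost.
function savingsPercentage(currentCost, optimizedCost) {
  if (currentCost <= 0) return 0; // avoid dividing by zero
  return ((currentCost - optimizedCost) / currentCost) * 100;
}

// Going from $0.50 to $0.10 for the same workload:
console.log(savingsPercentage(0.5, 0.1).toFixed(1) + '%'); // "80.0%"
```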

Performance Benefits

Batch Processing

  • 3-5x faster than individual requests
  • Reduced HTTP overhead via batching
  • 90% more reliable with fewer failure points
  • Automatic batching with queue management
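To illustrate what "automatic batching with queue management" can mean in practice, here is a minimal sketch of a size-triggered queue. It is not the SDK's internal implementation; `BatchQueue` and its `send` callback are hypothetical names.

```javascript
// Hypothetical queue: buffers events and flushes them as one batch
// whenever the buffer reaches maxSize.
class BatchQueue {
  constructor(send, maxSize = 10) {
    this.send = send;       // callback that receives one array of events
    this.maxSize = maxSize;
    this.buffer = [];
  }

  add(event) {
    this.buffer.push(event);
    if (this.buffer.length >= this.maxSize) this.flush();
  }

  flush() {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];       // reset first so new events start a fresh batch
    this.send(batch);
  }
}

const sentBatches = [];
const queue = new BatchQueue((batch) => sentBatches.push(batch), 2);
queue.add('a');
queue.add('b'); // reaches maxSize, triggers a flush
queue.add('c');
queue.flush();  // manual flush for the leftover event
console.log(sentBatches); // [ [ 'a', 'b' ], [ 'c' ] ]
```

A production queue would typically also flush on a timer so that a partially filled batch is not held indefinitely.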

Smart Caching

  • 90% faster authentication with 5-minute cache
  • Savings on repeated requests
  • Automatic cache management with LRU eviction
  • Zero configuration required
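For readers unfamiliar with the terms, the sketch below shows one common way to combine a time-to-live (the 5-minute window mentioned above) with LRU eviction, using a `Map` for recency order. `TtlLruCache` is a hypothetical class for illustration and makes no claim about how CostLens caches internally.

```javascript
// Hypothetical cache with a TTL per entry and least-recently-used eviction.
class TtlLruCache {
  constructor({ maxEntries = 100, ttlMs = 5 * 60 * 1000, now = Date.now } = {}) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;           // injectable clock, handy for tests
    this.entries = new Map(); // Map keeps insertion order, used as recency order
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired
      return undefined;
    }
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    this.entries.delete(key);
    if (this.entries.size >= this.maxEntries) {
      // The first key in the Map is the least recently used one.
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```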

Circuit Breaker

  • 99.9% uptime with automatic failure protection
  • Automatic recovery when API is back
  • Silent degradation when tracking fails
  • Zero configuration required
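The circuit breaker pattern behind these bullets can be sketched as a small state machine: closed while calls succeed, open after repeated failures, and half-open once a cool-down has elapsed. The class, thresholds, and `trackSafely` wrapper below are hypothetical and only illustrate the pattern, including silent degradation when tracking fails.

```javascript
// Hypothetical circuit breaker; thresholds are illustrative defaults.
class CircuitBreaker {
  constructor({ failureThreshold = 3, resetTimeoutMs = 30000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;        // injectable clock for testing
    this.failures = 0;
    this.openedAt = null;  // null means the circuit is closed
  }

  get state() {
    if (this.openedAt === null) return 'closed';
    return this.now() - this.openedAt >= this.resetTimeoutMs ? 'half-open' : 'open';
  }

  canRequest() {
    return this.state !== 'open';
  }

  recordSuccess() {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure() {
    this.failures += 1;
    // A failed trial call in half-open state reopens the circuit immediately.
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.openedAt = this.now();
    }
  }
}

// Runs sendFn unless the circuit is open; never throws to the caller.
async function trackSafely(breaker, sendFn) {
  if (!breaker.canRequest()) return false; // skip silently while open
  try {
    await sendFn();
    breaker.recordSuccess();
    return true;
  } catch {
    breaker.recordFailure();
    return false;
  }
}
```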

Smart Routing

  • Automatic task detection (coding, writing, analysis)
  • Optimal model selection based on task type
  • 20-40% cost savings on typical workloads
  • Custom routing policies for advanced users
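As a rough illustration of task detection followed by model selection, the sketch below classifies a prompt with naive keyword matching and looks up a model in a policy table. The keyword lists, the `detectTask` and `selectModel` helpers, and the task-to-model mapping are all assumptions made for this example; the SDK's real classifier is not described here.

```javascript
// Hypothetical keyword lists; a real classifier would be more robust.
const TASK_KEYWORDS = {
  coding: ['function', 'bug', 'code', 'compile', 'stack trace'],
  analysis: ['analyze', 'compare', 'trend'],
  writing: ['write', 'draft', 'email'],
};

// Returns the first task whose keywords appear in the prompt, else 'general'.
function detectTask(prompt) {
  const text = prompt.toLowerCase();
  for (const [task, keywords] of Object.entries(TASK_KEYWORDS)) {
    if (keywords.some((kw) => text.includes(kw))) return task;
  }
  return 'general';
}

// Picks a model from a task-to-model policy, falling back to defaultModel.
function selectModel(prompt, policy, defaultModel) {
  return policy[detectTask(prompt)] ?? defaultModel;
}

// Example policy (an assumption): send writing tasks to the smaller model.
const policy = { coding: 'gpt-5.4', writing: 'gpt-5.4-mini' };
console.log(selectModel('Draft an email to a customer', policy, 'gpt-5.4')); // gpt-5.4-mini
```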

Real-World Example

Here's how to optimize a customer support system with batch processing:

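import OpenAI from 'openai';
import { CostLens } from 'costlens';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Before: Slow and expensive
const baselineResponses = [];
for (const query of customerQueries) {
  const response = await openai.chat.completions.create({
    model: 'gpt-5.4', // Always expensive
    messages: [{ role: 'user', content: query.question }]
  });
  baselineResponses.push(response);
}
// Result: 10 requests, 10 sequential API calls, about $0.50 and 5 seconds (illustrative figures)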

// After: Fast and cheap with optimizations
const costlens = new CostLens({
  apiKey: 'your-api-key',
  smartRouting: true,    // Auto-route to cheaper models
  enableCache: true,     // Cache repeated requests
  autoOptimize: true,    // Optimize prompts
});

const tracked = costlens.wrapOpenAI(openai);
const responses = [];

// Process in batches for 3-5x better performance
const batchSize = 5;
for (let i = 0; i < customerQueries.length; i += batchSize) {
  const batch = customerQueries.slice(i, i + batchSize);
  
  // Process batch
  const batchPromises = batch.map(async (query) => {
    const response = await tracked.chat.completions.create({
      model: 'gpt-5.4', // Smart routing may send simple tasks to a cheaper model such as gpt-5.4-mini
      messages: [{ role: 'user', content: query.question }]
    });
    return response;
  });
  
  const batchResults = await Promise.all(batchPromises);
  responses.push(...batchResults);
}

// Result: 10 requests in 2 concurrent batches, about $0.10 and 1 second (illustrative figures)

// Check your savings
const analytics = costlens.getCostAnalytics();
console.log(`Saved $${analytics.totalSavings} with ${analytics.cacheHitRate * 100}% cache hit rate`);

Best Practices

✅ Do This

  • Use batch processing for multiple requests (5-20 requests per batch)
  • Enable all optimization features for maximum performance
  • Monitor your savings and performance regularly
  • Handle errors gracefully with fallback strategies
  • Use appropriate batch sizes for your use case

❌ Avoid This

  • Making individual requests when you can batch
  • Disabling optimization features unnecessarily
  • Ignoring error handling and monitoring
  • Using batch sizes that are too large or too small
  • Not testing your routing policies thoroughly
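The advice above to handle errors gracefully with fallback strategies matters most when requests run concurrently, because `Promise.all` rejects as soon as any single request fails. One possible approach, sketched here with hypothetical `primary` and `fallback` callbacks, is to use `Promise.allSettled` and retry only the failed items through the fallback.

```javascript
// Runs primary(item) for every item concurrently. Items whose primary call
// rejects are retried through fallback(item, error). Results keep input order.
async function runBatchWithFallback(items, primary, fallback) {
  const settled = await Promise.allSettled(items.map((item) => primary(item)));
  return Promise.all(
    settled.map((result, i) =>
      result.status === 'fulfilled' ? result.value : fallback(items[i], result.reason)
    )
  );
}
```

In the customer support example, `primary` could call the wrapped client and `fallback` could call the unwrapped `openai` client or return a canned reply.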

Next Steps

SDK Reference

Complete SDK methods, types, and configuration options


API Reference

Direct API integration with batch processing endpoint
