Advanced Features

Response caching, middleware hooks, retries, and advanced configuration for production deployments.

Smart Caching

Reduce API costs by caching identical requests:

const costlens = new CostLens({ 
  apiKey: process.env.COSTLENS_API_KEY,
  enableCache: true  // Enable caching globally
});

const tracked = costlens.wrapOpenAI(openai);

// Cache this specific request for 1 hour
const result = await tracked.chat.completions.create(
  { model: 'gpt-5.4', messages: [...] },
  { cacheTTL: 3600000 }  // 1 hour in milliseconds
);

// A second identical call returns the cached result without another API request
const cached = await tracked.chat.completions.create(
  { model: 'gpt-5.4', messages: [...] },
  { cacheTTL: 3600000 }
);

// Clear cache when needed
costlens.clearCache();

Use cases: FAQ responses, static content generation, repeated queries
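
To make "identical requests" concrete, one way to picture the behavior is a cache keyed by the serialized request parameters, where each entry expires once its TTL has elapsed. The following is a minimal sketch under that assumption, not CostLens's actual implementation: `cacheKey`, `TtlCache`, and the injectable `now` clock are hypothetical names introduced here for illustration.

```typescript
// Illustrative only: a tiny in-memory TTL cache keyed by request parameters.
// cacheKey and TtlCache are hypothetical helpers, not part of the CostLens SDK.

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Serialize with sorted object keys so that property order does not change the key.
function cacheKey(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return "[" + value.map((v) => cacheKey(v)).join(",") + "]";
  const keys = Object.keys(value).sort();
  return "{" + keys.map((k) => JSON.stringify(k) + ":" + cacheKey(value[k] ?? null)).join(",") + "}";
}

class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private now: () => number;

  // `now` is injectable so expiry can be exercised without real waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return entry.value;
  }

  // ttlMs is in milliseconds, like cacheTTL in the example above.
  set(key: string, value: V, ttlMs: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }

  clear(): void {
    this.entries.clear();
  }
}
```

Under this model, two calls share a cache entry only when model, messages, and every other parameter match exactly, which is why caching suits FAQ-style traffic better than free-form conversation.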

Auto-Retry with Backoff

Automatically retry failed requests with exponential backoff:

const costlens = new CostLens({ 
  apiKey: process.env.COSTLENS_API_KEY,
  maxRetries: 3  // Retry up to 3 times
});

// Automatically retries on 5xx errors with backoff: 1s, 2s, 4s
const tracked = costlens.wrapOpenAI(openai);
const result = await tracked.chat.completions.create({
  model: 'gpt-5.4',
  messages: [...]
});
// Handles transient failures automatically
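
The 1s, 2s, 4s schedule mentioned above corresponds to doubling a one-second base delay on each retry. The sketch below shows that pattern in isolation. It is an illustration under stated assumptions rather than the SDK's internals: `backoffDelayMs`, `isRetryable`, `withRetry`, and the idea that errors expose a numeric `status` field are all hypothetical.

```typescript
// Illustrative only: exponential backoff matching a 1s, 2s, 4s schedule.
// These helpers are hypothetical and are not CostLens internals.

// Delay before retry number `retry` (0-based): baseMs * 2^retry.
function backoffDelayMs(retry: number, baseMs: number = 1000): number {
  return baseMs * 2 ** retry;
}

// Assumption: errors carry a numeric HTTP `status`; only 5xx is treated as transient.
function isRetryable(error: unknown): boolean {
  const status = (error as { status?: unknown } | null)?.status;
  return typeof status === "number" && status >= 500 && status < 600;
}

async function withRetry<T>(
  call: () => Promise<T>,
  maxRetries: number = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => {
      setTimeout(resolve, ms);
    }),
): Promise<T> {
  for (let retry = 0; ; retry++) {
    try {
      return await call();
    } catch (error) {
      // Give up once the retry budget is spent or the error is not transient.
      if (retry >= maxRetries || !isRetryable(error)) throw error;
      await sleep(backoffDelayMs(retry));
    }
  }
}
```

With `maxRetries: 3` a request is attempted at most four times in this sketch, and a persistent outage adds roughly seven seconds of waiting before the error reaches the caller.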

Middleware System

Add custom logic before/after API calls:

const costlens = new CostLens({
  apiKey: process.env.COSTLENS_API_KEY,
  middleware: [
    {
      // Run before API call
      before: async (params) => {
        console.log('Making API call with model:', params.model);
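        // Modify params if needed: default the temperature only when it is unset
        // (?? keeps an explicit 0, which || would overwrite with 0.7)
        params.temperature = params.temperature ?? 0.7;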
        return params;
      },
      
      // Run after successful API call
      after: async (result) => {
        console.log('Tokens used:', result.usage?.total_tokens);
        // Transform result if needed
        return result;
      },
      
      // Run on error
      onError: async (error) => {
        console.error('API call failed:', error.message);
        // Send to error tracking service
      }
    }
  ]
});

Middleware Use Cases

  • Logging: Track all API calls to your logging service
  • Validation: Ensure params meet your requirements
  • Transformation: Modify requests/responses
  • Monitoring: Send metrics to DataDog/New Relic
  • Rate limiting: Implement custom rate limits
  • A/B testing: Route requests to different models
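
As an illustration of the validation and monitoring use cases listed above, here are two middleware objects written against the `Middleware` interface from the Configuration Reference. This is a sketch only: `createValidationMiddleware`, `createMetricsMiddleware`, the `Metrics` shape, and the message limit are hypothetical, and it assumes that an error thrown from `before` prevents the API call.

```typescript
// Illustrative only: middleware objects shaped like the documented Middleware interface.
// Helper names, the Metrics shape, and the message limit are hypothetical.

interface Middleware {
  before?: (params: any) => Promise<any>;
  after?: (result: any) => Promise<any>;
  onError?: (error: Error) => Promise<void>;
}

// Validation: reject requests with no messages or with more than `maxMessages`.
function createValidationMiddleware(maxMessages: number): Middleware {
  return {
    before: async (params) => {
      const count = Array.isArray(params?.messages) ? params.messages.length : 0;
      if (count === 0) throw new Error("messages must be a non-empty array");
      if (count > maxMessages) throw new Error(`too many messages: ${count} > ${maxMessages}`);
      return params;
    },
  };
}

// Monitoring: accumulate counters in a plain object that could later be
// forwarded to a metrics service.
interface Metrics {
  calls: number;
  totalTokens: number;
  errors: number;
}

function createMetricsMiddleware(metrics: Metrics): Middleware {
  return {
    after: async (result) => {
      metrics.calls += 1;
      metrics.totalTokens += result?.usage?.total_tokens ?? 0;
      return result;
    },
    onError: async () => {
      metrics.errors += 1;
    },
  };
}
```

Both objects could then be passed in the `middleware` array of the config, for example `middleware: [createValidationMiddleware(50), createMetricsMiddleware(metrics)]`.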

Streaming Support

Track streaming responses automatically:

const tracked = costlens.wrapOpenAI(openai);

const stream = await tracked.chat.completions.stream({
  model: 'gpt-5.4',
  messages: [{ role: 'user', content: 'Tell me a long story' }]
});

// Stream chunks to user
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || '';
  process.stdout.write(content);
}

// Automatically tracked after stream completes
// Includes full content and estimated token count
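
Because the token count for a streamed response is described as estimated, it may help to see how such an estimate can be derived from the accumulated text. The sketch below is not the SDK's estimator: `StreamAccumulator` is a hypothetical helper, and the ratio of about four characters per token is only a rough rule of thumb for English text.

```typescript
// Illustrative only: accumulate streamed deltas and estimate tokens afterwards.
// The ~4 characters per token ratio is a rough heuristic, not CostLens's estimator.

interface StreamChunk {
  choices: Array<{ delta?: { content?: string | null } }>;
}

class StreamAccumulator {
  private parts: string[] = [];

  // Append the text delta of one chunk; chunks without content are ignored.
  push(chunk: StreamChunk): void {
    const content = chunk.choices[0]?.delta?.content;
    if (content) this.parts.push(content);
  }

  // Full response text received so far.
  get text(): string {
    return this.parts.join("");
  }

  // Rough token estimate derived from character count.
  get estimatedTokens(): number {
    return Math.ceil(this.text.length / 4);
  }
}
```

In the loop above, calling `accumulator.push(chunk)` next to `process.stdout.write(content)` would yield the full text and an estimate once the stream ends.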

Batch Tracking

Efficiently track multiple calls at once:

// Track multiple calls in one request
await costlens.trackBatch([
  { provider: 'openai', model: 'gpt-5.4', tokens: 100, latency: 500 },
  { provider: 'openai', model: 'gpt-5.4-mini', tokens: 50, latency: 200 },
  { provider: 'anthropic', model: 'claude-sonnet-4.6', tokens: 150, latency: 600 }
]);

// Useful for background jobs or bulk processing
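
For background jobs that accumulate many records, one option is to split them into fixed-size groups and send each group with a single `trackBatch` call. The sketch below shows only the splitting step. `chunk` and `TrackRecord` are hypothetical names, the record fields mirror the example above, and the batch size in the usage comment is an arbitrary assumption since no limit is documented here.

```typescript
// Illustrative only: split a list of records into fixed-size batches.
// chunk() and TrackRecord are hypothetical; no batch size limit is documented in the source.

interface TrackRecord {
  provider: string;
  model: string;
  tokens: number;
  latency: number; // unit not stated in the source
}

function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Possible usage with the SDK call shown above (not executed here):
//   for (const batch of chunk(records, 100)) {
//     await costlens.trackBatch(batch);
//   }
```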

Tagging & Organization

Tag requests with a promptId to analyze cost and performance by use case:

const tracked = costlens.wrapOpenAI(openai);

// Tag with promptId for analytics
const result = await tracked.chat.completions.create(
  { model: 'gpt-5.4', messages: [...] },
  { promptId: 'customer-support-v2' }
);

// View analytics grouped by promptId in dashboard
// Compare performance across different prompt versions
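
To illustrate the kind of comparison that grouping by `promptId` makes possible, the sketch below aggregates a list of call records per prompt. It is purely illustrative: `CallRecord`, `PromptSummary`, and `summarizeByPrompt` are hypothetical, and in practice this grouping is what the dashboard is described as providing.

```typescript
// Illustrative only: per-promptId averages over locally held call records.
// CallRecord, PromptSummary, and summarizeByPrompt are hypothetical names.

interface CallRecord {
  promptId: string;
  tokens: number;
  latencyMs: number;
}

interface PromptSummary {
  calls: number;
  avgTokens: number;
  avgLatencyMs: number;
}

function summarizeByPrompt(records: readonly CallRecord[]): Map<string, PromptSummary> {
  // First pass: running totals per promptId.
  const totals = new Map<string, { calls: number; tokens: number; latencyMs: number }>();
  for (const record of records) {
    const t = totals.get(record.promptId) ?? { calls: 0, tokens: 0, latencyMs: 0 };
    t.calls += 1;
    t.tokens += record.tokens;
    t.latencyMs += record.latencyMs;
    totals.set(record.promptId, t);
  }
  // Second pass: convert totals to averages.
  const summary = new Map<string, PromptSummary>();
  totals.forEach((t, promptId) => {
    summary.set(promptId, {
      calls: t.calls,
      avgTokens: t.tokens / t.calls,
      avgLatencyMs: t.latencyMs / t.calls,
    });
  });
  return summary;
}
```

Keeping a version suffix in the tag, as in `customer-support-v2`, lets two prompt versions appear as separate groups in a comparison like this.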

Configuration Reference

interface CostLensConfig {
  apiKey: string;           // Required: Your API key
  baseUrl?: string;         // Optional: Custom endpoint
  enableCache?: boolean;    // Optional: Enable caching (default: true)
  maxRetries?: number;      // Optional: Max retry attempts (default: 3)
  middleware?: Middleware[]; // Optional: Custom middleware
}

interface Middleware {
  before?: (params: any) => Promise<any>;
  after?: (result: any) => Promise<any>;
  onError?: (error: Error) => Promise<void>;
}
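
The defaults noted in the comments above can be read as follows: omitted options fall back to their documented defaults, while explicit values such as `false` or `0` are respected. The sketch below spells that out. `resolveConfig` and `ResolvedConfig` are hypothetical helpers written for illustration, the validation rules are assumptions, and nothing here is taken from the SDK's source.

```typescript
// Illustrative only: apply the defaults documented in the Configuration Reference.
// resolveConfig and ResolvedConfig are hypothetical; validation rules are assumptions.

interface Middleware {
  before?: (params: any) => Promise<any>;
  after?: (result: any) => Promise<any>;
  onError?: (error: Error) => Promise<void>;
}

interface CostLensConfig {
  apiKey: string;
  baseUrl?: string;
  enableCache?: boolean;
  maxRetries?: number;
  middleware?: Middleware[];
}

interface ResolvedConfig {
  apiKey: string;
  baseUrl: string | undefined; // no default is documented, so it stays undefined
  enableCache: boolean;
  maxRetries: number;
  middleware: Middleware[];
}

function resolveConfig(config: CostLensConfig): ResolvedConfig {
  if (config.apiKey.trim() === "") throw new Error("apiKey is required");
  const maxRetries = config.maxRetries ?? 3; // documented default: 3
  if (!Number.isInteger(maxRetries) || maxRetries < 0) {
    throw new RangeError("maxRetries must be a non-negative integer");
  }
  return {
    apiKey: config.apiKey,
    baseUrl: config.baseUrl,
    enableCache: config.enableCache ?? true, // documented default: true
    maxRetries,
    middleware: config.middleware ?? [],
  };
}
```

Using `??` rather than `||` matters here: `enableCache: false` and `maxRetries: 0` are meaningful settings and should not be replaced by the defaults.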

Best Practices

  1. Use wrappers - The simplest and most reliable way to track calls
  2. Enable caching - For repeated queries (FAQs, static content)
  3. Set maxRetries - Recover from transient 5xx failures in production
  4. Use middleware - For logging, monitoring, and validation
  5. Tag prompts - Analyze performance by use case with promptId
  6. Monitor costs - Set budget alerts in the dashboard

Next Steps

OpenAI Integration

Anthropic Integration

SDK Reference
