Overview

LiteLLM automatically calculates and tracks costs for all supported LLM providers. Track spending across models, users, teams, and API keys to manage budgets and optimize usage.

Automatic Cost Calculation

Costs are calculated automatically for every request:
from litellm import completion

response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# Cost information in response
print(f"Prompt tokens: {response.usage.prompt_tokens}")
print(f"Completion tokens: {response.usage.completion_tokens}")
print(f"Total cost: ${response._hidden_params['response_cost']}")

Supported Cost Metrics

LiteLLM tracks costs for:
  • Completion/Chat - Input and output tokens
  • Embeddings - Per token or per request
  • Image Generation - Per image, resolution, quality
  • Audio (Speech) - Per character or per second
  • Audio (Transcription) - Per second
  • Fine-tuning - Training tokens
  • Realtime API - Session duration, audio input/output
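
To illustrate how token-based cost calculation works conceptually, here is a minimal stdlib-only sketch. The prices and the `PRICING` table are hypothetical stand-ins; LiteLLM ships its own model cost map and does this for you automatically:

```python
# Minimal sketch of token-based cost calculation (illustrative only;
# the per-token prices below are hypothetical examples).
PRICING = {
    "gpt-4": {"input_cost_per_token": 0.00003, "output_cost_per_token": 0.00006},
    "gpt-3.5-turbo": {"input_cost_per_token": 0.0000005, "output_cost_per_token": 0.0000015},
}

def calculate_cost(model, prompt_tokens, completion_tokens):
    """Return (input_cost, output_cost, total_cost) for one request."""
    prices = PRICING[model]
    input_cost = prompt_tokens * prices["input_cost_per_token"]
    output_cost = completion_tokens * prices["output_cost_per_token"]
    return input_cost, output_cost, input_cost + output_cost

_, _, total = calculate_cost("gpt-4", 100, 50)
print(f"${total:.6f}")  # $0.006000
```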

Provider Support

Cost tracking for 100+ providers:
  • OpenAI (GPT-4, GPT-3.5, etc.)
  • Anthropic (Claude)
  • Google (Gemini, Vertex AI)
  • Azure OpenAI
  • AWS Bedrock
  • Cohere
  • Replicate
  • Together AI
  • And many more…

Accessing Cost Information

Response Object

response = completion(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Standard usage information
usage = response.usage
print(f"Prompt tokens: {usage.prompt_tokens}")
print(f"Completion tokens: {usage.completion_tokens}")
print(f"Total tokens: {usage.total_tokens}")

# Cost information
cost = response._hidden_params.get("response_cost", 0)
print(f"Request cost: ${cost:.6f}")

Streaming Responses

For streaming, cost is available in the final chunk:
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)

total_cost = 0
for chunk in response:
    if hasattr(chunk, '_hidden_params'):
        total_cost = chunk._hidden_params.get('response_cost', 0)

print(f"Total streaming cost: ${total_cost:.6f}")

Custom Pricing

Override default pricing for custom deployments:
from litellm import completion

response = completion(
    model="custom-model",
    messages=[{"role": "user", "content": "Hello"}],
    input_cost_per_token=0.00001,  # $0.00001 per input token
    output_cost_per_token=0.00003  # $0.00003 per output token
)

Custom Pricing with Router

from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "custom-gpt",
            "litellm_params": {
                "model": "openai/custom-deployment",
                "api_key": "sk-...",
                "input_cost_per_token": 0.00002,
                "output_cost_per_token": 0.00004
            }
        }
    ]
)

Budget Management

Set Budget Limits

Prevent overspending with budget limits:
import litellm
from litellm import BudgetManager, completion

# Create a budget manager and a monthly budget for a user
budget_manager = BudgetManager(project_name="my-project")

user = "user-123"
budget_manager.create_budget(total_budget=100.00, user=user, duration="monthly")

# Check remaining budget before sending; record the cost afterwards
if budget_manager.get_current_cost(user=user) <= budget_manager.get_total_budget(user):
    response = completion(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello"}]
    )
    budget_manager.update_cost(completion_obj=response, user=user)
else:
    print("Budget exceeded - request blocked")
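
Independent of LiteLLM's built-in manager, the enforcement idea itself is simple; here is a plain-Python sketch (illustrative only, not LiteLLM's implementation):

```python
# Illustrative sketch of budget enforcement: accumulate spend and
# refuse a request that would push the total past the limit.
class BudgetExceededError(Exception):
    pass

class SimpleBudget:
    def __init__(self, max_budget):
        self.max_budget = max_budget
        self.spent = 0.0

    def charge(self, cost):
        """Record a request's cost; raise if the budget would be exceeded."""
        if self.spent + cost > self.max_budget:
            raise BudgetExceededError(
                f"${self.spent:.2f} spent + ${cost:.2f} exceeds ${self.max_budget:.2f}"
            )
        self.spent += cost

budget = SimpleBudget(max_budget=1.00)
budget.charge(0.40)
budget.charge(0.40)
try:
    budget.charge(0.40)  # would push the total to $1.20
except BudgetExceededError as e:
    print("blocked:", e)
```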

Provider-Level Budgets

Set budget limits per provider:
from litellm import Router

router = Router(
    model_list=[...],
    provider_budget_config={
        "openai": {"budget_limit": 100.0},  # $100/day for OpenAI
        "anthropic": {"budget_limit": 50.0},  # $50/day for Anthropic
        "google": {"budget_limit": 75.0}  # $75/day for Google
    }
)

Cost Logging

Custom Cost Logger

from litellm.integrations.custom_logger import CustomLogger
from litellm import completion
import litellm

class CostLogger(CustomLogger):
    def __init__(self):
        self.total_cost = 0
        super().__init__()
    
    def log_success_event(self, kwargs, response_obj, start_time, end_time):
        cost = kwargs.get("response_cost", 0)
        self.total_cost += cost
        
        print(f"Request cost: ${cost:.6f}")
        print(f"Total cost: ${self.total_cost:.6f}")
        print(f"Model: {kwargs.get('model')}")
        print(f"Tokens: {response_obj.usage.total_tokens}")

cost_logger = CostLogger()
litellm.callbacks = [cost_logger]

# Make requests
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

print(f"Total accumulated cost: ${cost_logger.total_cost:.6f}")

Database Cost Tracking

from litellm.integrations.custom_logger import CustomLogger
import litellm
import sqlite3
from datetime import datetime

class DatabaseCostLogger(CustomLogger):
    def __init__(self, db_path="costs.db"):
        self.conn = sqlite3.connect(db_path)
        self.create_table()
        super().__init__()
    
    def create_table(self):
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS costs (
                id INTEGER PRIMARY KEY,
                timestamp TEXT,
                model TEXT,
                prompt_tokens INTEGER,
                completion_tokens INTEGER,
                total_tokens INTEGER,
                cost REAL,
                user_id TEXT
            )
        """)
        self.conn.commit()
    
    def log_success_event(self, kwargs, response_obj, start_time, end_time):
        usage = response_obj.usage
        cost = kwargs.get("response_cost", 0)
        user_id = kwargs.get("metadata", {}).get("user_id")
        
        self.conn.execute("""
            INSERT INTO costs (timestamp, model, prompt_tokens, 
                             completion_tokens, total_tokens, cost, user_id)
            VALUES (?, ?, ?, ?, ?, ?, ?)
        """, (
            datetime.now().isoformat(),
            kwargs.get("model"),
            usage.prompt_tokens,
            usage.completion_tokens,
            usage.total_tokens,
            cost,
            user_id
        ))
        self.conn.commit()

litellm.callbacks = [DatabaseCostLogger()]

Cost Analytics

Query Costs by Model

import sqlite3

conn = sqlite3.connect("costs.db")

# Total cost by model
result = conn.execute("""
    SELECT model, SUM(cost) as total_cost, COUNT(*) as request_count
    FROM costs
    GROUP BY model
    ORDER BY total_cost DESC
""").fetchall()

for model, cost, count in result:
    print(f"{model}: ${cost:.2f} ({count} requests)")

Query Costs by User

# Total cost by user
result = conn.execute("""
    SELECT user_id, SUM(cost) as total_cost
    FROM costs
    WHERE user_id IS NOT NULL
    GROUP BY user_id
    ORDER BY total_cost DESC
    LIMIT 10
""").fetchall()

for user_id, cost in result:
    print(f"User {user_id}: ${cost:.2f}")

Time-Based Analysis

# Daily costs
result = conn.execute("""
    SELECT DATE(timestamp) as date, SUM(cost) as daily_cost
    FROM costs
    GROUP BY DATE(timestamp)
    ORDER BY date DESC
    LIMIT 30
""").fetchall()

for date, cost in result:
    print(f"{date}: ${cost:.2f}")
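
The queries above can be exercised end to end against an in-memory database; the sample rows and cost values below are made up for the demo:

```python
import sqlite3

# Self-contained demo of the analytics queries using made-up data.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE costs (
        id INTEGER PRIMARY KEY, timestamp TEXT, model TEXT,
        prompt_tokens INTEGER, completion_tokens INTEGER,
        total_tokens INTEGER, cost REAL, user_id TEXT
    )
""")
rows = [
    ("2024-05-01T10:00:00", "gpt-4", 100, 50, 150, 0.006, "alice"),
    ("2024-05-01T11:00:00", "gpt-3.5-turbo", 200, 80, 280, 0.0002, "bob"),
    ("2024-05-02T09:00:00", "gpt-4", 300, 120, 420, 0.016, "alice"),
]
conn.executemany(
    "INSERT INTO costs (timestamp, model, prompt_tokens, completion_tokens,"
    " total_tokens, cost, user_id) VALUES (?, ?, ?, ?, ?, ?, ?)", rows
)

# Total cost by model, most expensive first
for model, total, count in conn.execute(
    "SELECT model, SUM(cost), COUNT(*) FROM costs"
    " GROUP BY model ORDER BY SUM(cost) DESC"
):
    print(f"{model}: ${total:.4f} ({count} requests)")
```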

Cost Optimization

Model Cost Comparison

from litellm import model_cost

# Get costs for different models
models = ["gpt-4", "gpt-3.5-turbo", "claude-3-opus", "claude-3-sonnet"]

for model in models:
    cost_info = model_cost.get(model)
    if cost_info:
        input_cost = cost_info.get("input_cost_per_token", 0)
        output_cost = cost_info.get("output_cost_per_token", 0)
        print(f"{model}:")
        print(f"  Input: ${input_cost * 1000000:.2f}/1M tokens")
        print(f"  Output: ${output_cost * 1000000:.2f}/1M tokens")
    else:
        print(f"{model}: cost info not available")

Prompt Optimization

Reduce costs by optimizing prompts:
from litellm import token_counter

prompt = "Your long prompt here..."

# Count tokens before sending
token_count = token_counter(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}]
)

print(f"Prompt tokens: {token_count}")

# Estimate cost
input_cost_per_token = 0.00003  # GPT-4
estimated_cost = token_count * input_cost_per_token
print(f"Estimated input cost: ${estimated_cost:.6f}")

Choose Cost-Effective Models

from litellm import Router

router = Router(
    model_list=[
        # Primary: Cheap model
        {
            "model_name": "smart",
            "litellm_params": {"model": "gpt-3.5-turbo"}
        },
        # Fallback: Expensive model
        {
            "model_name": "smart",
            "litellm_params": {"model": "gpt-4"}
        }
    ],
    routing_strategy="cost-based-routing"  # Prefer cheaper models
)
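
The idea behind cost-based routing can be sketched in plain Python: estimate each candidate deployment's cost for a typical request and pick the cheapest. This is an illustrative sketch with hypothetical prices, not the router's actual implementation:

```python
# Illustrative sketch of cost-based routing: choose the cheapest deployment
# for an estimated request size. Prices below are hypothetical examples.
deployments = [
    {"model": "gpt-4", "input_cost_per_token": 0.00003, "output_cost_per_token": 0.00006},
    {"model": "gpt-3.5-turbo", "input_cost_per_token": 0.0000005, "output_cost_per_token": 0.0000015},
]

def pick_cheapest(deployments, est_input_tokens=500, est_output_tokens=200):
    def estimated_cost(d):
        return (est_input_tokens * d["input_cost_per_token"]
                + est_output_tokens * d["output_cost_per_token"])
    return min(deployments, key=estimated_cost)

print(pick_cheapest(deployments)["model"])  # gpt-3.5-turbo
```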

Cost Alerting

Threshold-Based Alerts

from litellm.integrations.custom_logger import CustomLogger
import litellm

class CostAlertLogger(CustomLogger):
    def __init__(self, daily_threshold=10.0):
        self.daily_threshold = daily_threshold
        self.daily_cost = 0
        self.alert_sent = False
        super().__init__()
    
    def log_success_event(self, kwargs, response_obj, start_time, end_time):
        cost = kwargs.get("response_cost", 0)
        self.daily_cost += cost
        
        if self.daily_cost > self.daily_threshold and not self.alert_sent:
            self.send_alert(self.daily_cost)
            self.alert_sent = True
    
    def send_alert(self, cost):
        print(f"⚠️ ALERT: Daily cost ${cost:.2f} exceeds threshold ${self.daily_threshold:.2f}")
        # Send email, Slack notification, etc.

litellm.callbacks = [CostAlertLogger(daily_threshold=10.0)]

Integration with Observability Platforms

Langfuse Integration

import litellm
from litellm import completion

litellm.success_callback = ["langfuse"]
litellm.failure_callback = ["langfuse"]

# Set Langfuse credentials
import os
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-..."
os.environ["LANGFUSE_SECRET_KEY"] = "sk-..."
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"

# Costs automatically tracked in Langfuse
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

DataDog Integration

import litellm
from litellm import completion

litellm.success_callback = ["datadog"]

# Set DataDog credentials
import os
os.environ["DD_API_KEY"] = "..."
os.environ["DD_SITE"] = "datadoghq.com"

# Costs sent as metrics to DataDog
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

Best Practices

Cost Management Tips

  1. Monitor daily - Track costs in real-time
  2. Set budgets - Use budget limits to prevent overruns
  3. Optimize prompts - Reduce token usage
  4. Cache responses - Avoid redundant API calls
  5. Use cheaper models - Balance cost vs. quality
  6. Track by user - Identify high-cost users
  7. Alert on thresholds - Get notified of unusual spending
  8. Analyze trends - Review cost patterns weekly

Cost Calculation Details

Token-Based Pricing

Most models charge per token:
# Example calculation for GPT-4
input_tokens = 100
output_tokens = 50

input_cost = input_tokens * 0.00003  # $0.03/1K tokens
output_cost = output_tokens * 0.00006  # $0.06/1K tokens
total_cost = input_cost + output_cost

print(f"Total cost: ${total_cost:.6f}")  # $0.006000

Image Generation Pricing

from litellm import image_generation

response = image_generation(
    model="dall-e-3",
    prompt="A sunset over mountains",
    size="1024x1024",
    quality="hd",
    n=1
)

# Cost based on size, quality, and number of images
cost = response._hidden_params.get("response_cost")
print(f"Image generation cost: ${cost}")

Audio Pricing

# Speech (TTS)
from litellm import speech

response = speech(
    model="tts-1",
    input="Hello, world!",
    voice="alloy"
)

# Cost based on character count
cost = response._hidden_params.get("response_cost")
