
Configuration File Structure

The proxy configuration file (config.yaml) has four main sections:
model_list:          # Model deployments
litellm_settings:    # LiteLLM behavior
router_settings:     # Load balancing
general_settings:    # Proxy server settings

Model List

Define your model deployments:
model_list:
  - model_name: gpt-3.5-turbo     # Name used in API requests
    litellm_params:
      model: openai/gpt-3.5-turbo  # Provider/model format
      api_key: os.environ/OPENAI_API_KEY
      api_base: https://api.openai.com/v1  # Optional
      rpm: 480                      # Requests per minute
      tpm: 100000                   # Tokens per minute
      timeout: 300                  # Request timeout (seconds)
      stream_timeout: 60            # Streaming timeout
    model_info:
      id: "deployment-1"            # Unique deployment ID
      mode: chat                    # chat, completion, or embedding
      base_model: gpt-3.5-turbo

Provider Examples

- model_name: gpt-4
  litellm_params:
    model: openai/gpt-4
    api_key: os.environ/OPENAI_API_KEY
    organization: os.environ/OPENAI_ORG_ID  # Optional
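The snippet above covers OpenAI; other providers follow the same pattern with provider-specific params. Sketches for Anthropic and Azure OpenAI (the deployment names and API version here are placeholders, adjust them to your account):

```yaml
# Anthropic
- model_name: claude-3-opus
  litellm_params:
    model: anthropic/claude-3-opus-20240229
    api_key: os.environ/ANTHROPIC_API_KEY

# Azure OpenAI (model is azure/<your-deployment-name>)
- model_name: azure-gpt-4
  litellm_params:
    model: azure/gpt-4
    api_key: os.environ/AZURE_API_KEY
    api_base: os.environ/AZURE_API_BASE
    api_version: "2024-02-15-preview"
```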

Environment Variables

Load API keys from environment:
api_key: os.environ/OPENAI_API_KEY
Or directly (not recommended for production):
api_key: sk-...
Never commit API keys to version control. Always use environment variables.
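The os.environ/ prefix tells the proxy to read the value from the environment rather than the file itself. A minimal sketch of how such a reference can be resolved (resolve_secret is a hypothetical helper for illustration, not LiteLLM's API):

```python
import os

def resolve_secret(value: str) -> str:
    """Resolve "os.environ/NAME" references to the named environment
    variable; return literal values unchanged."""
    prefix = "os.environ/"
    if value.startswith(prefix):
        return os.environ[value[len(prefix):]]
    return value

os.environ["OPENAI_API_KEY"] = "sk-example"           # stand-in value
print(resolve_secret("os.environ/OPENAI_API_KEY"))    # the env value
print(resolve_secret("sk-literal"))                   # literals pass through
```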

Rate Limits

Set per-deployment rate limits:
model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY
      rpm: 480           # 480 requests per minute
      tpm: 100000        # 100k tokens per minute

Timeouts

litellm_params:
  timeout: 300         # Total request timeout (seconds)
  stream_timeout: 60   # Streaming chunk timeout

LiteLLM Settings

Configure LiteLLM behavior:
litellm_settings:
  # Retry configuration
  num_retries: 3
  request_timeout: 600
  
  # Parameter handling
  drop_params: true              # Drop unsupported params
  
  # Callbacks
  success_callback: ["prometheus", "langfuse"]
  failure_callback: ["slack"]
  
  # Telemetry
  telemetry: false               # Disable usage telemetry
  
  # Caching
  cache: true
  cache_params:
    type: redis
    host: localhost
    port: 6379
  
  # Fallbacks
  context_window_fallbacks:
    - gpt-3.5-turbo: ["gpt-3.5-turbo-16k"]
    - gpt-4: ["claude-3-opus"]
  
  # Team settings
  default_team_settings:
    - team_id: team-1
      success_callback: ["langfuse"]
      langfuse_public_key: os.environ/LANGFUSE_KEY
      langfuse_secret: os.environ/LANGFUSE_SECRET
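Note the YAML shape of context_window_fallbacks above: a list of single-key maps, each mapping a model to its ordered fallback candidates. A small sketch of how such a structure is looked up (the names are illustrative, not LiteLLM internals):

```python
# Mirrors the YAML above: a list of one-key dicts.
context_window_fallbacks = [
    {"gpt-3.5-turbo": ["gpt-3.5-turbo-16k"]},
    {"gpt-4": ["claude-3-opus"]},
]

def fallbacks_for(model: str) -> list[str]:
    """Return the ordered fallback models for `model`, or [] if none."""
    for entry in context_window_fallbacks:
        if model in entry:
            return entry[model]
    return []

print(fallbacks_for("gpt-4"))       # ['claude-3-opus']
print(fallbacks_for("command-r"))   # []
```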

Callbacks

Supported callback integrations:
  • prometheus - Metrics export
  • langfuse - Observability
  • lunary - Monitoring
  • helicone - Analytics
  • slack - Alerting
  • webhook - Custom webhooks
  • s3 - Log to S3
litellm_settings:
  success_callback: ["prometheus", "langfuse"]
  failure_callback: ["slack"]
  
  # Callback-specific settings
  langfuse_public_key: os.environ/LANGFUSE_PUBLIC_KEY
  langfuse_secret: os.environ/LANGFUSE_SECRET_KEY
  
  slack_webhook_url: os.environ/SLACK_WEBHOOK_URL

Caching

litellm_settings:
  cache: true
  cache_params:
    type: local    # In-process cache; use redis (shown above) when running multiple proxy instances

Budget Configuration

litellm_settings:
  max_budget: 100              # Global budget in USD
  budget_duration: 30d         # Budget period (30d, 1h, etc.)
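budget_duration is a number plus a unit suffix. A sketch of parsing such values, assuming the common s/m/h/d suffixes (duration_to_seconds is illustrative, not the proxy's own parser):

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def duration_to_seconds(duration: str) -> int:
    """Parse durations like "30d" or "1h" into seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", duration)
    if not match:
        raise ValueError(f"unrecognized duration: {duration!r}")
    value, unit = match.groups()
    return int(value) * UNIT_SECONDS[unit]

print(duration_to_seconds("30d"))   # 2592000
print(duration_to_seconds("1h"))    # 3600
```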

Router Settings

Configure load balancing and routing:
router_settings:
  # Routing strategy
  routing_strategy: usage-based-routing-v2
  
  # Redis for shared state
  redis_host: os.environ/REDIS_HOST
  redis_password: os.environ/REDIS_PASSWORD
  redis_port: 6379
  
  # Health checks
  enable_pre_call_checks: true
  
  # Model aliases
  model_group_alias:
    gpt-4-latest: "gpt-4"
    claude-latest: "claude-3-opus"

Routing Strategies

Choose how requests are spread across deployments. Available strategies include simple-shuffle (the default, optionally weighted by rpm/tpm), least-busy, latency-based-routing, and usage-based-routing-v2:
router_settings:
  routing_strategy: simple-shuffle

General Settings

Proxy server configuration:
general_settings:
  # Authentication
  master_key: sk-1234                    # Admin API key
  
  # Database
  database_url: os.environ/DATABASE_URL
  store_model_in_db: true                # Store models in DB
  database_connection_pool_limit: 10
  
  # Budget
  proxy_budget_rescheduler_min_time: 60
  proxy_budget_rescheduler_max_time: 64
  proxy_batch_write_at: 1
  
  # Health checks
  background_health_checks: true
  use_shared_health_check: true
  health_check_interval: 30

Pass-Through Endpoints

Define custom pass-through endpoints:
general_settings:
  pass_through_endpoints:
    - path: "/v1/rerank"
      target: "https://api.cohere.com/v1/rerank"
      headers:
        content-type: application/json
        accept: application/json
      forward_headers: true

Master Key

The master key provides admin access:
general_settings:
  master_key: sk-1234
Or from environment:
general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
The master key grants full access to the proxy. Keep it secure and rotate regularly.

Advanced Configuration

Fine-Tuning Settings

For /fine_tuning/jobs endpoints:
finetune_settings:
  - custom_llm_provider: azure
    api_base: os.environ/AZURE_API_BASE
    api_key: os.environ/AZURE_API_KEY
    api_version: "2023-03-15-preview"
  
  - custom_llm_provider: openai
    api_key: os.environ/OPENAI_API_KEY

Files Settings

For /files endpoints:
files_settings:
  - custom_llm_provider: azure
    api_base: os.environ/AZURE_API_BASE
    api_key: os.environ/AZURE_API_KEY
    api_version: "2023-03-15-preview"
  
  - custom_llm_provider: openai
    api_key: os.environ/OPENAI_API_KEY

Wildcard Routing

Route any model name to a provider:
model_list:
  # OpenAI wildcard
  - model_name: "*"
    litellm_params:
      model: openai/*
      api_key: os.environ/OPENAI_API_KEY
  
  # Provider-specific wildcards
  - model_name: "anthropic/*"
    litellm_params:
      model: anthropic/*
      api_key: os.environ/ANTHROPIC_API_KEY
  
  - model_name: "bedrock/*"
    litellm_params:
      model: bedrock/*

Multiple Deployments

Load balance across multiple deployments:
model_list:
  # Deployment 1
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_KEY_1
      rpm: 480
    model_info:
      id: openai-1
  
  # Deployment 2
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_KEY_2
      rpm: 480
    model_info:
      id: openai-2
  
  # Azure fallback
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
      api_version: "2024-02-15-preview"
    model_info:
      id: azure-1

router_settings:
  routing_strategy: usage-based-routing-v2
  enable_pre_call_checks: true

litellm_settings:
  num_retries: 3
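With several deployments sharing one model_name, shuffle-style selection can be weighted by each deployment's configured rpm, so higher-capacity deployments receive proportionally more traffic. A sketch under that assumption (rpm values here are illustrative, and this is not the router's actual code):

```python
import random

# The three gpt-4 deployments above, with illustrative rpm weights.
deployments = [
    {"id": "openai-1", "rpm": 480},
    {"id": "openai-2", "rpm": 480},
    {"id": "azure-1",  "rpm": 240},
]

def pick_deployment(deps: list[dict]) -> dict:
    """Randomly select a deployment, weighted by its rpm limit."""
    weights = [d["rpm"] for d in deps]
    return random.choices(deps, weights=weights, k=1)[0]

counts = {d["id"]: 0 for d in deployments}
for _ in range(1000):
    counts[pick_deployment(deployments)["id"]] += 1
print(counts)  # openai-1/openai-2 each drawn roughly twice as often as azure-1
```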

Configuration Validation

Validate your configuration:
# Test configuration
litellm --config config.yaml --test

# Start with verbose logging
litellm --config config.yaml --detailed_debug

Environment Variables

Key environment variables:
# Provider Keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
AZURE_API_KEY=...
AZURE_API_BASE=https://...

# Database
DATABASE_URL=postgresql://user:pass@host:port/db

# Redis
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=...

# LiteLLM
LITELLM_MASTER_KEY=sk-1234
STORE_MODEL_IN_DB=True

# Optional
LITELLM_LOG=DEBUG
LITELLM_PORT=4000

Best Practices

  1. Security
    • Use environment variables for all secrets
    • Rotate master keys regularly
    • Use strong, unique passwords
  2. Reliability
    • Configure multiple deployments for critical models
    • Enable health checks
    • Set appropriate timeouts
  3. Performance
    • Use Redis for caching and shared state
    • Enable connection pooling
    • Configure rate limits
  4. Monitoring
    • Enable Prometheus metrics
    • Configure logging callbacks
    • Set up alerts for failures

Next Steps

  • Virtual Keys - Learn about API key management
  • Budget Alerts - Set up spending alerts
  • Docker Deployment - Deploy in production
  • Quick Start - Get started guide
