# Configuration File Structure

The proxy configuration file (`config.yaml`) has four main sections:

```yaml
model_list:        # Model deployments
litellm_settings:  # LiteLLM behavior
router_settings:   # Load balancing
general_settings:  # Proxy server settings
```
## Model List

Define your model deployments:

```yaml
model_list:
  - model_name: gpt-3.5-turbo      # Name used in API requests
    litellm_params:
      model: openai/gpt-3.5-turbo  # Provider/model format
      api_key: os.environ/OPENAI_API_KEY
      api_base: https://api.openai.com/v1  # Optional
      rpm: 480                     # Requests per minute
      tpm: 100000                  # Tokens per minute
      timeout: 300                 # Request timeout (seconds)
      stream_timeout: 60           # Streaming timeout (seconds)
    model_info:
      id: "deployment-1"           # Unique deployment ID
      mode: chat                   # chat, completion, or embedding
      base_model: gpt-3.5-turbo
```
### Provider Examples

```yaml
- model_name: gpt-4
  litellm_params:
    model: openai/gpt-4
    api_key: os.environ/OPENAI_API_KEY
    organization: os.environ/OPENAI_ORG_ID  # Optional
```
### Environment Variables

Load API keys from the environment:

```yaml
api_key: os.environ/OPENAI_API_KEY
```

You can also hard-code the key directly, but this is not recommended for production.

**Warning:** Never commit API keys to version control. Always use environment variables.
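The `os.environ/VAR_NAME` convention tells the proxy to read the value from that environment variable when the config is loaded. A minimal sketch of how such a reference could be resolved (illustrative only; the proxy handles this internally):

```python
import os

PREFIX = "os.environ/"

def resolve_env(value):
    """Resolve an 'os.environ/VAR' reference to the variable's value.

    Plain strings pass through unchanged; a missing variable raises KeyError,
    which surfaces misconfiguration early instead of at request time.
    """
    if isinstance(value, str) and value.startswith(PREFIX):
        return os.environ[value[len(PREFIX):]]
    return value

# Example: resolving the api_key field of a deployment
os.environ["OPENAI_API_KEY"] = "sk-demo"
params = {"model": "openai/gpt-4", "api_key": "os.environ/OPENAI_API_KEY"}
resolved = {k: resolve_env(v) for k, v in params.items()}
```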
### Rate Limits

Set per-deployment rate limits:

```yaml
model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY
      rpm: 480     # 480 requests per minute
      tpm: 100000  # 100k tokens per minute
```
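Conceptually, an `rpm` limit is a sliding 60-second window on request counts. A rough sketch of the idea (illustrative only, not the proxy's actual limiter):

```python
import time
from collections import deque

class RpmLimiter:
    """Sliding-window requests-per-minute limiter (illustrative sketch).

    Records request timestamps and rejects a call once `rpm` requests
    have already been made within the last 60 seconds.
    """
    def __init__(self, rpm):
        self.rpm = rpm
        self.calls = deque()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Evict timestamps that have aged out of the 60-second window
        while self.calls and now - self.calls[0] >= 60:
            self.calls.popleft()
        if len(self.calls) < self.rpm:
            self.calls.append(now)
            return True
        return False

limiter = RpmLimiter(rpm=2)
results = [limiter.allow(now=t) for t in (0, 1, 2)]  # third call exceeds rpm=2
```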
### Timeouts

```yaml
litellm_params:
  timeout: 300        # Total request timeout (seconds)
  stream_timeout: 60  # Streaming chunk timeout (seconds)
```
## LiteLLM Settings

Configure LiteLLM behavior:

```yaml
litellm_settings:
  # Retry configuration
  num_retries: 3
  request_timeout: 600

  # Parameter handling
  drop_params: true  # Drop unsupported params

  # Callbacks
  success_callback: ["prometheus", "langfuse"]
  failure_callback: ["slack"]

  # Telemetry
  telemetry: false  # Disable usage telemetry

  # Caching
  cache: true
  cache_params:
    type: redis
    host: localhost
    port: 6379

  # Fallbacks
  context_window_fallbacks:
    - gpt-3.5-turbo: ["gpt-3.5-turbo-16k"]
    - gpt-4: ["claude-3-opus"]

  # Team settings
  default_team_settings:
    - team_id: team-1
      success_callback: ["langfuse"]
      langfuse_public_key: os.environ/LANGFUSE_KEY
      langfuse_secret: os.environ/LANGFUSE_SECRET
```
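The idea behind `context_window_fallbacks` is that when a prompt exceeds the primary model's context window, the request is retried against a listed fallback with a larger window. A sketch of that selection logic (window sizes are assumed here purely for illustration):

```python
# Assumed context-window sizes, for illustration only
CONTEXT_WINDOWS = {"gpt-3.5-turbo": 4096, "gpt-3.5-turbo-16k": 16384}

# Mirrors the context_window_fallbacks mapping from the config
FALLBACKS = {"gpt-3.5-turbo": ["gpt-3.5-turbo-16k"]}

def pick_model(model, prompt_tokens):
    """Return the requested model, or the first fallback whose window fits."""
    if prompt_tokens <= CONTEXT_WINDOWS[model]:
        return model
    for candidate in FALLBACKS.get(model, []):
        if prompt_tokens <= CONTEXT_WINDOWS.get(candidate, 0):
            return candidate
    raise ValueError("prompt too large for all configured models")
```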
### Callbacks

Supported callback integrations:

- `prometheus`: metrics export
- `langfuse`: observability
- `lunary`: monitoring
- `helicone`: analytics
- `slack`: alerting
- `webhook`: custom webhooks
- `s3`: log to S3

```yaml
litellm_settings:
  success_callback: ["prometheus", "langfuse"]
  failure_callback: ["slack"]

  # Callback-specific settings
  langfuse_public_key: os.environ/LANGFUSE_PUBLIC_KEY
  langfuse_secret: os.environ/LANGFUSE_SECRET_KEY
  slack_webhook_url: os.environ/SLACK_WEBHOOK_URL
```
### Caching

```yaml
litellm_settings:
  cache: true
  cache_params:
    type: local
```
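Response caches like this typically key on the request contents, so that identical requests hit the cache and any change to the prompt or parameters misses it. A minimal sketch of deriving such a cache key (illustrative; not the proxy's exact scheme):

```python
import hashlib
import json

def cache_key(model, messages, **params):
    """Derive a deterministic cache key from a request's contents.

    Serializes the model, messages, and extra parameters with sorted keys
    so that logically identical requests hash to the same key.
    """
    payload = json.dumps(
        {"model": model, "messages": messages, **params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

k1 = cache_key("gpt-4", [{"role": "user", "content": "hi"}], temperature=0)
k2 = cache_key("gpt-4", [{"role": "user", "content": "hi"}], temperature=0)
k3 = cache_key("gpt-4", [{"role": "user", "content": "hello"}], temperature=0)
```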
### Budget Configuration

```yaml
litellm_settings:
  max_budget: 100       # Global budget in USD
  budget_duration: 30d  # Budget period (e.g. 30d, 1h)
```
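Duration strings like `30d` or `1h` combine a number with a unit suffix. A sketch of parsing them into seconds (the exact set of supported units is an assumption here):

```python
import re

# Assumed unit suffixes: seconds, minutes, hours, days
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def parse_duration(spec):
    """Parse a duration such as '30d' or '1h' into seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", spec)
    if not match:
        raise ValueError(f"invalid duration: {spec!r}")
    amount, unit = match.groups()
    return int(amount) * UNITS[unit]
```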
## Router Settings

Configure load balancing and routing:

```yaml
router_settings:
  # Routing strategy
  routing_strategy: usage-based-routing-v2

  # Redis for shared state
  redis_host: os.environ/REDIS_HOST
  redis_password: os.environ/REDIS_PASSWORD
  redis_port: 6379

  # Health checks
  enable_pre_call_checks: true

  # Model aliases
  model_group_alias:
    gpt-4-latest: "gpt-4"
    claude-latest: "claude-3-opus"
```
### Routing Strategies

```yaml
router_settings:
  routing_strategy: simple-shuffle
```
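A shuffle strategy of this kind picks a deployment at random, typically weighting the draw by each deployment's configured capacity. A sketch of that idea (illustrative only, not the proxy's implementation):

```python
import random

def simple_shuffle(deployments, rng=random):
    """Pick a deployment at random, weighted by its configured rpm.

    Deployments with higher rpm limits are proportionally more likely
    to be chosen; deployments without an rpm get weight 1.
    """
    weights = [d.get("rpm", 1) for d in deployments]
    return rng.choices(deployments, weights=weights, k=1)[0]

deployments = [
    {"id": "openai-1", "rpm": 480},
    {"id": "openai-2", "rpm": 120},
]
chosen = simple_shuffle(deployments)
```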
## General Settings

Proxy server configuration:

```yaml
general_settings:
  # Authentication
  master_key: sk-1234  # Admin API key

  # Database
  database_url: os.environ/DATABASE_URL
  store_model_in_db: true  # Store models in DB
  database_connection_pool_limit: 10

  # Budget
  proxy_budget_rescheduler_min_time: 60
  proxy_budget_rescheduler_max_time: 64
  proxy_batch_write_at: 1

  # Health checks
  background_health_checks: true
  use_shared_health_check: true
  health_check_interval: 30
```
### Pass-Through Endpoints

Define custom pass-through endpoints:

```yaml
general_settings:
  pass_through_endpoints:
    - path: "/v1/rerank"
      target: "https://api.cohere.com/v1/rerank"
      headers:
        content-type: application/json
        accept: application/json
      forward_headers: true
```
### Master Key

The master key provides admin access:

```yaml
general_settings:
  master_key: sk-1234
```

Or load it from the environment:

```yaml
general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
```

The master key grants full access to the proxy. Keep it secure and rotate it regularly.
## Advanced Configuration

### Fine-Tuning Settings

For `/fine_tuning/jobs` endpoints:

```yaml
finetune_settings:
  - custom_llm_provider: azure
    api_base: os.environ/AZURE_API_BASE
    api_key: os.environ/AZURE_API_KEY
    api_version: "2023-03-15-preview"
  - custom_llm_provider: openai
    api_key: os.environ/OPENAI_API_KEY
```

### Files Settings

For `/files` endpoints:

```yaml
files_settings:
  - custom_llm_provider: azure
    api_base: os.environ/AZURE_API_BASE
    api_key: os.environ/AZURE_API_KEY
    api_version: "2023-03-15-preview"
  - custom_llm_provider: openai
    api_key: os.environ/OPENAI_API_KEY
```
### Wildcard Routing

Route any model name to a provider:

```yaml
model_list:
  # OpenAI wildcard
  - model_name: "*"
    litellm_params:
      model: openai/*
      api_key: os.environ/OPENAI_API_KEY

  # Provider-specific wildcards
  - model_name: "anthropic/*"
    litellm_params:
      model: anthropic/*
      api_key: os.environ/ANTHROPIC_API_KEY

  - model_name: "bedrock/*"
    litellm_params:
      model: bedrock/*
```
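Wildcard entries behave like glob patterns matched against the incoming model name, with more specific patterns (or exact names) taking precedence over the catch-all `"*"`. A sketch of that lookup (illustrative; the proxy's actual resolution order may differ):

```python
from fnmatch import fnmatch

def match_deployment(model_name, model_list):
    """Find the model_list entry that handles `model_name`.

    Exact matches are preferred over glob patterns such as
    "anthropic/*" or the catch-all "*".
    """
    for entry in model_list:
        if entry["model_name"] == model_name:
            return entry
    for entry in model_list:
        if fnmatch(model_name, entry["model_name"]):
            return entry
    return None

model_list = [
    {"model_name": "anthropic/*", "provider": "anthropic"},
    {"model_name": "*", "provider": "openai"},
]
```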
### Multiple Deployments

Load balance across multiple deployments that share a model name:

```yaml
model_list:
  # Deployment 1
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_KEY_1
      rpm: 480
    model_info:
      id: openai-1

  # Deployment 2
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_KEY_2
      rpm: 480
    model_info:
      id: openai-2

  # Azure fallback
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
      api_version: "2024-02-15-preview"
    model_info:
      id: azure-1

router_settings:
  routing_strategy: usage-based-routing-v2
  enable_pre_call_checks: true

litellm_settings:
  num_retries: 3
```
## Configuration Validation

Validate your configuration:

```shell
# Test the configuration
litellm --config config.yaml --test

# Start with verbose logging
litellm --config config.yaml --detailed_debug
```
## Environment Variables

Key environment variables:

```shell
# Provider keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
AZURE_API_KEY=...
AZURE_API_BASE=https://...

# Database
DATABASE_URL=postgresql://user:pass@host:port/db

# Redis
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=...

# LiteLLM
LITELLM_MASTER_KEY=sk-1234
STORE_MODEL_IN_DB=True

# Optional
LITELLM_LOG=DEBUG
LITELLM_PORT=4000
```
## Best Practices

- **Security**
  - Use environment variables for all secrets
  - Rotate master keys regularly
  - Use strong, unique passwords
- **Reliability**
  - Configure multiple deployments for critical models
  - Enable health checks
  - Set appropriate timeouts
- **Performance**
  - Use Redis for caching and shared state
  - Enable connection pooling
  - Configure rate limits
- **Monitoring**
  - Enable Prometheus metrics
  - Configure logging callbacks
  - Set up alerts for failures
## Next Steps

- **Virtual Keys**: learn about API key management
- **Budget Alerts**: set up spending alerts
- **Docker Deployment**: deploy in production
- **Quick Start**: get started guide