## Overview

The LiteLLM proxy uses a YAML configuration file to define models, routing, authentication, and other settings.
## File Location

Default: `config.yaml`

Start the proxy with a config file:

```bash
litellm --config config.yaml
```
## Complete Schema

```yaml
model_list:
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_base: https://your-endpoint.openai.azure.com/
      api_key: os.environ/AZURE_API_KEY
      api_version: "2024-02-01"
    model_info:
      mode: chat
      supports_function_calling: true
      supports_vision: true

litellm_settings:
  success_callback: ["langfuse", "lunary"]
  failure_callback: ["sentry"]
  set_verbose: true
  drop_params: true
  max_parallel_requests: 100
  request_timeout: 600
  num_retries: 3
  fallbacks:
    - gpt-4: ["gpt-3.5-turbo", "claude-2"]
  context_window_fallbacks:
    - gpt-3.5-turbo: ["gpt-3.5-turbo-16k"]

general_settings:
  master_key: sk-1234
  database_url: postgresql://...
  store_model_in_db: true
  allowed_routes: ["chat/completions", "embeddings"]
  key_management_settings:
    default_key_duration: 30d
    max_key_duration: 365d

router_settings:
  routing_strategy: latency-based-routing
  routing_strategy_args:
    ttl: 60
  model_group_alias:
    gpt-4: production-gpt-4
  redis_host: localhost
  redis_port: 6379
  redis_password: os.environ/REDIS_PASSWORD
  num_retries: 3
  timeout: 30
  allowed_fails: 3
  cooldown_time: 60
```
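As a quick sanity check, a parsed config can be inspected programmatically. The sketch below assumes the YAML has already been parsed into a Python dict (e.g. with PyYAML's `yaml.safe_load`); `check_config` is a hypothetical helper for illustration, not part of LiteLLM, which performs its own validation on startup.

```python
# Minimal structural check on a parsed proxy config. The dict below is
# the parsed form of a small config; in practice you would obtain it
# via yaml.safe_load(open("config.yaml")).
config = {
    "model_list": [
        {
            "model_name": "gpt-4",
            "litellm_params": {
                "model": "azure/gpt-4",
                "api_key": "os.environ/AZURE_API_KEY",
            },
        }
    ],
    "router_settings": {"routing_strategy": "latency-based-routing"},
}

def check_config(config):
    """Every deployment needs a model_name and a litellm_params.model."""
    for deployment in config.get("model_list", []):
        assert "model_name" in deployment, "missing model_name"
        assert "model" in deployment.get("litellm_params", {}), "missing model"
    return True

print(check_config(config))  # True
```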
## Configuration Sections

### model_list

Defines the model deployments served by the proxy.
- `model_name`: User-facing model name. Multiple deployments can share the same `model_name` for load balancing.
- `litellm_params`: Parameters passed to `litellm.completion()`.
  - `model`: Provider-specific model identifier. Examples: `azure/gpt-4`, `bedrock/anthropic.claude-v2`, `vertex_ai/gemini-pro`
  - `api_key`: API key. Use `os.environ/VAR_NAME` to load from the environment.
  - `api_version`: API version (provider-specific).
  - `timeout`: Request timeout in seconds.
  - `tpm`: Tokens-per-minute limit for this deployment.
  - `rpm`: Requests-per-minute limit for this deployment.
- `model_info`: Metadata about the model.
  - `mode`: Model mode: `"chat"`, `"completion"`, `"embedding"`, or `"image_generation"`.
  - `input_cost_per_token`: Cost per input token in USD.
  - `output_cost_per_token`: Cost per output token in USD.
  - `max_tokens`: Maximum tokens supported.
  - `supports_function_calling`: Whether the model supports function calling.
  - `supports_vision`: Whether the model supports vision/image inputs.
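The per-token cost fields translate into a per-request cost by simple multiplication. A sketch, using made-up illustration prices rather than any real provider's rates:

```python
# How per-token pricing from model_info turns into a request cost.
# These prices are hypothetical illustration values, not real rates.
input_cost_per_token = 0.00003   # USD per prompt token (made up)
output_cost_per_token = 0.00006  # USD per completion token (made up)

def request_cost(prompt_tokens, completion_tokens):
    """Total USD cost of one request under the prices above."""
    return (prompt_tokens * input_cost_per_token
            + completion_tokens * output_cost_per_token)

print(round(request_cost(1000, 500), 4))  # 0.06
```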
#### Example

```yaml
model_list:
  # OpenAI GPT-4
  - model_name: gpt-4
    litellm_params:
      model: gpt-4
      api_key: os.environ/OPENAI_API_KEY
      tpm: 100000
      rpm: 1000

  # Azure GPT-4
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_base: https://your-endpoint.openai.azure.com/
      api_key: os.environ/AZURE_API_KEY
      api_version: "2024-02-01"
      tpm: 200000
      rpm: 2000

  # Claude 2
  - model_name: claude-2
    litellm_params:
      model: claude-2
      api_key: os.environ/ANTHROPIC_API_KEY
      tpm: 100000
      rpm: 1000

  # Bedrock Claude
  - model_name: claude-bedrock
    litellm_params:
      model: bedrock/anthropic.claude-v2
      aws_region_name: us-east-1

  # Embedding model
  - model_name: text-embedding-3-small
    litellm_params:
      model: text-embedding-3-small
      api_key: os.environ/OPENAI_API_KEY
```
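In the example above, the OpenAI and Azure deployments both use `model_name: gpt-4`, so the router spreads traffic across them. The sketch below mimics the spirit of rpm-weighted random selection; it is an illustration of the idea, not LiteLLM's actual router code.

```python
# Two deployments sharing one model_name, picked at random with
# probability proportional to their rpm limits. Illustrative only.
import random

deployments = [
    {"model": "gpt-4", "rpm": 1000},        # OpenAI deployment
    {"model": "azure/gpt-4", "rpm": 2000},  # Azure deployment
]

def pick_deployment(deployments):
    """Pick a deployment at random, weighted by its rpm limit."""
    weights = [d["rpm"] for d in deployments]
    return random.choices(deployments, weights=weights, k=1)[0]

chosen = pick_deployment(deployments)
print(chosen["model"])
```

Over many requests, the Azure deployment (rpm 2000) would be chosen about twice as often as the OpenAI one (rpm 1000).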
### litellm_settings

Global LiteLLM configuration.

- `success_callback`: Callbacks to run on successful requests. Supported: `langfuse`, `lunary`, `helicone`, `supabase`, `datadog`, `prometheus`, custom.
- `failure_callback`: Callbacks to run on failed requests. Supported: `sentry`, `slack`, `webhook`, custom.
- `drop_params`: Drop unsupported parameters instead of erroring.
- `max_parallel_requests`: Maximum number of parallel requests.
- `request_timeout`: Default request timeout in seconds.
- `num_retries`: Number of retries on failure.
- `fallbacks`: Fallback model configurations.

  ```yaml
  fallbacks:
    - gpt-4: ["gpt-3.5-turbo", "claude-2"]
    - claude-2: ["gpt-3.5-turbo"]
  ```

- `context_window_fallbacks`: Fallbacks for context-window-exceeded errors.

  ```yaml
  context_window_fallbacks:
    - gpt-3.5-turbo: ["gpt-3.5-turbo-16k"]
    - gpt-4: ["gpt-4-32k"]
  ```

- `cache_params`: Caching configuration.

  ```yaml
  cache_params:
    type: redis
    host: localhost
    port: 6379
    ttl: 3600
  ```
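The TTL-based caching configured above can be modelled with an in-memory dict, shown below in place of Redis. This is a sketch of the concept only; LiteLLM's real cache keys and storage are more involved.

```python
# A Redis-style response cache modelled in memory: entries expire
# ttl seconds after being set. Illustrative only.
import time

class TTLCache:
    def __init__(self, ttl=3600):
        self.ttl = ttl
        self.store = {}  # key -> (expires_at, value)

    def set(self, key, value, now=None):
        now = now if now is not None else time.time()
        self.store[key] = (now + self.ttl, value)

    def get(self, key, now=None):
        """Return the cached value, or None if missing or expired."""
        now = now if now is not None else time.time()
        entry = self.store.get(key)
        if entry and now < entry[0]:
            return entry[1]
        return None

cache = TTLCache(ttl=3600)
cache.set("prompt-hash", "cached response", now=0)
print(cache.get("prompt-hash", now=10))    # cached response
print(cache.get("prompt-hash", now=4000))  # None (expired)
```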
#### Example

```yaml
litellm_settings:
  success_callback: ["langfuse", "prometheus"]
  failure_callback: ["sentry", "slack"]
  set_verbose: true
  drop_params: true
  request_timeout: 300
  num_retries: 3
  fallbacks:
    - gpt-4: ["gpt-3.5-turbo"]
  cache: true
  cache_params:
    type: redis
    host: localhost
    port: 6379
    ttl: 3600
```
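The fallback behaviour configured above works like a chain: if a call to the primary model fails, the listed fallbacks are tried in order. A sketch of the concept (not LiteLLM's internal code):

```python
# Try the requested model first, then each configured fallback in
# order, returning the first successful result. Illustrative only.
FALLBACKS = {"gpt-4": ["gpt-3.5-turbo", "claude-2"]}

def call_with_fallbacks(model, call_fn, fallbacks=FALLBACKS):
    """Try `model`, then each configured fallback, until one succeeds."""
    last_error = None
    for candidate in [model] + fallbacks.get(model, []):
        try:
            return candidate, call_fn(candidate)
        except Exception as exc:
            last_error = exc
    raise last_error

# Simulate the primary deployment being down.
def flaky_call(model):
    if model == "gpt-4":
        raise RuntimeError("deployment unavailable")
    return f"response from {model}"

used, result = call_with_fallbacks("gpt-4", flaky_call)
print(used)  # gpt-3.5-turbo
```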
### general_settings

Proxy server configuration.

- `master_key`: Admin master key for proxy management.

  ```yaml
  master_key: sk-1234
  # or from environment
  master_key: os.environ/LITELLM_MASTER_KEY
  ```

- `database_url`: PostgreSQL connection string for key/user/team storage.

  ```yaml
  database_url: postgresql://user:pass@localhost:5432/litellm
  ```

- `store_model_in_db`: Store model configurations in the database.
- `allowed_routes`: Restrict which endpoints are enabled.

  ```yaml
  allowed_routes:
    - "chat/completions"
    - "embeddings"
    - "key/generate"
  ```

- `ui_access_mode`: UI access control. Options: `"admin"`, `"all"`.
- `budget_duration`: Budget reset period: `"1d"`, `"30d"`, etc.
#### Example

```yaml
general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: os.environ/DATABASE_URL
  store_model_in_db: true
  allowed_routes:
    - "chat/completions"
    - "embeddings"
    - "key/generate"
    - "key/list"
    - "team/new"
  ui_access_mode: admin
  max_budget: 10000.0
  budget_duration: 30d
```
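Duration strings such as `budget_duration: 30d` can be read as a number plus a unit suffix. The suffix grammar below is assumed from the `"1d"`, `"30d"` examples in this section; check the LiteLLM docs for the full set of supported units.

```python
# Interpret "30d" / "12h" style duration strings as a timedelta.
# Hypothetical parser for illustration, not LiteLLM's own.
from datetime import timedelta

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}

def parse_duration(value: str) -> timedelta:
    """Convert a '<number><unit>' string into a timedelta."""
    number, unit = int(value[:-1]), value[-1]
    return timedelta(**{_UNITS[unit]: number})

print(parse_duration("30d").days)  # 30
```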
### router_settings

Router configuration for load balancing.

- `routing_strategy` (string, default `"simple-shuffle"`): Load balancing strategy. Options:
  - `simple-shuffle`: Random selection
  - `least-busy`: Fewest ongoing requests
  - `usage-based-routing`: Based on TPM/RPM usage
  - `latency-based-routing`: Lowest latency
  - `cost-based-routing`: Lowest cost
- `routing_strategy_args`: Strategy-specific arguments.

  ```yaml
  routing_strategy_args:
    ttl: 60  # For latency-based routing
  ```

- `model_group_alias`: Model aliases.

  ```yaml
  model_group_alias:
    gpt-4: production-gpt-4
    claude: production-claude
  ```

- `allowed_fails`: Failures allowed before a deployment is placed in cooldown.
- `cooldown_time`: Cooldown duration in seconds.
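Alias resolution amounts to a dictionary lookup. The sketch below assumes `model_group_alias` maps a requested model name to the target model group, matching the `gpt-4: production-gpt-4` example above; it is an illustration, not LiteLLM's implementation.

```python
# Map an incoming model name through model_group_alias before routing;
# unaliased names pass through unchanged. Illustrative only.
MODEL_GROUP_ALIAS = {
    "gpt-4": "production-gpt-4",
    "claude": "production-claude",
}

def resolve_alias(model):
    """Return the aliased model group, or the name itself if unaliased."""
    return MODEL_GROUP_ALIAS.get(model, model)

print(resolve_alias("gpt-4"))          # production-gpt-4
print(resolve_alias("gpt-3.5-turbo"))  # gpt-3.5-turbo
```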
#### Example

```yaml
router_settings:
  routing_strategy: latency-based-routing
  routing_strategy_args:
    ttl: 60
  model_group_alias:
    gpt-4: prod-gpt-4
  redis_host: localhost
  redis_port: 6379
  redis_password: os.environ/REDIS_PASSWORD
  num_retries: 3
  timeout: 30
  allowed_fails: 5
  cooldown_time: 120
```
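The `allowed_fails` / `cooldown_time` pair means: after a deployment fails `allowed_fails` times, skip it for `cooldown_time` seconds. A sketch of that bookkeeping (illustrative only, not the router's real code):

```python
# Track consecutive failures per deployment and put a deployment in
# cooldown once it exceeds allowed_fails. Illustrative only.
import time

class CooldownTracker:
    def __init__(self, allowed_fails=5, cooldown_time=120):
        self.allowed_fails = allowed_fails
        self.cooldown_time = cooldown_time
        self.fail_counts = {}     # deployment -> consecutive failures
        self.cooldown_until = {}  # deployment -> timestamp

    def record_failure(self, deployment, now=None):
        now = now if now is not None else time.time()
        self.fail_counts[deployment] = self.fail_counts.get(deployment, 0) + 1
        if self.fail_counts[deployment] >= self.allowed_fails:
            self.cooldown_until[deployment] = now + self.cooldown_time
            self.fail_counts[deployment] = 0  # reset after entering cooldown

    def is_available(self, deployment, now=None):
        now = now if now is not None else time.time()
        return now >= self.cooldown_until.get(deployment, 0)

tracker = CooldownTracker(allowed_fails=2, cooldown_time=120)
tracker.record_failure("azure/gpt-4", now=0)
tracker.record_failure("azure/gpt-4", now=1)   # second failure -> cooldown
print(tracker.is_available("azure/gpt-4", now=10))   # False
print(tracker.is_available("azure/gpt-4", now=200))  # True
```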
## Complete Example

```yaml
model_list:
  # GPT-4 with load balancing
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_base: https://endpoint1.openai.azure.com/
      api_key: os.environ/AZURE_API_KEY_1
      api_version: "2024-02-01"
      tpm: 100000
      rpm: 1000
  - model_name: gpt-4
    litellm_params:
      model: gpt-4
      api_key: os.environ/OPENAI_API_KEY
      tpm: 90000
      rpm: 900

  # GPT-3.5-Turbo
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY
      tpm: 1000000
      rpm: 10000

  # Claude
  - model_name: claude-2
    litellm_params:
      model: claude-2
      api_key: os.environ/ANTHROPIC_API_KEY
      tpm: 100000
      rpm: 1000

  # Embeddings
  - model_name: text-embedding-3-small
    litellm_params:
      model: text-embedding-3-small
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  success_callback: ["langfuse", "prometheus"]
  failure_callback: ["sentry"]
  set_verbose: false
  drop_params: true
  request_timeout: 300
  num_retries: 3
  fallbacks:
    - gpt-4: ["gpt-3.5-turbo", "claude-2"]
  context_window_fallbacks:
    - gpt-3.5-turbo: ["gpt-3.5-turbo-16k"]
  cache: true
  cache_params:
    type: redis
    host: localhost
    port: 6379
    ttl: 3600

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: os.environ/DATABASE_URL
  store_model_in_db: true
  allowed_routes:
    - "chat/completions"
    - "embeddings"
    - "key/generate"
    - "key/list"
    - "team/new"
    - "user/new"
  ui_access_mode: admin

router_settings:
  routing_strategy: latency-based-routing
  routing_strategy_args:
    ttl: 60
  redis_host: localhost
  redis_port: 6379
  redis_password: os.environ/REDIS_PASSWORD
  num_retries: 3
  timeout: 30
  allowed_fails: 3
  cooldown_time: 60
```
## Environment Variables

Load values from the environment with the `os.environ/VAR_NAME` syntax:

```yaml
model_list:
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_key: os.environ/AZURE_API_KEY  # Loads from $AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: os.environ/DATABASE_URL
```
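The convention is mechanical: any string value beginning with `os.environ/` is replaced by the named environment variable. A sketch of that substitution with a hypothetical helper (not LiteLLM's actual loader):

```python
# Resolve 'os.environ/NAME' strings against the environment; other
# values pass through unchanged. Hypothetical helper for illustration.
import os

def resolve_env(value):
    """Substitute 'os.environ/NAME' with the named environment variable."""
    prefix = "os.environ/"
    if isinstance(value, str) and value.startswith(prefix):
        return os.environ[value[len(prefix):]]
    return value

os.environ["AZURE_API_KEY"] = "sk-example"      # demo value only
print(resolve_env("os.environ/AZURE_API_KEY"))  # sk-example
print(resolve_env("literal-key"))               # literal-key
```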
## Validation

Validate your config:

```bash
litellm --config config.yaml --test
```