Overview

The LiteLLM proxy uses a YAML configuration file to define models, routing, authentication, and other settings.

File Location

Default: config.yaml

Start the proxy with a config file:
litellm --config config.yaml

Complete Schema

model_list:
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_base: https://your-endpoint.openai.azure.com/
      api_key: os.environ/AZURE_API_KEY
      api_version: "2024-02-01"
    model_info:
      mode: chat
      supports_function_calling: true
      supports_vision: true

litellm_settings:
  success_callback: ["langfuse", "lunary"]
  failure_callback: ["sentry"]
  set_verbose: true
  drop_params: true
  max_parallel_requests: 100
  request_timeout: 600
  num_retries: 3
  fallbacks:
    - gpt-4: ["gpt-3.5-turbo", "claude-2"]
  context_window_fallbacks:
    - gpt-3.5-turbo: ["gpt-3.5-turbo-16k"]

general_settings:
  master_key: sk-1234
  database_url: postgresql://...
  store_model_in_db: true
  allowed_routes: ["chat/completions", "embeddings"]
  key_management_settings:
    default_key_duration: 30d
    max_key_duration: 365d

router_settings:
  routing_strategy: latency-based-routing
  routing_strategy_args:
    ttl: 60
  model_group_alias:
    gpt-4: production-gpt-4
  redis_host: localhost
  redis_port: 6379
  redis_password: os.environ/REDIS_PASSWORD
  num_retries: 3
  timeout: 30
  allowed_fails: 3
  cooldown_time: 60

Configuration Sections

model_list

Define model deployments for the proxy.
model_name (string, required)
User-facing model name. Multiple deployments can share the same model_name for load balancing.

litellm_params (object, required)
Parameters passed to litellm.completion().

model_info (object)
Metadata about the model.

tpm (number)
Tokens per minute limit for this deployment.

rpm (number)
Requests per minute limit for this deployment.

Example

model_list:
  # OpenAI GPT-4
  - model_name: gpt-4
    litellm_params:
      model: gpt-4
      api_key: os.environ/OPENAI_API_KEY
    tpm: 100000
    rpm: 1000
  
  # Azure GPT-4
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_base: https://your-endpoint.openai.azure.com/
      api_key: os.environ/AZURE_API_KEY
      api_version: "2024-02-01"
    tpm: 200000
    rpm: 2000
  
  # Claude 2
  - model_name: claude-2
    litellm_params:
      model: claude-2
      api_key: os.environ/ANTHROPIC_API_KEY
    tpm: 100000
    rpm: 1000
  
  # Bedrock Claude
  - model_name: claude-bedrock
    litellm_params:
      model: bedrock/anthropic.claude-v2
      aws_region_name: us-east-1
  
  # Embedding model
  - model_name: text-embedding-3-small
    litellm_params:
      model: text-embedding-3-small
      api_key: os.environ/OPENAI_API_KEY

litellm_settings

Global LiteLLM configuration.
success_callback (array)
Callbacks to run on successful requests. Supported: langfuse, lunary, helicone, supabase, datadog, prometheus, custom.

failure_callback (array)
Callbacks to run on failed requests. Supported: sentry, slack, webhook, custom.

set_verbose (boolean, default: false)
Enable verbose logging.

drop_params (boolean, default: false)
Drop unsupported parameters instead of raising an error.

max_parallel_requests (number)
Maximum number of parallel requests.

request_timeout (number, default: 600)
Default request timeout in seconds.

num_retries (number, default: 0)
Number of retries on failure.

fallbacks (array)
Fallback model configurations.

fallbacks:
  - gpt-4: ["gpt-3.5-turbo", "claude-2"]
  - claude-2: ["gpt-3.5-turbo"]

context_window_fallbacks (array)
Fallbacks for context-window-exceeded errors.

context_window_fallbacks:
  - gpt-3.5-turbo: ["gpt-3.5-turbo-16k"]
  - gpt-4: ["gpt-4-32k"]

cache (boolean, default: false)
Enable caching.

cache_params (object)
Caching configuration.

cache_params:
  type: redis
  host: localhost
  port: 6379
  ttl: 3600

Example

litellm_settings:
  success_callback: ["langfuse", "prometheus"]
  failure_callback: ["sentry", "slack"]
  set_verbose: true
  drop_params: true
  request_timeout: 300
  num_retries: 3
  fallbacks:
    - gpt-4: ["gpt-3.5-turbo"]
  cache: true
  cache_params:
    type: redis
    host: localhost
    port: 6379
    ttl: 3600

general_settings

Proxy server configuration.
master_key (string)
Admin master key for proxy management.

master_key: sk-1234
# or from environment
master_key: os.environ/LITELLM_MASTER_KEY

database_url (string)
PostgreSQL connection string for key/user/team storage.

database_url: postgresql://user:pass@localhost:5432/litellm

store_model_in_db (boolean, default: false)
Store model configurations in the database.

allowed_routes (array)
Restrict which endpoints are enabled.

allowed_routes:
  - "chat/completions"
  - "embeddings"
  - "key/generate"

ui_access_mode (string, default: admin)
UI access control. Options: "admin", "all".

max_budget (number)
Global budget limit.

budget_duration (string)
Budget reset period: "1d", "30d", etc.

Example

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: os.environ/DATABASE_URL
  store_model_in_db: true
  allowed_routes:
    - "chat/completions"
    - "embeddings"
    - "key/generate"
    - "key/list"
    - "team/new"
  ui_access_mode: admin
  max_budget: 10000.0
  budget_duration: 30d

router_settings

Router configuration for load balancing.
routing_strategy (string, default: simple-shuffle)
Load balancing strategy. Options:
  • simple-shuffle: Random selection
  • least-busy: Fewest ongoing requests
  • usage-based-routing: Based on TPM/RPM
  • latency-based-routing: Lowest latency
  • cost-based-routing: Lowest cost

routing_strategy_args (object)
Strategy-specific arguments.

routing_strategy_args:
  ttl: 60  # For latency-based routing

model_group_alias (object)
Model aliases.

model_group_alias:
  gpt-4: production-gpt-4
  claude: production-claude

redis_host (string)
Redis host for caching.

redis_port (number, default: 6379)
Redis port.

redis_password (string)
Redis password.

num_retries (number, default: 0)
Router-level retries.

timeout (number, default: 600)
Router timeout in seconds.

allowed_fails (number, default: 3)
Failures before a deployment is cooled down.

cooldown_time (number, default: 60)
Cooldown duration in seconds.

Example

router_settings:
  routing_strategy: latency-based-routing
  routing_strategy_args:
    ttl: 60
  model_group_alias:
    gpt-4: prod-gpt-4
  redis_host: localhost
  redis_port: 6379
  redis_password: os.environ/REDIS_PASSWORD
  num_retries: 3
  timeout: 30
  allowed_fails: 5
  cooldown_time: 120
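For comparison, a usage-based setup might look like the sketch below. This is an assumption-laden sketch, not a canonical config: it routes by the tpm/rpm limits declared on your model_list deployments, and it points at the same local Redis shown above so usage counters can be shared across proxy instances.

```yaml
# Sketch: distribute traffic by TPM/RPM usage instead of latency.
# Assumes tpm/rpm limits are set on deployments in model_list,
# and a local Redis for sharing usage counters.
router_settings:
  routing_strategy: usage-based-routing
  redis_host: localhost
  redis_port: 6379
  redis_password: os.environ/REDIS_PASSWORD
```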

Complete Example

model_list:
  # GPT-4 with load balancing
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_base: https://endpoint1.openai.azure.com/
      api_key: os.environ/AZURE_API_KEY_1
      api_version: "2024-02-01"
    tpm: 100000
    rpm: 1000
  
  - model_name: gpt-4
    litellm_params:
      model: gpt-4
      api_key: os.environ/OPENAI_API_KEY
    tpm: 90000
    rpm: 900
  
  # GPT-3.5-Turbo
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY
    tpm: 1000000
    rpm: 10000
  
  # Claude
  - model_name: claude-2
    litellm_params:
      model: claude-2
      api_key: os.environ/ANTHROPIC_API_KEY
    tpm: 100000
    rpm: 1000
  
  # Embeddings
  - model_name: text-embedding-3-small
    litellm_params:
      model: text-embedding-3-small
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  success_callback: ["langfuse", "prometheus"]
  failure_callback: ["sentry"]
  set_verbose: false
  drop_params: true
  request_timeout: 300
  num_retries: 3
  fallbacks:
    - gpt-4: ["gpt-3.5-turbo", "claude-2"]
  context_window_fallbacks:
    - gpt-3.5-turbo: ["gpt-3.5-turbo-16k"]
  cache: true
  cache_params:
    type: redis
    host: localhost
    port: 6379
    ttl: 3600

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: os.environ/DATABASE_URL
  store_model_in_db: true
  allowed_routes:
    - "chat/completions"
    - "embeddings"
    - "key/generate"
    - "key/list"
    - "team/new"
    - "user/new"
  ui_access_mode: admin

router_settings:
  routing_strategy: latency-based-routing
  routing_strategy_args:
    ttl: 60
  redis_host: localhost
  redis_port: 6379
  redis_password: os.environ/REDIS_PASSWORD
  num_retries: 3
  timeout: 30
  allowed_fails: 3
  cooldown_time: 60

Environment Variables

Load values from environment:
model_list:
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_key: os.environ/AZURE_API_KEY  # Loads from $AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: os.environ/DATABASE_URL
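The os.environ/ prefix is a plain string convention: at startup, the proxy treats everything after the prefix as the name of an environment variable to read. A rough Python sketch of how such a value could be resolved (resolve_env_ref is a hypothetical helper for illustration, not part of LiteLLM's API):

```python
import os

def resolve_env_ref(value):
    """Resolve the "os.environ/NAME" convention: if the value starts with
    the prefix, return the named environment variable's value; otherwise
    return the value unchanged. (Illustrative only, not LiteLLM's code.)"""
    prefix = "os.environ/"
    if isinstance(value, str) and value.startswith(prefix):
        return os.environ[value[len(prefix):]]
    return value

os.environ["AZURE_API_KEY"] = "sk-example"
print(resolve_env_ref("os.environ/AZURE_API_KEY"))  # value of $AZURE_API_KEY
print(resolve_env_ref("sk-1234"))                   # literal values pass through
```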

Validation

Validate your config:
litellm --config config.yaml --test
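The --test flag sends a test request through the running proxy. For a quick offline sanity check, you can also lint the parsed config's structure yourself. The sketch below operates on the already-parsed config (e.g. the dict returned by yaml.safe_load) and only verifies the two required keys per model_list entry; check_config is a hypothetical helper, not a LiteLLM API.

```python
def check_config(cfg):
    """Return a list of structural problems in a parsed proxy config.
    Sketch only: checks that each model_list entry has the required
    model_name and litellm_params keys, nothing more."""
    errors = []
    for i, entry in enumerate(cfg.get("model_list", [])):
        for key in ("model_name", "litellm_params"):
            if key not in entry:
                errors.append(f"model_list[{i}] missing required key: {key}")
    return errors

# Parsed equivalent of a small config; the second entry lacks model_name.
sample = {
    "model_list": [
        {"model_name": "gpt-4",
         "litellm_params": {"model": "gpt-4",
                            "api_key": "os.environ/OPENAI_API_KEY"}},
        {"litellm_params": {"model": "claude-2"}},
    ]
}
print(check_config(sample))  # flags the entry without model_name
```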
