
Authentication and Authorization

Master Key

Always set a strong LITELLM_MASTER_KEY in production. This key has full admin access.
Generate a secure master key:
# Generate a cryptographically secure key (LiteLLM master keys must start with "sk-")
echo "sk-$(openssl rand -hex 32)"

# Or use Python
python -c "import secrets; print('sk-' + secrets.token_hex(32))"
Set it in your environment:
LITELLM_MASTER_KEY=sk-a7f9c8e2d1b4f6a8c9e3d2b5f7a9c1e4d3b6f8a2c5e7d9b4f6a8c1e3d5b7f9a2
Or in config:
config.yaml
general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
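
A startup check along these lines can catch a weak or missing key before the proxy serves traffic. This is a minimal sketch, not LiteLLM behavior: the function name `check_master_key`, the 32-character minimum, and the distinct-character heuristic are assumptions chosen for illustration.

```python
import os
from typing import List, Optional


def check_master_key(key: Optional[str], min_random_chars: int = 32) -> List[str]:
    """Return a list of problems with the key; an empty list means it looks fine."""
    if not key:
        return ["LITELLM_MASTER_KEY is not set"]
    problems = []
    if not key.startswith("sk-"):
        problems.append("key should start with 'sk-'")
    # Everything after the prefix is treated as the random part
    body = key[3:] if key.startswith("sk-") else key
    if len(body) < min_random_chars:
        problems.append(f"random part is shorter than {min_random_chars} characters")
    if len(set(body)) < 8:
        problems.append("random part has very few distinct characters")
    return problems


if __name__ == "__main__":
    for problem in check_master_key(os.environ.get("LITELLM_MASTER_KEY")):
        print(f"warning: {problem}")
```

Returning a list of problems rather than raising lets the caller decide whether a weak key is a warning (development) or a hard failure (production).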

Virtual Keys

Create scoped API keys with rate limits and budgets:
curl -X POST http://localhost:4000/key/generate \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "key_alias": "frontend-app",
    "models": ["gpt-4o", "claude-sonnet-4"],
    "max_budget": 100.0,
    "budget_duration": "30d",
    "rpm_limit": 100,
    "tpm_limit": 100000,
    "team_id": "team-engineering",
    "metadata": {
      "environment": "production",
      "app": "chat-widget"
    }
  }'
Response:
{
  "key": "sk-litellm-a1b2c3d4...",
  "key_name": "frontend-app",
  "expires": null,
  "models": ["gpt-4o", "claude-sonnet-4"],
  "max_budget": 100.0,
  "budget_duration": "30d"
}
Virtual keys are hashed before storage. The actual key is only shown once during creation.
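
The idea of storing only a hash can be sketched as follows. This is an illustration under stated assumptions, not LiteLLM's implementation: the source does not name the hash function, so SHA-256 is assumed here, and the helper names `new_virtual_key`, `hash_key`, and `verify_key` are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def hash_key(raw_key: str) -> str:
    # SHA-256 is an assumption; the source does not say which hash is used
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def new_virtual_key() -> Tuple[str, str]:
    """Return (raw_key, stored_hash). Only the hash would be persisted."""
    raw_key = "sk-" + secrets.token_urlsafe(32)
    return raw_key, hash_key(raw_key)


def verify_key(presented_key: str, stored_hash: str) -> bool:
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(hash_key(presented_key), stored_hash)
```

Because only `stored_hash` is kept, a lost key cannot be recovered from the database; it has to be regenerated.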

Team-Based Access Control

Create teams with budgets:
curl -X POST http://localhost:4000/team/new \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "team_alias": "engineering",
    "models": ["gpt-4o", "claude-sonnet-4"],
    "max_budget": 1000.0,
    "budget_duration": "30d",
    "rpm_limit": 500,
    "tpm_limit": 500000
  }'
Assign users to teams:
curl -X POST http://localhost:4000/team/member_add \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "team_id": "team-123",
    "member": {"role": "user", "user_id": "user@example.com"}
  }'

SSO Integration

config.yaml
general_settings:
  ui_access_mode: sso
  sso_providers:
    - provider: google
      client_id: os.environ/GOOGLE_CLIENT_ID
      client_secret: os.environ/GOOGLE_CLIENT_SECRET
      allowed_domains:
        - company.com
        - subsidiary.com
Set environment variables:
GOOGLE_CLIENT_ID=your-client-id.apps.googleusercontent.com
GOOGLE_CLIENT_SECRET=your-secret

Secrets Management

Environment Variables

Never commit API keys to Git. Use environment variables or secrets management systems.
Reference in config.yaml:
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: gpt-4o
      api_key: os.environ/OPENAI_API_KEY  # ✅ Good
      # api_key: sk-proj-xxx  # ❌ Bad - never hardcode
  
  - model_name: azure-gpt4
    litellm_params:
      model: azure/gpt-4
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: os.environ/DATABASE_URL
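
To make the `os.environ/NAME` convention concrete, the sketch below resolves such references in an already-parsed config dictionary. It is an illustration only and not LiteLLM's loader; the function name `resolve_env_refs` and the choice to fail on a missing variable are assumptions.

```python
import os
from typing import Any

ENV_PREFIX = "os.environ/"


def resolve_env_refs(node: Any) -> Any:
    """Recursively replace 'os.environ/NAME' strings with the value of NAME."""
    if isinstance(node, dict):
        return {key: resolve_env_refs(value) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_env_refs(value) for value in node]
    if isinstance(node, str) and node.startswith(ENV_PREFIX):
        name = node[len(ENV_PREFIX):]
        if name not in os.environ:
            # Failing loudly beats silently running with an empty secret
            raise KeyError(f"environment variable {name} is not set")
        return os.environ[name]
    return node
```

Only the variable names ever appear in the file, so the config can be committed to Git while the values stay in the environment.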

Kubernetes Secrets

Create secrets:
kubectl create secret generic litellm-secrets \
  --from-literal=LITELLM_MASTER_KEY=sk-... \
  --from-literal=OPENAI_API_KEY=sk-proj-... \
  --from-literal=ANTHROPIC_API_KEY=sk-ant-... \
  --from-literal=DATABASE_URL=postgresql://...
Mount in deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: litellm
spec:
  template:
    spec:
      containers:
        - name: litellm
          envFrom:
            - secretRef:
                name: litellm-secrets
          # Or individual env vars
          env:
            - name: LITELLM_MASTER_KEY
              valueFrom:
                secretKeyRef:
                  name: litellm-secrets
                  key: LITELLM_MASTER_KEY

AWS Secrets Manager

Store secrets in AWS:
aws secretsmanager create-secret \
  --name litellm/prod/api-keys \
  --secret-string '{
    "OPENAI_API_KEY": "sk-proj-...",
    "ANTHROPIC_API_KEY": "sk-ant-..."
  }'
Retrieve in application:
import boto3
import json
import os

def load_secrets():
    client = boto3.client('secretsmanager', region_name='us-east-1')
    secret = client.get_secret_value(SecretId='litellm/prod/api-keys')
    secrets = json.loads(secret['SecretString'])
    
    for key, value in secrets.items():
        os.environ[key] = value

load_secrets()

HashiCorp Vault

config.yaml
general_settings:
  secret_manager:
    type: vault
    vault_address: https://vault.company.com
    vault_token: os.environ/VAULT_TOKEN
    vault_mount_path: secret/litellm

model_list:
  - model_name: gpt-4o
    litellm_params:
      model: gpt-4o
      api_key: vault/openai/api_key  # Fetches from Vault

Network Security

TLS/SSL Configuration

Enable HTTPS in production:
server {
    listen 443 ssl http2;
    server_name api.yourdomain.com;
    
    ssl_certificate /etc/nginx/ssl/cert.pem;
    ssl_certificate_key /etc/nginx/ssl/key.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;
    
    location / {
        proxy_pass http://litellm:4000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        
        # Security headers
        add_header X-Content-Type-Options nosniff;
        add_header X-Frame-Options DENY;
        add_header X-XSS-Protection "1; mode=block";
        add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
    }
}
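
On the client side, the same TLS 1.2+ policy can be enforced when calling the proxy. The following is a minimal sketch using Python's standard `ssl` module; the helper name `strict_client_context` is hypothetical.

```python
import ssl


def strict_client_context() -> ssl.SSLContext:
    """Client-side TLS context that refuses anything older than TLS 1.2."""
    # create_default_context() enables certificate and hostname verification
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

The context can be passed to standard-library clients, for example `urllib.request.urlopen(url, context=strict_client_context())`.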

Firewall Rules

Allow only necessary traffic:
# AWS Security Group
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxx \
  --protocol tcp \
  --port 443 \
  --cidr 0.0.0.0/0  # HTTPS from anywhere

aws ec2 authorize-security-group-ingress \
  --group-id sg-xxx \
  --protocol tcp \
  --port 5432 \
  --source-group sg-litellm  # PostgreSQL only from LiteLLM
GCP Firewall:
gcloud compute firewall-rules create allow-https \
  --allow tcp:443 \
  --source-ranges 0.0.0.0/0 \
  --target-tags litellm

gcloud compute firewall-rules create allow-postgres \
  --allow tcp:5432 \
  --source-tags litellm \
  --target-tags postgres
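
A small audit script can flag rules that accidentally expose internal ports to the internet. This sketch works on a hand-built list of rules; the `IngressRule` type, the `PUBLIC_PORTS` set, and the function name are assumptions for illustration, and fetching real rules from AWS or GCP is left out.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass
class IngressRule:
    port: int
    cidr: str  # source range in CIDR notation


PUBLIC_PORTS = {443}  # ports that may be reachable from anywhere


def find_exposed_rules(rules: List[IngressRule]) -> List[IngressRule]:
    """Return rules that open a non-public port to every address."""
    exposed = []
    for rule in rules:
        network = ipaddress.ip_network(rule.cidr, strict=False)
        open_to_world = network.prefixlen == 0  # 0.0.0.0/0 or ::/0
        if open_to_world and rule.port not in PUBLIC_PORTS:
            exposed.append(rule)
    return exposed
```

With the rules shown above, HTTPS on 443 passes, while a PostgreSQL rule on 5432 with source `0.0.0.0/0` would be reported.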

Network Policies (Kubernetes)

networkpolicy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: litellm-netpol
spec:
  podSelector:
    matchLabels:
      app: litellm
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Allow from ingress controller
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - protocol: TCP
          port: 4000
  egress:
    # Allow to database
    - to:
        - podSelector:
            matchLabels:
              app: postgresql
      ports:
        - protocol: TCP
          port: 5432
    # Allow to Redis
    - to:
        - podSelector:
            matchLabels:
              app: redis
      ports:
        - protocol: TCP
          port: 6379
    # Allow external API calls (OpenAI, Anthropic, etc.)
    # An ipBlock is required here: a namespaceSelector only matches in-cluster pods
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - protocol: TCP
          port: 443
    # Allow DNS
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: UDP
          port: 53

Rate Limiting and DDoS Protection

Built-in Rate Limiting

Global rate limits:
config.yaml
general_settings:
  max_parallel_requests: 100  # Concurrent requests
  global_max_parallel_requests: 1000
  
router_settings:
  rpm_limit: 10000  # Requests per minute
  tpm_limit: 1000000  # Tokens per minute
Per-key rate limits:
curl -X POST http://localhost:4000/key/generate \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "rpm_limit": 100,
    "tpm_limit": 100000,
    "max_parallel_requests": 10
  }'
Per-team rate limits:
curl -X POST http://localhost:4000/team/new \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "team_alias": "engineering",
    "rpm_limit": 1000,
    "tpm_limit": 1000000
  }'

NGINX Rate Limiting

http {
    limit_req_zone $binary_remote_addr zone=api_limit:10m rate=100r/s;
    limit_conn_zone $binary_remote_addr zone=addr:10m;
    
    server {
        location / {
            limit_req zone=api_limit burst=20 nodelay;
            limit_conn addr 10;
            
            proxy_pass http://litellm:4000;
        }
    }
}
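
The interaction between `rate` and `burst` is easier to see in code. The sketch below is a token bucket, which approximates the behavior for intuition only; NGINX itself implements a leaky bucket, and the class name and injectable `clock` parameter are assumptions made for testability.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate             # tokens added per second
        self.capacity = burst        # maximum number of stored tokens
        self.tokens = float(burst)   # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A client can send up to `burst` requests back to back, after which it is held to `rate` requests per second on average.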

Cloudflare Protection

Use Cloudflare (or a similar CDN) for DDoS protection and edge rate limiting.
Benefits:
  • DDoS mitigation
  • WAF (Web Application Firewall)
  • Rate limiting at edge
  • Bot detection
  • Geographic restrictions

Data Protection

Database Encryption

Enable PostgreSQL SSL:
config.yaml
general_settings:
  database_url: postgresql://user:pass@host:5432/db?sslmode=require
Encryption at rest:
# AWS RDS
aws rds create-db-instance \
  --db-instance-identifier litellm-db \
  --storage-encrypted \
  --kms-key-id arn:aws:kms:us-east-1:123456789:key/xxx

# GCP Cloud SQL (PostgreSQL uses custom machine types; CMEK is set with --disk-encryption-key)
gcloud sql instances create litellm-db \
  --database-version=POSTGRES_16 \
  --tier=db-custom-2-7680 \
  --disk-encryption-key=projects/PROJECT/locations/LOCATION/keyRings/RING/cryptoKeys/KEY

Redacting Sensitive Data

Mask PII in logs:
config.yaml
general_settings:
  redact_user_api_key_info: true  # Don't log API keys
  redact_messages_in_logs: true   # Don't log message content

litellm_settings:
  drop_params: true  # Drop request params the target provider does not support
  
router_settings:
  allowed_cache_controls: []  # Disable caching sensitive data
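
For application logs outside the proxy, a logging filter can mask keys and email addresses before a record is written. This is a sketch under assumptions: the two regular expressions are illustrative and will not catch every form of PII, and the names `redact` and `RedactingFilter` are hypothetical.

```python
import logging
import re

# Illustrative patterns; extend them for the identifiers your logs may contain
API_KEY_RE = re.compile(r"sk-[A-Za-z0-9_\-]{8,}")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}")


def redact(text: str) -> str:
    """Mask API keys and email addresses in a string."""
    text = API_KEY_RE.sub("sk-[REDACTED]", text)
    return EMAIL_RE.sub("[EMAIL]", text)


class RedactingFilter(logging.Filter):
    """Rewrite each record's message before handlers emit it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = None  # the message is already fully formatted
        return True
```

Attach it with `logging.getLogger().addFilter(RedactingFilter())` or on a specific handler, so redaction happens regardless of which module logs the message.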

Request/Response Encryption

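Field-level encryption (the model cannot interpret ciphertext, so encrypt sensitive values locally and send only a placeholder):
from cryptography.fernet import Fernet
from openai import OpenAI

key = Fernet.generate_key()  # In production, load a persistent key from your secrets manager
cipher = Fernet(key)

# OpenAI client pointed at the LiteLLM proxy (base URL and key are placeholders)
client = OpenAI(base_url="http://localhost:4000", api_key="sk-...")

# Encrypt the sensitive value and keep the ciphertext on your side
secret = "Confidential information"
encrypted = cipher.encrypt(secret.encode())

# Send a placeholder to LiteLLM instead of the plaintext
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Draft a reply that mentions [SECRET_1]."}]
)

# Decrypt locally and restore the value in the response
reply = response.choices[0].message.content
restored = reply.replace("[SECRET_1]", cipher.decrypt(encrypted).decode())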

Compliance

GDPR Compliance

1. Data Minimization

Only store necessary data:
general_settings:
  store_model_in_db: true
  store_audit_logs: true
  log_retention_days: 90  # Automatic cleanup

2. Right to Deletion

Implement user data deletion:
# Delete user data (the endpoint takes a POST with a list of user IDs)
curl -X POST http://localhost:4000/user/delete \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"user_ids": ["user@example.com"]}'
This removes:
  • User account
  • API keys
  • Spend logs (or anonymizes)
  • Personal metadata
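
Where spend logs must be kept for accounting, the "anonymizes" option can be sketched as replacing the user ID with a keyed hash. This is an illustration, not what the delete endpoint does internally: the helper names are hypothetical, and the `user` field name is taken from the spend-log queries elsewhere in this guide. Note that a keyed hash is pseudonymization; the data is only anonymous once the secret is destroyed.

```python
import hashlib
import hmac
from typing import Dict, List


def pseudonymize_user_id(user_id: str, secret: bytes) -> str:
    """Map a user ID to a stable alias that cannot be reversed without the secret."""
    digest = hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"anon-{digest[:16]}"


def anonymize_rows(rows: List[Dict], user_id: str, secret: bytes) -> List[Dict]:
    """Return copies of spend rows with the given user's ID replaced by its alias."""
    alias = pseudonymize_user_id(user_id, secret)
    return [
        {**row, "user": alias} if row.get("user") == user_id else dict(row)
        for row in rows
    ]
```

Spend totals stay intact for reporting, while the rows no longer carry the original identifier.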

3. Data Export

Allow users to export their data:
# Export user data
curl "http://localhost:4000/user/info?user_id=user@example.com" \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  > user_data.json

4. Consent Management

Track user consent:
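# Add consent metadata
from openai import OpenAI

# OpenAI client pointed at the LiteLLM proxy (base URL and key are placeholders)
client = OpenAI(base_url="http://localhost:4000", api_key="sk-...")

client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "metadata": {
            "user_consent": "analytics:true,storage:true",
            "consent_date": "2024-01-01"
        }
    }
)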
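
When reading the metadata back, the consent string has to be parsed before it can gate any processing. The sketch below assumes the `purpose:flag` comma-separated format used in the example above; the function names are hypothetical.

```python
from typing import Dict


def parse_consent(value: str) -> Dict[str, bool]:
    """Parse 'analytics:true,storage:false' into a purpose -> bool mapping."""
    consent = {}
    for item in value.split(","):
        if not item.strip():
            continue  # tolerate empty entries such as a trailing comma
        purpose, sep, flag = item.partition(":")
        if not sep:
            raise ValueError(f"malformed consent entry: {item!r}")
        consent[purpose.strip()] = flag.strip().lower() == "true"
    return consent


def has_consent(value: str, purpose: str) -> bool:
    # A purpose that is not listed is treated as not consented
    return parse_consent(value).get(purpose, False)
```

Defaulting unknown purposes to `False` keeps the check conservative: processing requires an explicit opt-in.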

SOC 2 / ISO 27001

Audit logging:
-- View audit trail
SELECT 
  updated_at,
  changed_by,
  action,
  table_name,
  object_id,
  before_value,
  updated_values
FROM "LiteLLM_AuditLog"
ORDER BY updated_at DESC;
Access logs:
general_settings:
  detailed_debug: true  # Detailed access logs
  
litellm_settings:
  success_callback: ["langfuse", "prometheus"]
  failure_callback: ["langfuse", "sentry"]

HIPAA Compliance

For HIPAA compliance, sign a Business Associate Agreement (BAA) with every LLM provider that will process PHI.
Requirements:
  • Enable encryption at rest and in transit
  • Implement access controls (RBAC)
  • Audit all access to PHI
  • Use HIPAA-compliant infrastructure (AWS, GCP, Azure)
  • Sign BAAs with:
    • OpenAI (via Azure OpenAI)
    • Anthropic (Enterprise plan)
    • Google (Vertex AI)
    • AWS (Bedrock)
Configuration:
general_settings:
  database_url: postgresql://...?sslmode=require
  redact_messages_in_logs: true
  
litellm_settings:
  drop_params: true
  
router_settings:
  allowed_cache_controls: []  # No caching of PHI

Container Security

Base Image Security

LiteLLM uses Chainguard Wolfi base images for minimal attack surface:
# Security-focused base image
ARG LITELLM_RUNTIME_IMAGE=cgr.dev/chainguard/wolfi-base

FROM $LITELLM_RUNTIME_IMAGE AS runtime

# Minimal dependencies
RUN apk add --no-cache bash openssl tzdata nodejs npm python3 py3-pip

# Run as the base image's built-in non-root user
USER nonroot

Security Scanning

Scan images for vulnerabilities:
# Trivy
trivy image ghcr.io/berriai/litellm:main-stable

# Grype
grype ghcr.io/berriai/litellm:main-stable

# Snyk
snyk container test ghcr.io/berriai/litellm:main-stable

Pod Security Standards (Kubernetes)

apiVersion: v1
kind: Pod
metadata:
  name: litellm
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 1000
    seccompProfile:
      type: RuntimeDefault
  
  containers:
    - name: litellm
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: false  # Prisma needs write access
        capabilities:
          drop:
            - ALL

Monitoring and Incident Response

Security Monitoring

Monitor for suspicious activity:
-- Unusual spend patterns
SELECT 
  api_key,
  COUNT(*) as requests,
  SUM(spend) as total_spend
FROM "LiteLLM_SpendLogs"
WHERE "startTime" >= NOW() - INTERVAL '1 hour'
GROUP BY api_key
HAVING SUM(spend) > 100
ORDER BY total_spend DESC;

-- Keys with many failed requests (possible probing or abuse)
SELECT 
  api_key,
  COUNT(*) as failures
FROM "LiteLLM_SpendLogs"
WHERE status = 'error'
  AND "startTime" >= NOW() - INTERVAL '1 hour'
GROUP BY api_key
HAVING COUNT(*) > 10;
Prometheus alerts:
- alert: SuspiciousActivity
  expr: rate(litellm_requests_total{status="error"}[5m]) > 10
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Sustained high request error rate"

- alert: UnusualSpend
  expr: increase(litellm_spend_total[1h]) > 1000
  labels:
    severity: warning
  annotations:
    summary: "Unusual spend spike detected"
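
The same spend threshold can also be checked in application code, for example in a scheduled job that reads recent spend rows. This is a minimal sketch over in-memory dictionaries; the function name is hypothetical, and the field names `api_key`, `spend`, and `startTime` mirror the SQL query above.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable


def keys_over_spend(rows: Iterable[dict], now: datetime,
                    window: timedelta = timedelta(hours=1),
                    threshold: float = 100.0) -> Dict[str, float]:
    """Return {api_key: total_spend} for keys whose spend in the window exceeds threshold."""
    totals = defaultdict(float)
    for row in rows:
        # Only count rows that started inside the look-back window
        if row["startTime"] >= now - window:
            totals[row["api_key"]] += row["spend"]
    return {key: total for key, total in totals.items() if total > threshold}
```

The result can feed directly into the containment step, since it lists exactly the keys that should be reviewed or blocked.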

Incident Response Plan

1. Detection

  • Monitor alerts
  • Review audit logs
  • Check error rates

2. Containment

# Immediately block compromised key
curl -X POST http://localhost:4000/key/block \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"key": "sk-litellm-compromised"}'

# Block every key that belongs to a team if needed
curl -X POST http://localhost:4000/team/block \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"team_id": "team-123"}'

3. Investigation

-- Trace all requests from compromised key
SELECT 
  request_id,
  model,
  "user",
  spend,
  "startTime",
  messages
FROM "LiteLLM_SpendLogs"
WHERE api_key = 'hashed-key'  -- api_key stores the hashed key, not the raw sk-... value
ORDER BY "startTime" DESC;

4. Recovery

  • Rotate master key
  • Regenerate affected virtual keys
  • Update provider API keys
  • Patch vulnerabilities

5. Post-Mortem

  • Document incident
  • Update security policies
  • Improve monitoring
  • Train team

Security Checklist

1. Authentication

  • Strong master key generated
  • Virtual keys with rate limits
  • SSO enabled for admin UI
  • MFA for admin accounts

2. Network

  • HTTPS/TLS enabled
  • Firewall rules configured
  • Network policies in place
  • DDoS protection active

3. Data

  • Database encryption enabled
  • Secrets in environment/vault
  • PII redaction configured
  • Backup encryption enabled

4. Compliance

  • Audit logging enabled
  • Data retention policy set
  • BAAs signed with providers
  • Privacy policy updated

5. Monitoring

  • Security alerts configured
  • Audit log monitoring
  • Incident response plan
  • Regular security audits

Next Steps

Monitoring

Set up security monitoring and alerts

High Availability

Deploy securely at scale

Performance

Optimize without compromising security

Troubleshooting

Debug security-related issues
