# Authentication and Authorization
## Master Key

Always set a strong `LITELLM_MASTER_KEY` in production. This key has full admin access.

Generate a secure master key:

```bash
# Generate a cryptographically secure key
openssl rand -hex 32

# Or use Python
python -c "import secrets; print('sk-' + secrets.token_hex(32))"
```

Set it in the environment:

```bash
LITELLM_MASTER_KEY=sk-a7f9c8e2d1b4f6a8c9e3d2b5f7a9c1e4d3b6f8a2c5e7d9b4f6a8c1e3d5b7f9a2
```

Or in config:

```yaml
general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
```
## Virtual Keys

Create scoped API keys with rate limits and budgets:

```bash
curl -X POST http://localhost:4000/key/generate \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "key_alias": "frontend-app",
    "models": ["gpt-4o", "claude-sonnet-4"],
    "max_budget": 100.0,
    "budget_duration": "30d",
    "rpm_limit": 100,
    "tpm_limit": 100000,
    "team_id": "team-engineering",
    "metadata": {
      "environment": "production",
      "app": "chat-widget"
    }
  }'
```
Response:

```json
{
  "key": "sk-litellm-a1b2c3d4...",
  "key_name": "frontend-app",
  "expires": null,
  "models": ["gpt-4o", "claude-sonnet-4"],
  "max_budget": 100.0,
  "budget_duration": "30d"
}
```
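Clients then use the returned key like any provider API key, pointing requests at the proxy instead of the provider. A sketch using only the standard library (any OpenAI-compatible SDK works the same way; the URL and key below are placeholders):

```python
# Build a chat-completions request authenticated with the virtual key.
import json
import urllib.request

VIRTUAL_KEY = "sk-litellm-a1b2c3d4..."  # returned once by /key/generate
PROXY_URL = "http://localhost:4000/v1/chat/completions"

request = urllib.request.Request(
    PROXY_URL,
    data=json.dumps({
        "model": "gpt-4o",  # must be in the key's allowed "models" list
        "messages": [{"role": "user", "content": "Hello"}],
    }).encode(),
    headers={
        "Authorization": f"Bearer {VIRTUAL_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(request) would send it; the proxy enforces the
# key's budget and rate limits before calling the upstream provider.
```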
Virtual keys are hashed before storage. The actual key is only shown once during creation.
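Hashing means a leaked database does not yield usable keys: each request's bearer token is hashed and compared to the stored digest. A minimal illustration of the idea using SHA-256 (LiteLLM's exact hashing scheme is an internal detail):

```python
import hashlib
import secrets

def issue_key(store: dict) -> str:
    """Generate a key, persist only its hash, return the plaintext once."""
    plaintext = "sk-litellm-" + secrets.token_hex(16)
    store[hashlib.sha256(plaintext.encode()).hexdigest()] = {"alias": "frontend-app"}
    return plaintext  # shown to the caller exactly once

def authenticate(store: dict, presented_key: str) -> bool:
    """Hash the presented key and look the digest up in the store."""
    return hashlib.sha256(presented_key.encode()).hexdigest() in store

store: dict = {}
key = issue_key(store)
assert authenticate(store, key)             # the issued key matches its digest
assert not authenticate(store, "sk-wrong")  # unknown keys are rejected
```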
## Team-Based Access Control

Create teams with budgets:

```bash
curl -X POST http://localhost:4000/team/new \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "team_alias": "engineering",
    "models": ["gpt-4o", "claude-sonnet-4"],
    "max_budget": 1000.0,
    "budget_duration": "30d",
    "rpm_limit": 500,
    "tpm_limit": 500000
  }'
```

Assign users to teams:

```bash
curl -X POST http://localhost:4000/team/member/add \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "team_id": "team-123",
    "user_id": "[email protected]",
    "role": "member"
  }'
```
## SSO Integration

### Google OAuth

```yaml
general_settings:
  ui_access_mode: sso
  sso_providers:
    - provider: google
      client_id: os.environ/GOOGLE_CLIENT_ID
      client_secret: os.environ/GOOGLE_CLIENT_SECRET
      allowed_domains:
        - company.com
        - subsidiary.com
```

Set the environment variables:

```bash
GOOGLE_CLIENT_ID=your-client-id.apps.googleusercontent.com
GOOGLE_CLIENT_SECRET=your-secret
```

### Microsoft Azure AD

```yaml
general_settings:
  ui_access_mode: sso
  sso_providers:
    - provider: azure
      client_id: os.environ/AZURE_CLIENT_ID
      client_secret: os.environ/AZURE_CLIENT_SECRET
      tenant_id: os.environ/AZURE_TENANT_ID
```

```bash
AZURE_CLIENT_ID=your-application-id
AZURE_CLIENT_SECRET=your-secret
AZURE_TENANT_ID=your-tenant-id
```

### Okta

```yaml
general_settings:
  ui_access_mode: sso
  sso_providers:
    - provider: okta
      client_id: os.environ/OKTA_CLIENT_ID
      client_secret: os.environ/OKTA_CLIENT_SECRET
      issuer: https://your-domain.okta.com
```
## Secrets Management

### Environment Variables

Never commit API keys to Git. Use environment variables or a secrets management system.

Reference them in `config.yaml`:

```yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: gpt-4o
      api_key: os.environ/OPENAI_API_KEY  # ✅ Good
      # api_key: sk-proj-xxx              # ❌ Bad - never hardcode
  - model_name: azure-gpt4
    litellm_params:
      model: azure/gpt-4
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: os.environ/DATABASE_URL
```
### Kubernetes Secrets

Create the secret:

```bash
kubectl create secret generic litellm-secrets \
  --from-literal=LITELLM_MASTER_KEY=sk-... \
  --from-literal=OPENAI_API_KEY=sk-proj-... \
  --from-literal=ANTHROPIC_API_KEY=sk-ant-... \
  --from-literal=DATABASE_URL=postgresql://...
```

Mount it in the deployment:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: litellm
spec:
  template:
    spec:
      containers:
        - name: litellm
          envFrom:
            - secretRef:
                name: litellm-secrets
          # Or individual env vars
          env:
            - name: LITELLM_MASTER_KEY
              valueFrom:
                secretKeyRef:
                  name: litellm-secrets
                  key: LITELLM_MASTER_KEY
```
### AWS Secrets Manager

Store secrets in AWS:

```bash
aws secretsmanager create-secret \
  --name litellm/prod/api-keys \
  --secret-string '{
    "OPENAI_API_KEY": "sk-proj-...",
    "ANTHROPIC_API_KEY": "sk-ant-..."
  }'
```

Retrieve them at application startup:

```python
import boto3
import json
import os

def load_secrets():
    client = boto3.client('secretsmanager', region_name='us-east-1')
    secret = client.get_secret_value(SecretId='litellm/prod/api-keys')
    secrets = json.loads(secret['SecretString'])
    for key, value in secrets.items():
        os.environ[key] = value

load_secrets()
```
### HashiCorp Vault

```yaml
general_settings:
  secret_manager:
    type: vault
    vault_address: https://vault.company.com
    vault_token: os.environ/VAULT_TOKEN
    vault_mount_path: secret/litellm

model_list:
  - model_name: gpt-4o
    litellm_params:
      model: gpt-4o
      api_key: vault/openai/api_key  # Fetched from Vault
```
## Network Security

### TLS/SSL Configuration

Enable HTTPS in production.

#### NGINX Reverse Proxy

```nginx
server {
    listen 443 ssl http2;
    server_name api.yourdomain.com;

    ssl_certificate /etc/nginx/ssl/cert.pem;
    ssl_certificate_key /etc/nginx/ssl/key.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;

    location / {
        proxy_pass http://litellm:4000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Security headers
        add_header X-Content-Type-Options nosniff;
        add_header X-Frame-Options DENY;
        add_header X-XSS-Protection "1; mode=block";
        add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
    }
}
```

#### Traefik

```yaml
http:
  routers:
    litellm:
      rule: "Host(`api.yourdomain.com`)"
      tls:
        certResolver: letsencrypt
      service: litellm
  services:
    litellm:
      loadBalancer:
        servers:
          - url: "http://litellm:4000"
```

#### Kubernetes Ingress

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: litellm-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
  tls:
    - hosts:
        - api.yourdomain.com
      secretName: litellm-tls
  rules:
    - host: api.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: litellm
                port:
                  number: 4000
```
### Firewall Rules

Allow only necessary traffic. AWS security groups:

```bash
# HTTPS from anywhere
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxx \
  --protocol tcp \
  --port 443 \
  --cidr 0.0.0.0/0

# PostgreSQL only from LiteLLM
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxx \
  --protocol tcp \
  --port 5432 \
  --source-group sg-litellm
```

GCP firewall:

```bash
gcloud compute firewall-rules create allow-https \
  --allow tcp:443 \
  --source-ranges 0.0.0.0/0 \
  --target-tags litellm

gcloud compute firewall-rules create allow-postgres \
  --allow tcp:5432 \
  --source-tags litellm \
  --target-tags postgres
```
### Network Policies (Kubernetes)

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: litellm-netpol
spec:
  podSelector:
    matchLabels:
      app: litellm
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Allow from ingress controller
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
      ports:
        - protocol: TCP
          port: 4000
  egress:
    # Allow to database
    - to:
        - podSelector:
            matchLabels:
              app: postgresql
      ports:
        - protocol: TCP
          port: 5432
    # Allow to Redis
    - to:
        - podSelector:
            matchLabels:
              app: redis
      ports:
        - protocol: TCP
          port: 6379
    # Allow external API calls (OpenAI, Anthropic, etc.)
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: TCP
          port: 443
    # Allow DNS
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: UDP
          port: 53
```
## Rate Limiting and DDoS Protection

### Built-in Rate Limiting

Global rate limits:

```yaml
general_settings:
  max_parallel_requests: 100          # Concurrent requests
  global_max_parallel_requests: 1000  # Proxy-wide cap

router_settings:
  rpm_limit: 10000    # Requests per minute
  tpm_limit: 1000000  # Tokens per minute
```

Per-key rate limits:

```bash
curl -X POST http://localhost:4000/key/generate \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "rpm_limit": 100,
    "tpm_limit": 100000,
    "max_parallel_requests": 10
  }'
```

Per-team rate limits:

```bash
curl -X POST http://localhost:4000/team/new \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "team_alias": "engineering",
    "rpm_limit": 1000,
    "tpm_limit": 1000000
  }'
```
### NGINX Rate Limiting

```nginx
http {
    limit_req_zone $binary_remote_addr zone=api_limit:10m rate=100r/s;
    limit_conn_zone $binary_remote_addr zone=addr:10m;

    server {
        location / {
            limit_req zone=api_limit burst=20 nodelay;
            limit_conn addr 10;
            proxy_pass http://litellm:4000;
        }
    }
}
```
### Cloudflare Protection

Use Cloudflare (or a similar edge provider) in front of the proxy for DDoS protection and rate limiting.

Benefits:

- DDoS mitigation
- WAF (Web Application Firewall)
- Rate limiting at the edge
- Bot detection
- Geographic restrictions
## Data Protection

### Database Encryption

Enable PostgreSQL SSL:

```yaml
general_settings:
  database_url: postgresql://user:pass@host:5432/db?sslmode=require
```

Encryption at rest:

```bash
# AWS RDS
aws rds create-db-instance \
  --db-instance-identifier litellm-db \
  --storage-encrypted \
  --kms-key-id arn:aws:kms:us-east-1:123456789:key/xxx

# GCP Cloud SQL
gcloud sql instances create litellm-db \
  --database-version=POSTGRES_16 \
  --tier=db-n1-standard-2 \
  --encryption-key=projects/PROJECT/locations/LOCATION/keyRings/RING/cryptoKeys/KEY
```
### Redacting Sensitive Data

Mask PII in logs:

```yaml
general_settings:
  redact_user_api_key_info: true  # Don't log API keys
  redact_messages_in_logs: true   # Don't log message content

litellm_settings:
  drop_params: true  # Drop request params from logs

router_settings:
  allowed_cache_controls: []  # Disable caching of sensitive data
```
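If logs are also shipped to your own sink, apply equivalent redaction there. A minimal scrubber sketch; the record fields and key pattern below are illustrative assumptions, not LiteLLM's log schema:

```python
import re

# Anything that looks like a bearer/API key
API_KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9_-]{8,}")

def scrub(record: dict) -> dict:
    """Return a copy of a log record that is safe to ship:
    message bodies dropped, key-shaped strings masked."""
    clean = dict(record)
    if "messages" in clean:
        clean["messages"] = f"<redacted: {len(record['messages'])} message(s)>"
    for field, value in clean.items():
        if isinstance(value, str):
            clean[field] = API_KEY_PATTERN.sub("sk-****", value)
    return clean

record = {
    "model": "gpt-4o",
    "api_key": "sk-litellm-a1b2c3d4e5f6",
    "messages": [{"role": "user", "content": "my account number is ..."}],
}
safe = scrub(record)  # safe["api_key"] is masked, messages are dropped
```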
### Request/Response Encryption

End-to-end encryption: encrypt sensitive fields client-side, so the proxy, its logs, and its storage only ever see ciphertext:

```python
from cryptography.fernet import Fernet
from openai import OpenAI

key = Fernet.generate_key()
cipher = Fernet(key)

# Encrypt the sensitive value before it leaves your application
message = "Confidential information"
token = cipher.encrypt(message.encode()).decode()

# Send the ciphertext through LiteLLM. Note the model cannot read
# Fernet ciphertext, so this pattern suits opaque identifiers and
# redacted fields, not content the model must understand.
client = OpenAI(base_url="http://localhost:4000", api_key="sk-litellm-...")
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": f"Store this record: {token}"}],
)

# Decrypt locally whenever the original value is needed again
original = cipher.decrypt(token.encode()).decode()
```
## Compliance

### GDPR Compliance

#### Data Minimization

Only store necessary data:

```yaml
general_settings:
  store_model_in_db: true
  store_audit_logs: true
  log_retention_days: 90  # Automatic cleanup
```

#### Right to Deletion

Implement user data deletion:

```bash
curl -X DELETE http://localhost:4000/user/delete \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"user_id": "[email protected]"}'
```

This removes:

- The user account
- API keys
- Spend logs (or anonymizes them)
- Personal metadata

#### Data Export

Allow users to export their data:

```bash
curl http://localhost:4000/user/[email protected] \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  > user_data.json
```

#### Consent Management

Track user consent via request metadata:

```python
client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    extra_body={
        "metadata": {
            "user_consent": "analytics:true,storage:true",
            "consent_date": "2024-01-01"
        }
    }
)
```
### SOC 2 / ISO 27001

Audit logging:

```sql
-- View the audit trail
SELECT
  updated_at,
  changed_by,
  action,
  table_name,
  object_id,
  before_value,
  updated_values
FROM "LiteLLM_AuditLog"
ORDER BY updated_at DESC;
```

Access logs:

```yaml
general_settings:
  detailed_debug: true  # Detailed access logs

litellm_settings:
  success_callback: ["langfuse", "prometheus"]
  failure_callback: ["langfuse", "sentry"]
```
### HIPAA Compliance

For HIPAA compliance, ensure a BAA (Business Associate Agreement) is in place with every LLM provider you route to.

Requirements:

- Enable encryption at rest and in transit
- Implement access controls (RBAC)
- Audit all access to PHI
- Use HIPAA-compliant infrastructure (AWS, GCP, Azure)
- Sign BAAs with:
  - OpenAI (via Azure OpenAI)
  - Anthropic (Enterprise plan)
  - Google (Vertex AI)
  - AWS (Bedrock)

Configuration:

```yaml
general_settings:
  database_url: postgresql://...?sslmode=require
  redact_messages_in_logs: true

litellm_settings:
  drop_params: true

router_settings:
  allowed_cache_controls: []  # No caching of PHI
```
## Container Security

### Base Image Security

LiteLLM uses Chainguard Wolfi base images for a minimal attack surface:

```dockerfile
# Security-focused base image
ARG LITELLM_RUNTIME_IMAGE=cgr.dev/chainguard/wolfi-base
FROM $LITELLM_RUNTIME_IMAGE AS runtime

# Minimal dependencies
RUN apk add --no-cache bash openssl tzdata nodejs npm python3 py3-pip

# Non-root user (automatically handled)
USER nonroot
```

### Security Scanning

Scan images for vulnerabilities:

```bash
# Trivy
trivy image ghcr.io/berriai/litellm:main-stable

# Grype
grype ghcr.io/berriai/litellm:main-stable

# Snyk
snyk container test ghcr.io/berriai/litellm:main-stable
```

### Pod Security Standards (Kubernetes)

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: litellm
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 1000
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: litellm
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: false  # Prisma needs write access
        capabilities:
          drop:
            - ALL
```
## Monitoring and Incident Response

### Security Monitoring

Monitor for suspicious activity:

```sql
-- Unusual spend patterns
SELECT
  api_key,
  COUNT(*) AS requests,
  SUM(spend) AS total_spend
FROM "LiteLLM_SpendLogs"
WHERE "startTime" >= NOW() - INTERVAL '1 hour'
GROUP BY api_key
HAVING SUM(spend) > 100
ORDER BY total_spend DESC;

-- Keys with many failed requests (possible abuse or brute force)
SELECT
  api_key,
  COUNT(*) AS failures
FROM "LiteLLM_SpendLogs"
WHERE status = 'error'
  AND "startTime" >= NOW() - INTERVAL '1 hour'
GROUP BY api_key
HAVING COUNT(*) > 10;
```
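These queries can feed a lightweight alerting hook. A minimal sketch, assuming rows shaped like the query output (key, request count, spend, failure count) and illustrative thresholds:

```python
def flag_suspicious(rows, spend_threshold=100.0, failure_threshold=10):
    """rows: (api_key, request_count, total_spend, failure_count) tuples,
    e.g. fetched by the queries above. Returns keys worth investigating."""
    flagged = []
    for api_key, request_count, spend, failures in rows:
        reasons = []
        if spend > spend_threshold:
            reasons.append(f"spend {spend:.2f} in the last hour")
        if failures > failure_threshold:
            reasons.append(f"{failures} failed requests")
        if reasons:
            flagged.append((api_key, "; ".join(reasons)))
    return flagged

# Example: one key far over budget, one behaving normally
rows = [("sk-hash-1", 500, 240.0, 0), ("sk-hash-2", 20, 1.5, 2)]
alerts = flag_suspicious(rows)  # only sk-hash-1 is flagged
```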
Prometheus alerts:

```yaml
- alert: SuspiciousActivity
  expr: rate(litellm_requests_total{status="error"}[5m]) > 10
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "High error rate from a single source"

- alert: UnusualSpend
  expr: increase(litellm_spend_total[1h]) > 1000
  labels:
    severity: warning
  annotations:
    summary: "Unusual spend spike detected"
```
### Incident Response Plan

#### 1. Detection

- Monitor alerts
- Review audit logs
- Check error rates

#### 2. Containment

```bash
# Immediately block the compromised key
curl -X POST http://localhost:4000/key/block \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"key": "sk-litellm-compromised"}'

# Block all of a team's keys if needed
curl -X POST http://localhost:4000/team/block \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"team_id": "team-123"}'
```

#### 3. Investigation

```sql
-- Trace all requests from the compromised key
SELECT
  request_id,
  model,
  "user",
  spend,
  "startTime",
  messages
FROM "LiteLLM_SpendLogs"
WHERE api_key = 'hashed-key'
ORDER BY "startTime" DESC;
```
#### 4. Recovery

- Rotate the master key
- Regenerate affected virtual keys
- Update provider API keys
- Patch vulnerabilities
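Regenerating virtual keys can be scripted against the key-management endpoints used above (`/key/delete` and `/key/generate`). A minimal sketch that builds, but does not send, the requests; the key values below are placeholders:

```python
# Build authenticated management-API requests with the standard library.
import json
import urllib.request

PROXY = "http://localhost:4000"
MASTER_KEY = "sk-new-master-key"  # the freshly rotated master key

def admin_request(path: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated POST against the proxy's management API."""
    return urllib.request.Request(
        f"{PROXY}{path}",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {MASTER_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# 1. Delete the compromised virtual keys...
delete_req = admin_request("/key/delete", {"keys": ["sk-litellm-compromised"]})
# 2. ...then issue replacements with the same scoping as before.
regen_req = admin_request(
    "/key/generate",
    {"key_alias": "frontend-app", "max_budget": 100.0, "rpm_limit": 100},
)
# urllib.request.urlopen(delete_req) / urlopen(regen_req) would execute them.
```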
#### 5. Post-Mortem

- Document the incident
- Update security policies
- Improve monitoring
- Train the team
## Security Checklist

- [ ] Strong `LITELLM_MASTER_KEY` set via environment variable
- [ ] Virtual keys with budgets and rate limits for every client
- [ ] Provider API keys referenced via `os.environ/` or a secrets manager
- [ ] TLS enabled end to end, with HTTP redirected to HTTPS
- [ ] Firewall rules or network policies restricting database and Redis access
- [ ] Message content and API keys redacted from logs
- [ ] Container images scanned and pods running as non-root
- [ ] Spend and error-rate alerts configured
- [ ] Incident response plan documented and rehearsed
## Next Steps

- **Monitoring**: Set up security monitoring and alerts
- **High Availability**: Deploy securely at scale
- **Performance**: Optimize without compromising security
- **Troubleshooting**: Debug security-related issues