Quick Start with Helm
LiteLLM provides official Helm charts for deploying the proxy on Kubernetes.

Install with Default Values
helm install litellm litellm/litellm-helm \
--set postgresql.auth.password=your-secure-password \
--set postgresql.auth.postgres-password=your-admin-password
Verify Installation
kubectl get pods -l app.kubernetes.io/name=litellm
kubectl logs -l app.kubernetes.io/name=litellm -f
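To smoke-test the proxy itself, you can port-forward the Service and hit the health endpoint. This sketch assumes the default release name `litellm` (so the Service is also named `litellm`) and the master key you set at install time; adjust both to your deployment.

```shell
# Forward the proxy port locally (runs until interrupted)
kubectl port-forward svc/litellm 4000:4000 &

# Hit the liveness endpoint; a healthy proxy returns HTTP 200
curl -s http://localhost:4000/health/liveliness

# List configured models (authenticated with the master key)
curl -s http://localhost:4000/v1/models \
  -H "Authorization: Bearer your-master-key"
```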
Helm Chart Configuration
Basic Values
Create a values.yaml file:
values.yaml

# Number of replicas
replicaCount: 3

# Image configuration
image:
  repository: ghcr.io/berriai/litellm-database
  tag: "main-stable"
  pullPolicy: Always

# Service configuration
service:
  type: ClusterIP
  port: 4000

# Resource limits
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 2000m
    memory: 2Gi

# LiteLLM configuration
proxy_config:
  model_list:
    - model_name: gpt-4o
      litellm_params:
        model: gpt-4o
        api_key: os.environ/OPENAI_API_KEY
    - model_name: claude-sonnet-4
      litellm_params:
        model: anthropic/claude-sonnet-4-20250514
        api_key: os.environ/ANTHROPIC_API_KEY
  general_settings:
    master_key: os.environ/PROXY_MASTER_KEY
    database_url: os.environ/DATABASE_URL

# Database configuration
db:
  deployStandalone: true  # Deploy PostgreSQL with chart
  useExisting: false      # Or use existing database

postgresql:
  architecture: standalone
  auth:
    username: litellm
    database: litellm
    password: "ChangeMe123!"           # Override via --set
    postgres-password: "AdminPass123!" # Override via --set

# Enable autoscaling
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80
helm install litellm litellm/litellm-helm -f values.yaml
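For repeatable rollouts (for example from CI), `helm upgrade --install` is a common idempotent alternative: it installs on the first run and upgrades thereafter.

```shell
# Install or upgrade in one command
helm upgrade --install litellm litellm/litellm-helm -f values.yaml

# Roll back to the previous revision if a release misbehaves
helm rollback litellm
```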
Environment Variables from Secrets
Store API keys and sensitive data in Kubernetes Secrets, not in values.yaml.
kubectl create secret generic litellm-secrets \
--from-literal=OPENAI_API_KEY=sk-... \
--from-literal=ANTHROPIC_API_KEY=sk-ant-... \
--from-literal=PROXY_MASTER_KEY=sk-1234
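If you prefer declarative manifests over imperative kubectl commands, the same Secret can be expressed with `stringData`, which accepts plain strings and lets Kubernetes handle the base64 encoding. The key names below mirror the command above:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: litellm-secrets
type: Opaque
stringData:
  OPENAI_API_KEY: sk-...
  ANTHROPIC_API_KEY: sk-ant-...
  PROXY_MASTER_KEY: sk-1234
```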
values.yaml:
environmentSecrets:
  - litellm-secrets

proxy_config:
  model_list:
    - model_name: gpt-4o
      litellm_params:
        model: gpt-4o
        api_key: os.environ/OPENAI_API_KEY
  general_settings:
    master_key: os.environ/PROXY_MASTER_KEY
Manual Kubernetes Deployment
For custom deployments without Helm:

Deployment YAML
litellm-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: litellm-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: litellm
  template:
    metadata:
      labels:
        app: litellm
    spec:
      containers:
        - name: litellm-container
          image: ghcr.io/berriai/litellm:main-stable
          imagePullPolicy: Always
          ports:
            - containerPort: 4000
              name: http
          env:
            - name: LITELLM_MASTER_KEY
              valueFrom:
                secretKeyRef:
                  name: litellm-secrets
                  key: master-key
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: litellm-secrets
                  key: database-url
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: litellm-secrets
                  key: openai-api-key
          args:
            - "--config"
            - "/app/proxy_config.yaml"
          volumeMounts:
            - name: config-volume
              mountPath: /app
              readOnly: true
          livenessProbe:
            httpGet:
              path: /health/liveliness
              port: 4000
            initialDelaySeconds: 120
            periodSeconds: 15
            successThreshold: 1
            failureThreshold: 3
            timeoutSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/readiness
              port: 4000
            initialDelaySeconds: 120
            periodSeconds: 15
            successThreshold: 1
            failureThreshold: 3
            timeoutSeconds: 10
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 2000m
              memory: 2Gi
      volumes:
        - name: config-volume
          configMap:
            name: litellm-config
---
apiVersion: v1
kind: Service
metadata:
  name: litellm-service
spec:
  selector:
    app: litellm
  ports:
    - protocol: TCP
      port: 4000
      targetPort: 4000
  type: ClusterIP
kubectl apply -f litellm-deployment.yaml
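After applying, confirm the rollout completed and the Service has backing endpoints:

```shell
# Wait for all replicas to become available
kubectl rollout status deployment/litellm-deployment

# Ready pods should appear as endpoints of the Service
kubectl get endpoints litellm-service
```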
ConfigMap for Proxy Config
litellm-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: litellm-config
data:
  proxy_config.yaml: |
    model_list:
      - model_name: gpt-4o
        litellm_params:
          model: gpt-4o
          api_key: os.environ/OPENAI_API_KEY
      - model_name: claude-sonnet-4
        litellm_params:
          model: anthropic/claude-sonnet-4-20250514
          api_key: os.environ/ANTHROPIC_API_KEY
    general_settings:
      master_key: os.environ/LITELLM_MASTER_KEY
      database_url: os.environ/DATABASE_URL
kubectl apply -f litellm-configmap.yaml
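Note that the pods read the mounted config only at startup; after editing the ConfigMap, trigger a rolling restart so new pods pick up the change:

```shell
kubectl apply -f litellm-configmap.yaml
kubectl rollout restart deployment/litellm-deployment
```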
Ingress Configuration
NGINX Ingress
litellm-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: litellm-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
spec:
  tls:
    - hosts:
        - api.yourdomain.com
      secretName: litellm-tls
  rules:
    - host: api.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: litellm-service
                port:
                  number: 4000
If you deploy with Helm, the chart can manage the Ingress instead via values.yaml:

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: api.yourdomain.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: litellm-tls
      hosts:
        - api.yourdomain.com
Autoscaling
Horizontal Pod Autoscaler (HPA)
litellm-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: litellm-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: litellm-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
KEDA Autoscaling
Use KEDA for advanced autoscaling based on custom metrics like request queue depth or Prometheus metrics.
keda:
  enabled: true
  minReplicas: 2
  maxReplicas: 20
  pollingInterval: 30
  cooldownPeriod: 300
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090
        metricName: litellm_requests_total
        threshold: '1000'
        query: sum(rate(litellm_requests_total[2m]))
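Under the hood this corresponds to a KEDA ScaledObject. For non-Helm deployments, a roughly equivalent manifest (a sketch, assuming KEDA is installed and Prometheus is reachable at the address below) looks like:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: litellm-scaledobject
spec:
  scaleTargetRef:
    name: litellm-deployment   # Deployment to scale
  minReplicaCount: 2
  maxReplicaCount: 20
  pollingInterval: 30          # Seconds between metric checks
  cooldownPeriod: 300          # Seconds to wait before scaling down
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090
        threshold: '1000'
        query: sum(rate(litellm_requests_total[2m]))
```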
Database Configuration
Using External PostgreSQL
For production, use a managed database service (AWS RDS, GCP Cloud SQL, Azure Database) for better reliability.
db:
  useExisting: true
  endpoint: postgres.example.com
  database: litellm
  url: postgresql://$(DATABASE_USERNAME):$(DATABASE_PASSWORD)@$(DATABASE_HOST)/$(DATABASE_NAME)
  secret:
    name: postgres-credentials
    usernameKey: username
    passwordKey: password

# Disable bundled PostgreSQL
postgresql:
  enabled: false
kubectl create secret generic postgres-credentials \
--from-literal=username=litellm \
--from-literal=password=your-secure-password
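The $(...) placeholders in db.url above are filled in from the secret at deploy time; the resulting connection string has this shape (the values below are illustrative placeholders, not real credentials):

```shell
# Placeholder values; substitute your own host and credentials
DATABASE_USERNAME=litellm
DATABASE_PASSWORD=your-secure-password
DATABASE_HOST=postgres.example.com
DATABASE_NAME=litellm

# Assemble the PostgreSQL connection URL
DATABASE_URL="postgresql://${DATABASE_USERNAME}:${DATABASE_PASSWORD}@${DATABASE_HOST}/${DATABASE_NAME}"
echo "$DATABASE_URL"
# → postgresql://litellm:your-secure-password@postgres.example.com/litellm
```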
Prisma Migrations
The Helm chart includes a migration job that runs before deployment:

migrationJob:
  enabled: true
  retries: 3
  backoffLimit: 4
  ttlSecondsAfterFinished: 120
  hooks:
    argocd:
      enabled: true   # Run as ArgoCD hook
    helm:
      enabled: false  # Or as Helm pre-install hook
High Availability Setup
Pod Disruption Budget
pdb:
  enabled: true
  minAvailable: 2  # Ensure at least 2 pods always running
  # Or use: maxUnavailable: 1
Topology Spread Constraints
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: litellm
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: litellm
Graceful Shutdown
terminationGracePeriodSeconds: 90

lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 15"]  # Drain in-flight requests
Monitoring with Prometheus
ServiceMonitor
serviceMonitor:
  enabled: true
  interval: 15s
  scrapeTimeout: 10s
  labels:
    prometheus: kube-prometheus
Redis for Caching
redis:
  enabled: true
  architecture: standalone
  auth:
    enabled: true
    password: "your-redis-password"

proxy_config:
  litellm_settings:
    cache: true
    cache_params:
      type: redis
      host: os.environ/REDIS_HOST
      port: os.environ/REDIS_PORT
      password: os.environ/REDIS_PASSWORD
Security Best Practices
Network Policies
networkpolicy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: litellm-netpol
spec:
  podSelector:
    matchLabels:
      app: litellm
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
      ports:
        - protocol: TCP
          port: 4000
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgresql
      ports:
        - protocol: TCP
          port: 5432
    - to:  # Allow outbound HTTPS to external LLM provider APIs
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - protocol: TCP
          port: 443
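One caveat: once Egress is restricted, cluster DNS is also blocked unless explicitly allowed, and the proxy can no longer resolve provider hostnames. A typical additional egress rule (a sketch targeting the standard `k8s-app: kube-dns` label; adjust to your CNI and DNS setup) appended to the egress list above:

```yaml
    - to:  # Allow DNS lookups via kube-dns/CoreDNS
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```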
Pod Security Standards
podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000
  seccompProfile:
    type: RuntimeDefault

securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
  readOnlyRootFilesystem: false  # Prisma needs write access
Troubleshooting
Pod Won’t Start
# Check pod status
kubectl get pods
kubectl describe pod litellm-xxx
# View logs
kubectl logs litellm-xxx -f
# Common issues:
# - Database migrations failing (check migration job logs)
# - ConfigMap not mounted (verify configmap exists)
# - Secrets missing (check secret creation)
Database Connection Issues
# Open a shell inside the pod
kubectl exec -it litellm-xxx -- sh
# Then, from inside the pod (requires psql in the image), test connectivity:
psql $DATABASE_URL
# Check service DNS resolution
kubectl exec -it litellm-xxx -- nslookup postgres
Health Check Failures
# Manually test health endpoint
kubectl exec -it litellm-xxx -- curl localhost:4000/health/liveliness
# Increase startup probe failure threshold
startupProbe:
  failureThreshold: 30  # Allow 5 minutes (30 * 10s)
  periodSeconds: 10
Next Steps

- High Availability: multi-region HA deployment patterns
- Monitoring: set up Prometheus and Grafana dashboards
- Security: harden your Kubernetes deployment
- Performance: optimize for high throughput