Quick Start with Helm

LiteLLM provides an official Helm chart for deploying the proxy on Kubernetes.
1. Add Helm Repository

helm repo add litellm https://charts.litellm.ai
helm repo update
2. Install with Default Values

helm install litellm litellm/litellm-helm \
  --set postgresql.auth.password=your-secure-password \
  --set postgresql.auth.postgresPassword=your-admin-password
3. Verify Installation

kubectl get pods -l app.kubernetes.io/name=litellm
kubectl logs -l app.kubernetes.io/name=litellm -f
4. Access the Service

# Port forward to access locally
kubectl port-forward svc/litellm 4000:4000

# Test health endpoint
curl http://localhost:4000/health/liveliness
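
With the port-forward running, you can send an OpenAI-compatible request through the proxy. A minimal sketch, assuming the master key you configured (shown as the placeholder sk-1234) and a gpt-4o entry in your model_list:

```shell
# Build an OpenAI-compatible chat completion body; the model name must
# match an entry in your model_list (gpt-4o is used as an example)
BODY='{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'
echo "$BODY"

# Send it through the proxy (requires the port-forward above):
# curl http://localhost:4000/v1/chat/completions \
#   -H "Authorization: Bearer sk-1234" \
#   -H "Content-Type: application/json" \
#   -d "$BODY"
```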

Helm Chart Configuration

Basic Values

Create a values.yaml file:
values.yaml
# Number of replicas
replicaCount: 3

# Image configuration
image:
  repository: ghcr.io/berriai/litellm-database
  tag: "main-stable"
  pullPolicy: Always

# Service configuration
service:
  type: ClusterIP
  port: 4000

# Resource limits
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 2000m
    memory: 2Gi

# LiteLLM configuration
proxy_config:
  model_list:
    - model_name: gpt-4o
      litellm_params:
        model: gpt-4o
        api_key: os.environ/OPENAI_API_KEY
    - model_name: claude-sonnet-4
      litellm_params:
        model: anthropic/claude-sonnet-4-20250514
        api_key: os.environ/ANTHROPIC_API_KEY
  
  general_settings:
    master_key: os.environ/PROXY_MASTER_KEY
    database_url: os.environ/DATABASE_URL

# Database configuration
db:
  deployStandalone: true  # Deploy PostgreSQL with chart
  useExisting: false      # Or use existing database

postgresql:
  architecture: standalone
  auth:
    username: litellm
    database: litellm
    password: "ChangeMe123!"          # Override via --set
    postgresPassword: "AdminPass123!" # Override via --set

# Enable autoscaling
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80
Install with custom values:
helm install litellm litellm/litellm-helm -f values.yaml

Environment Variables from Secrets

Store API keys and sensitive data in Kubernetes Secrets, not in values.yaml.
Create secrets:
kubectl create secret generic litellm-secrets \
  --from-literal=OPENAI_API_KEY=sk-... \
  --from-literal=ANTHROPIC_API_KEY=sk-ant-... \
  --from-literal=PROXY_MASTER_KEY=sk-1234
Reference in values.yaml:
environmentSecrets:
  - litellm-secrets

proxy_config:
  model_list:
    - model_name: gpt-4o
      litellm_params:
        model: gpt-4o
        api_key: os.environ/OPENAI_API_KEY
  general_settings:
    master_key: os.environ/PROXY_MASTER_KEY
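
Kubernetes stores Secret values base64-encoded, so inspecting a key with kubectl requires a decode step. A local sketch of the round-trip (no cluster needed), followed by the equivalent on-cluster check:

```shell
# Secret values are base64-encoded at rest; a local round-trip of the
# encoding shows why the decode step below is needed
encoded=$(printf 'sk-1234' | base64)
decoded=$(printf '%s' "$encoded" | base64 -d)
echo "$decoded"

# On the cluster, the same decode verifies a stored key:
# kubectl get secret litellm-secrets \
#   -o jsonpath='{.data.PROXY_MASTER_KEY}' | base64 -d
```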

Manual Kubernetes Deployment

For custom deployments without Helm:

Deployment YAML

litellm-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: litellm-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: litellm
  template:
    metadata:
      labels:
        app: litellm
    spec:
      containers:
        - name: litellm-container
          image: ghcr.io/berriai/litellm:main-stable
          imagePullPolicy: Always
          ports:
            - containerPort: 4000
              name: http
          env:
            - name: LITELLM_MASTER_KEY
              valueFrom:
                secretKeyRef:
                  name: litellm-secrets
                  key: master-key
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: litellm-secrets
                  key: database-url
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: litellm-secrets
                  key: openai-api-key
          args:
            - "--config"
            - "/app/proxy_config.yaml"
          volumeMounts:
            - name: config-volume
              mountPath: /app
              readOnly: true
          livenessProbe:
            httpGet:
              path: /health/liveliness
              port: 4000
            initialDelaySeconds: 120
            periodSeconds: 15
            successThreshold: 1
            failureThreshold: 3
            timeoutSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/readiness
              port: 4000
            initialDelaySeconds: 120
            periodSeconds: 15
            successThreshold: 1
            failureThreshold: 3
            timeoutSeconds: 10
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 2000m
              memory: 2Gi
      volumes:
        - name: config-volume
          configMap:
            name: litellm-config
---
apiVersion: v1
kind: Service
metadata:
  name: litellm-service
spec:
  selector:
    app: litellm
  ports:
    - protocol: TCP
      port: 4000
      targetPort: 4000
  type: ClusterIP
Apply:
kubectl apply -f litellm-deployment.yaml

ConfigMap for Proxy Config

litellm-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: litellm-config
data:
  proxy_config.yaml: |
    model_list:
      - model_name: gpt-4o
        litellm_params:
          model: gpt-4o
          api_key: os.environ/OPENAI_API_KEY
      - model_name: claude-sonnet-4
        litellm_params:
          model: anthropic/claude-sonnet-4-20250514
          api_key: os.environ/ANTHROPIC_API_KEY
    
    general_settings:
      master_key: os.environ/LITELLM_MASTER_KEY
      database_url: os.environ/DATABASE_URL
Apply the ConfigMap, then restart the deployment so the proxy re-reads its config (the file is only loaded at startup):
kubectl apply -f litellm-configmap.yaml
kubectl rollout restart deployment/litellm-deployment

Ingress Configuration

NGINX Ingress

litellm-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: litellm-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.yourdomain.com
      secretName: litellm-tls
  rules:
    - host: api.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: litellm-service
                port:
                  number: 4000
Enable in Helm:
ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: api.yourdomain.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: litellm-tls
      hosts:
        - api.yourdomain.com

Autoscaling

Horizontal Pod Autoscaler (HPA)

litellm-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: litellm-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: litellm-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
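
For utilization metrics, the HPA scales by the formula documented by Kubernetes: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to min/max. A sketch of that arithmetic (function name is illustrative):

```shell
# HPA scaling formula: desired = ceil(current * currentUtil / targetUtil),
# clamped to minReplicas/maxReplicas
hpa_desired() {
  current=$1; current_util=$2; target_util=$3; min=$4; max=$5
  # integer ceiling division
  desired=$(( (current * current_util + target_util - 1) / target_util ))
  [ "$desired" -lt "$min" ] && desired=$min
  [ "$desired" -gt "$max" ] && desired=$max
  echo "$desired"
}

# 3 replicas averaging 140% CPU against the 70% target scale to 6:
hpa_desired 3 140 70 2 10
```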

KEDA Autoscaling

Use KEDA for advanced autoscaling based on custom metrics like request queue depth or Prometheus metrics.
keda:
  enabled: true
  minReplicas: 2
  maxReplicas: 20
  pollingInterval: 30
  cooldownPeriod: 300
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090
        metricName: litellm_requests_total
        threshold: '1000'
        query: sum(rate(litellm_requests_total[2m]))
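
Unlike the utilization targets above, the Prometheus trigger's threshold is a per-replica target value: KEDA hands the metric to the HPA, which roughly computes desiredReplicas = ceil(metricValue / threshold), clamped to min/max. A sketch of that arithmetic (function name is illustrative):

```shell
# KEDA value-type trigger: desired = ceil(metric / threshold), clamped
keda_desired() {
  metric=$1; threshold=$2; min=$3; max=$4
  desired=$(( (metric + threshold - 1) / threshold ))
  [ "$desired" -lt "$min" ] && desired=$min
  [ "$desired" -gt "$max" ] && desired=$max
  echo "$desired"
}

# 2500 req/s against the 1000 threshold above asks for 3 replicas:
keda_desired 2500 1000 2 20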

Database Configuration

Using External PostgreSQL

For production, use a managed database service (AWS RDS, GCP Cloud SQL, Azure Database) for better reliability.
db:
  useExisting: true
  endpoint: postgres.example.com
  database: litellm
  url: postgresql://$(DATABASE_USERNAME):$(DATABASE_PASSWORD)@$(DATABASE_HOST)/$(DATABASE_NAME)
  secret:
    name: postgres-credentials
    usernameKey: username
    passwordKey: password

# Disable bundled PostgreSQL
postgresql:
  enabled: false
Create database secret:
kubectl create secret generic postgres-credentials \
  --from-literal=username=litellm \
  --from-literal=password=your-secure-password
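
The proxy ultimately consumes a standard PostgreSQL connection string. A hedged sketch of the format the chart assembles from these values (host and credentials are illustrative):

```shell
# Assemble a PostgreSQL connection string from the same values as the
# secret above (URL-encode the password if it contains special characters)
DB_USER=litellm
DB_PASSWORD=your-secure-password
DB_HOST=postgres.example.com
DB_NAME=litellm
DATABASE_URL="postgresql://${DB_USER}:${DB_PASSWORD}@${DB_HOST}:5432/${DB_NAME}"
echo "$DATABASE_URL"
```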

Prisma Migrations

The Helm chart includes a migration job that runs before deployment:
migrationJob:
  enabled: true
  retries: 3
  backoffLimit: 4
  ttlSecondsAfterFinished: 120
  hooks:
    argocd:
      enabled: true  # Run as ArgoCD hook
    helm:
      enabled: false # Or as Helm pre-install hook

High Availability Setup

Pod Disruption Budget

pdb:
  enabled: true
  minAvailable: 2  # Keep at least 2 pods running during disruptions
  # Or use: maxUnavailable: 1

Topology Spread Constraints

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: litellm
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: litellm

Graceful Shutdown

terminationGracePeriodSeconds: 90

lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 15"]  # Drain requests

Monitoring with Prometheus

ServiceMonitor

serviceMonitor:
  enabled: true
  interval: 15s
  scrapeTimeout: 10s
  labels:
    prometheus: kube-prometheus
This creates a ServiceMonitor for the Prometheus Operator to scrape metrics from LiteLLM.

Redis for Caching

redis:
  enabled: true
  architecture: standalone
  auth:
    enabled: true
    password: "your-redis-password"

proxy_config:
  general_settings:
    cache: true
    redis_host: os.environ/REDIS_HOST
    redis_port: os.environ/REDIS_PORT
    redis_password: os.environ/REDIS_PASSWORD

Security Best Practices

Network Policies

networkpolicy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: litellm-netpol
spec:
  podSelector:
    matchLabels:
      app: litellm
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
      ports:
        - protocol: TCP
          port: 4000
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgresql
      ports:
        - protocol: TCP
          port: 5432
    - to:  # Allow external API calls
        - namespaceSelector: {}
      ports:
        - protocol: TCP
          port: 443

Pod Security Standards

podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000
  seccompProfile:
    type: RuntimeDefault

securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
  readOnlyRootFilesystem: false  # Prisma needs write access

Troubleshooting

Pod Won’t Start

# Check pod status
kubectl get pods
kubectl describe pod litellm-xxx

# View logs
kubectl logs litellm-xxx -f

# Common issues:
# - Database migrations failing (check migration job logs)
# - ConfigMap not mounted (verify configmap exists)
# - Secrets missing (check secret creation)

Database Connection Issues

# Test database connectivity from pod
kubectl exec -it litellm-xxx -- sh
psql $DATABASE_URL

# Check service DNS resolution
kubectl exec -it litellm-xxx -- nslookup postgres

Health Check Failures

# Manually test health endpoint
kubectl exec -it litellm-xxx -- curl localhost:4000/health/liveliness

# Increase startup probe failure threshold
startupProbe:
  failureThreshold: 30  # Allow 5 minutes (30 * 10s)
  periodSeconds: 10

Next Steps

High Availability

Multi-region HA deployment patterns

Monitoring

Set up Prometheus and Grafana dashboards

Security

Harden your Kubernetes deployment

Performance

Optimize for high throughput
