Quick Start with Helm

LiteLLM provides an official Helm chart for deploying the proxy on Kubernetes.
1. Add Helm Repository

helm repo add litellm https://charts.litellm.ai
helm repo update
2. Install with Default Values

helm install litellm litellm/litellm-helm \
  --set postgresql.auth.password=your-secure-password \
  --set postgresql.auth.postgresPassword=your-admin-password
3. Verify Installation

kubectl get pods -l app.kubernetes.io/name=litellm
kubectl logs -l app.kubernetes.io/name=litellm -f
4. Access the Service

# Port forward to access locally
kubectl port-forward svc/litellm 4000:4000

# Test health endpoint
curl http://localhost:4000/health/liveliness
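
With the port-forward running, you can send an OpenAI-compatible request through the proxy. A minimal sketch, assuming the master key you configured (shown as the placeholder sk-1234) and a gpt-4o entry in your model_list:

```shell
# Build an OpenAI-compatible chat completion body; the model name must
# match an entry in your model_list (gpt-4o is used as an example)
BODY='{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'
echo "$BODY"

# Send it through the proxy (requires the port-forward above):
# curl http://localhost:4000/v1/chat/completions \
#   -H "Authorization: Bearer sk-1234" \
#   -H "Content-Type: application/json" \
#   -d "$BODY"
```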

Helm Chart Configuration

Basic Values

Create a values.yaml file:
values.yaml
# Number of replicas
replicaCount: 3

# Image configuration
image:
  repository: ghcr.io/berriai/litellm-database
  tag: "main-stable"
  pullPolicy: Always

# Service configuration
service:
  type: ClusterIP
  port: 4000

# Resource limits
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 2000m
    memory: 2Gi

# LiteLLM configuration
proxy_config:
  model_list:
    - model_name: gpt-4o
      litellm_params:
        model: gpt-4o
        api_key: os.environ/OPENAI_API_KEY
    - model_name: claude-sonnet-4
      litellm_params:
        model: anthropic/claude-sonnet-4-20250514
        api_key: os.environ/ANTHROPIC_API_KEY
  
  general_settings:
    master_key: os.environ/PROXY_MASTER_KEY
    database_url: os.environ/DATABASE_URL

# Database configuration
db:
  deployStandalone: true  # Deploy PostgreSQL with chart
  useExisting: false      # Or use existing database

postgresql:
  architecture: standalone
  auth:
    username: litellm
    database: litellm
    password: "ChangeMe123!"          # Override via --set
    postgresPassword: "AdminPass123!" # Override via --set

# Enable autoscaling
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80
Install with custom values:
helm install litellm litellm/litellm-helm -f values.yaml

Environment Variables from Secrets

Store API keys and sensitive data in Kubernetes Secrets, not in values.yaml.
Create secrets:
kubectl create secret generic litellm-secrets \
  --from-literal=OPENAI_API_KEY=sk-... \
  --from-literal=ANTHROPIC_API_KEY=sk-ant-... \
  --from-literal=PROXY_MASTER_KEY=sk-1234
Reference in values.yaml:
environmentSecrets:
  - litellm-secrets

proxy_config:
  model_list:
    - model_name: gpt-4o
      litellm_params:
        model: gpt-4o
        api_key: os.environ/OPENAI_API_KEY
  general_settings:
    master_key: os.environ/PROXY_MASTER_KEY
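
Kubernetes stores Secret values base64-encoded, so inspecting a key with kubectl requires a decode step. A local sketch of the round-trip (no cluster needed), followed by the equivalent on-cluster check:

```shell
# Secret values are base64-encoded at rest; a local round-trip of the
# encoding shows why the decode step below is needed
encoded=$(printf 'sk-1234' | base64)
decoded=$(printf '%s' "$encoded" | base64 -d)
echo "$decoded"

# On the cluster, the same decode verifies a stored key:
# kubectl get secret litellm-secrets \
#   -o jsonpath='{.data.PROXY_MASTER_KEY}' | base64 -d
```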

Manual Kubernetes Deployment

For custom deployments without Helm:

Deployment YAML

litellm-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: litellm-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: litellm
  template:
    metadata:
      labels:
        app: litellm
    spec:
      containers:
        - name: litellm-container
          image: ghcr.io/berriai/litellm:main-stable
          imagePullPolicy: Always
          ports:
            - containerPort: 4000
              name: http
          env:
            - name: LITELLM_MASTER_KEY
              valueFrom:
                secretKeyRef:
                  name: litellm-secrets
                  key: master-key
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: litellm-secrets
                  key: database-url
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: litellm-secrets
                  key: openai-api-key
          args:
            - "--config"
            - "/app/proxy_config.yaml"
          volumeMounts:
            - name: config-volume
              mountPath: /app
              readOnly: true
          livenessProbe:
            httpGet:
              path: /health/liveliness
              port: 4000
            initialDelaySeconds: 120
            periodSeconds: 15
            successThreshold: 1
            failureThreshold: 3
            timeoutSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/readiness
              port: 4000
            initialDelaySeconds: 120
            periodSeconds: 15
            successThreshold: 1
            failureThreshold: 3
            timeoutSeconds: 10
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 2000m
              memory: 2Gi
      volumes:
        - name: config-volume
          configMap:
            name: litellm-config
---
apiVersion: v1
kind: Service
metadata:
  name: litellm-service
spec:
  selector:
    app: litellm
  ports:
    - protocol: TCP
      port: 4000
      targetPort: 4000
  type: ClusterIP
Apply:
kubectl apply -f litellm-deployment.yaml

ConfigMap for Proxy Config

litellm-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: litellm-config
data:
  proxy_config.yaml: |
    model_list:
      - model_name: gpt-4o
        litellm_params:
          model: gpt-4o
          api_key: os.environ/OPENAI_API_KEY
      - model_name: claude-sonnet-4
        litellm_params:
          model: anthropic/claude-sonnet-4-20250514
          api_key: os.environ/ANTHROPIC_API_KEY
    
    general_settings:
      master_key: os.environ/LITELLM_MASTER_KEY
      database_url: os.environ/DATABASE_URL
Apply the ConfigMap, then restart the deployment so the proxy re-reads its config (the file is only loaded at startup):
kubectl apply -f litellm-configmap.yaml
kubectl rollout restart deployment/litellm-deployment

Ingress Configuration

NGINX Ingress

litellm-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: litellm-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.yourdomain.com
      secretName: litellm-tls
  rules:
    - host: api.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: litellm-service
                port:
                  number: 4000
Enable in Helm:
ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: api.yourdomain.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: litellm-tls
      hosts:
        - api.yourdomain.com

Autoscaling

Horizontal Pod Autoscaler (HPA)

litellm-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: litellm-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: litellm-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
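
For utilization metrics, the HPA scales by the formula documented by Kubernetes: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to min/max. A sketch of that arithmetic (function name is illustrative):

```shell
# HPA scaling formula: desired = ceil(current * currentUtil / targetUtil),
# clamped to minReplicas/maxReplicas
hpa_desired() {
  current=$1; current_util=$2; target_util=$3; min=$4; max=$5
  # integer ceiling division
  desired=$(( (current * current_util + target_util - 1) / target_util ))
  [ "$desired" -lt "$min" ] && desired=$min
  [ "$desired" -gt "$max" ] && desired=$max
  echo "$desired"
}

# 3 replicas averaging 140% CPU against the 70% target scale to 6:
hpa_desired 3 140 70 2 10
```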

KEDA Autoscaling

Use KEDA for advanced autoscaling based on custom metrics like request queue depth or Prometheus metrics.
keda:
  enabled: true
  minReplicas: 2
  maxReplicas: 20
  pollingInterval: 30
  cooldownPeriod: 300
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090
        metricName: litellm_requests_total
        threshold: '1000'
        query: sum(rate(litellm_requests_total[2m]))
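
Unlike the utilization targets above, the Prometheus trigger's threshold is a per-replica target value: KEDA hands the metric to the HPA, which roughly computes desiredReplicas = ceil(metricValue / threshold), clamped to min/max. A sketch of that arithmetic (function name is illustrative):

```shell
# KEDA value-type trigger: desired = ceil(metric / threshold), clamped
keda_desired() {
  metric=$1; threshold=$2; min=$3; max=$4
  desired=$(( (metric + threshold - 1) / threshold ))
  [ "$desired" -lt "$min" ] && desired=$min
  [ "$desired" -gt "$max" ] && desired=$max
  echo "$desired"
}

# 2500 req/s against the 1000 threshold above asks for 3 replicas:
keda_desired 2500 1000 2 20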

Database Configuration

Using External PostgreSQL

For production, use a managed database service (AWS RDS, GCP Cloud SQL, Azure Database) for better reliability.
db:
  useExisting: true
  endpoint: postgres.example.com
  database: litellm
  url: postgresql://$(DATABASE_USERNAME):$(DATABASE_PASSWORD)@$(DATABASE_HOST)/$(DATABASE_NAME)
  secret:
    name: postgres-credentials
    usernameKey: username
    passwordKey: password

# Disable bundled PostgreSQL
postgresql:
  enabled: false
Create database secret:
kubectl create secret generic postgres-credentials \
  --from-literal=username=litellm \
  --from-literal=password=your-secure-password
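
The proxy ultimately consumes a standard PostgreSQL connection string. A hedged sketch of the format the chart assembles from these values (host and credentials are illustrative):

```shell
# Assemble a PostgreSQL connection string from the same values as the
# secret above (URL-encode the password if it contains special characters)
DB_USER=litellm
DB_PASSWORD=your-secure-password
DB_HOST=postgres.example.com
DB_NAME=litellm
DATABASE_URL="postgresql://${DB_USER}:${DB_PASSWORD}@${DB_HOST}:5432/${DB_NAME}"
echo "$DATABASE_URL"
```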

Prisma Migrations

The Helm chart includes a migration job that runs before deployment:
migrationJob:
  enabled: true
  retries: 3
  backoffLimit: 4
  ttlSecondsAfterFinished: 120
  hooks:
    argocd:
      enabled: true  # Run as ArgoCD hook
    helm:
      enabled: false # Or as Helm pre-install hook

High Availability Setup

Pod Disruption Budget

pdb:
  enabled: true
  minAvailable: 2  # Keep at least 2 pods running during disruptions
  # Or use: maxUnavailable: 1

Topology Spread Constraints

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: litellm
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: litellm

Graceful Shutdown

terminationGracePeriodSeconds: 90

lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 15"]  # Drain requests

Monitoring with Prometheus

ServiceMonitor

serviceMonitor:
  enabled: true
  interval: 15s
  scrapeTimeout: 10s
  labels:
    prometheus: kube-prometheus
This creates a ServiceMonitor for the Prometheus Operator to scrape metrics from LiteLLM.

Redis for Caching

redis:
  enabled: true
  architecture: standalone
  auth:
    enabled: true
    password: "your-redis-password"

proxy_config:
  general_settings:
    cache: true
    redis_host: os.environ/REDIS_HOST
    redis_port: os.environ/REDIS_PORT
    redis_password: os.environ/REDIS_PASSWORD

Security Best Practices

Network Policies

networkpolicy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: litellm-netpol
spec:
  podSelector:
    matchLabels:
      app: litellm
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
      ports:
        - protocol: TCP
          port: 4000
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgresql
      ports:
        - protocol: TCP
          port: 5432
    - to:  # Allow external API calls
        - namespaceSelector: {}
      ports:
        - protocol: TCP
          port: 443

Pod Security Standards

podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000
  seccompProfile:
    type: RuntimeDefault

securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
  readOnlyRootFilesystem: false  # Prisma needs write access

Troubleshooting

Pod Won’t Start

# Check pod status
kubectl get pods
kubectl describe pod litellm-xxx

# View logs
kubectl logs litellm-xxx -f

# Common issues:
# - Database migrations failing (check migration job logs)
# - ConfigMap not mounted (verify configmap exists)
# - Secrets missing (check secret creation)

Database Connection Issues

# Test database connectivity from pod
kubectl exec -it litellm-xxx -- sh
psql $DATABASE_URL

# Check service DNS resolution
kubectl exec -it litellm-xxx -- nslookup postgres

Health Check Failures

# Manually test health endpoint
kubectl exec -it litellm-xxx -- curl localhost:4000/health/liveliness

# Increase startup probe failure threshold
startupProbe:
  failureThreshold: 30  # Allow 5 minutes (30 * 10s)
  periodSeconds: 10

Next Steps

High Availability

Multi-region HA deployment patterns

Monitoring

Set up Prometheus and Grafana dashboards

Security

Harden your Kubernetes deployment

Performance

Optimize for high throughput
