
Overview

LiteLLM provides built-in observability through:
  • Prometheus metrics - Request rates, latency, errors
  • Logging integrations - Langfuse, Datadog, OpenTelemetry
  • Database tracking - Spend logs, usage analytics
  • Health checks - Service and model health monitoring

Prometheus Metrics

Enable Metrics Endpoint

LiteLLM exposes Prometheus metrics at /metrics:
curl http://localhost:4000/metrics
Available metrics:
# Request metrics
litellm_requests_total{model="gpt-4o",status="success"}
litellm_request_duration_seconds{model="gpt-4o"}

# Token metrics
litellm_tokens_total{model="gpt-4o",type="prompt"}
litellm_tokens_total{model="gpt-4o",type="completion"}

# Cost metrics
litellm_spend_total{model="gpt-4o",team="engineering"}

# Model health
litellm_model_health_status{model="gpt-4o",status="healthy"}

# Rate limiting
litellm_rate_limit_remaining{key="sk-123",limit_type="rpm"}
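The /metrics endpoint returns standard Prometheus exposition-format text, so it can be consumed by any scraper or parsed directly. Below is a minimal sketch of a parser for a single sample line; the metric names above are illustrative, and this helper (`parse_prom_line`) is not part of LiteLLM:

```python
import re

def parse_prom_line(line: str):
    """Parse one Prometheus exposition-format sample into (name, labels, value).

    Returns None for comments, blank lines, and non-matching lines.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    m = re.match(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)', line)
    if not m:
        return None
    name, raw_labels, value = m.groups()
    labels = dict(re.findall(r'(\w+)="([^"]*)"', raw_labels or ""))
    return name, labels, float(value)

sample = 'litellm_requests_total{model="gpt-4o",status="success"} 42'
print(parse_prom_line(sample))
# → ('litellm_requests_total', {'model': 'gpt-4o', 'status': 'success'}, 42.0)
```

In practice you would rarely hand-roll this; the official `prometheus_client` library ships a full exposition-format parser.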

Prometheus Configuration

Create prometheus.yml:
prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'litellm-production'

scrape_configs:
  - job_name: 'litellm'
    static_configs:
      - targets: ['litellm:4000']
    metrics_path: '/metrics'
    scrape_interval: 15s
    scrape_timeout: 10s

Docker Compose with Prometheus

docker-compose.yml
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-stable
    ports:
      - "4000:4000"
    environment:
      DATABASE_URL: postgresql://llmproxy:password@db:5432/litellm
      LITELLM_MASTER_KEY: sk-1234
    depends_on:
      - db
      - prometheus

  db:
    image: postgres:16
    environment:
      POSTGRES_DB: litellm
      POSTGRES_USER: llmproxy
      POSTGRES_PASSWORD: password
    volumes:
      - postgres_data:/var/lib/postgresql/data

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=15d'
      - '--web.enable-lifecycle'

volumes:
  postgres_data:
  prometheus_data:
Start:
docker-compose up -d
Access Prometheus UI: http://localhost:9090

Grafana Dashboards

Setup Grafana

Add to docker-compose.yml:
grafana:
  image: grafana/grafana:latest
  ports:
    - "3000:3000"
  environment:
    GF_SECURITY_ADMIN_PASSWORD: admin
    GF_USERS_ALLOW_SIGN_UP: "false"  # Compose environment values must be strings
  volumes:
    - grafana_data:/var/lib/grafana
    - ./grafana/dashboards:/etc/grafana/provisioning/dashboards
    - ./grafana/datasources:/etc/grafana/provisioning/datasources
  depends_on:
    - prometheus

Configure Data Source

Create grafana/datasources/prometheus.yml:
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: true

Create Dashboard

Key panels to include:
# Requests per second
rate(litellm_requests_total[5m])

# By model
sum(rate(litellm_requests_total[5m])) by (model)

# Success rate
sum(rate(litellm_requests_total{status="success"}[5m])) /
sum(rate(litellm_requests_total[5m])) * 100
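These same PromQL expressions can also be run outside Grafana via Prometheus's instant-query HTTP API (`/api/v1/query`). A minimal sketch, assuming Prometheus is reachable at the port mapped in the compose file above:

```python
from urllib.parse import urlencode

PROMETHEUS_URL = "http://localhost:9090"  # Prometheus from the compose file above

def instant_query_url(base: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API (/api/v1/query)."""
    return f"{base}/api/v1/query?" + urlencode({"query": promql})

url = instant_query_url(PROMETHEUS_URL, "sum(rate(litellm_requests_total[5m])) by (model)")
print(url)
# With Prometheus running, fetch the result with e.g.:
#   import json, urllib.request
#   result = json.loads(urllib.request.urlopen(url).read())["data"]["result"]
```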

Import Pre-built Dashboard

LiteLLM provides a Grafana dashboard JSON:
  1. Download from LiteLLM repository
  2. In Grafana: Dashboards → Import → Upload JSON
  3. Select Prometheus data source

Logging Integrations

Langfuse

Langfuse provides detailed LLM observability with traces, costs, and user analytics.
Setup:
config.yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: gpt-4o
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
Environment variables:
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com
Features:
  • Request/response traces
  • Token usage and cost tracking
  • User session analytics
  • Model performance comparison
  • Custom metadata tags
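Once the callback is enabled, any request through the proxy is traced; per-request metadata can be attached in the request body. A minimal sketch using only the standard library (the metadata field names such as `trace_user_id` and `session_id` follow the Langfuse integration's conventions; check the integration docs for your version):

```python
import json
from urllib.request import Request

LITELLM_URL = "http://localhost:4000"  # proxy from the compose file above
MASTER_KEY = "sk-1234"                 # use a real virtual key in practice

def chat_payload(model: str, prompt: str, **metadata) -> dict:
    """Build an OpenAI-compatible chat payload; `metadata` is forwarded to
    logging callbacks such as Langfuse as trace metadata."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "metadata": metadata,
    }

payload = chat_payload("gpt-4o", "Hello!", trace_user_id="user-123", session_id="sess-456")
req = Request(
    f"{LITELLM_URL}/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Authorization": f"Bearer {MASTER_KEY}", "Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would send the request once the proxy is running.
```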

Datadog

Enable Datadog tracing:
# Environment variables
USE_DDTRACE=true
DD_API_KEY=your-datadog-api-key
DD_SITE=datadoghq.com
DD_SERVICE=litellm-proxy
DD_ENV=production
DD_VERSION=1.0.0
DD_TRACE_OPENAI_ENABLED=false
Docker with Datadog Agent:
docker-compose.yml
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-stable
    environment:
      USE_DDTRACE: "true"
      DD_AGENT_HOST: datadog-agent
      DD_TRACE_AGENT_PORT: "8126"
    depends_on:
      - datadog-agent

  datadog-agent:
    image: datadog/agent:latest
    environment:
      DD_API_KEY: ${DD_API_KEY}
      DD_SITE: datadoghq.com
      DD_APM_ENABLED: "true"
      DD_APM_NON_LOCAL_TRAFFIC: "true"
    ports:
      - "8126:8126"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /proc/:/host/proc/:ro
      - /sys/fs/cgroup/:/host/sys/fs/cgroup:ro

OpenTelemetry

Configure OTEL export:
config.yaml
general_settings:
  otel: true
  otel_exporter: otlp_http
  otel_endpoint: http://otel-collector:4318
  otel_headers:
    Authorization: Bearer your-token

litellm_settings:
  success_callback: ["otel"]
Docker with OTEL Collector:
docker-compose.yml
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-stable
    environment:
      OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4318
      OTEL_SERVICE_NAME: litellm-proxy

  otel-collector:
    image: otel/opentelemetry-collector:latest
    ports:
      - "4318:4318"
      - "4317:4317"
    volumes:
      - ./otel-collector-config.yml:/etc/otel-collector-config.yml
    command: ["--config=/etc/otel-collector-config.yml"]
OTEL Collector config:
otel-collector-config.yml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 10s

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
  logging:
    loglevel: debug

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus, logging]

Database Analytics

Spend Logs Table

LiteLLM stores detailed request logs in PostgreSQL:
-- View recent requests
SELECT 
  request_id,
  model,
  "user",
  team_id,
  spend,
  total_tokens,
  "startTime",
  "endTime",
  request_duration_ms
FROM "LiteLLM_SpendLogs"
ORDER BY "startTime" DESC
LIMIT 100;

Analytics Queries

SELECT 
  model,
  COUNT(*) as request_count,
  SUM(spend) as total_spend,
  AVG(spend) as avg_spend_per_request,
  SUM(total_tokens) as total_tokens
FROM "LiteLLM_SpendLogs"
WHERE "startTime" >= NOW() - INTERVAL '24 hours'
GROUP BY model
ORDER BY total_spend DESC;
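The same per-model rollup can be computed in application code from fetched rows. A sketch, assuming rows are dicts with `model`, `spend`, and `total_tokens` keys as in the table above:

```python
from collections import defaultdict

def per_model_stats(rows):
    """Aggregate spend-log rows into per-model totals, mirroring the GROUP BY query above."""
    stats = defaultdict(lambda: {"request_count": 0, "total_spend": 0.0, "total_tokens": 0})
    for r in rows:
        s = stats[r["model"]]
        s["request_count"] += 1
        s["total_spend"] += r["spend"]
        s["total_tokens"] += r["total_tokens"]
    for s in stats.values():
        s["avg_spend_per_request"] = s["total_spend"] / s["request_count"]
    # Sort descending by total spend, like the ORDER BY clause
    return dict(sorted(stats.items(), key=lambda kv: kv[1]["total_spend"], reverse=True))

rows = [
    {"model": "gpt-4o", "spend": 0.03, "total_tokens": 900},
    {"model": "gpt-4o", "spend": 0.01, "total_tokens": 300},
    {"model": "claude-sonnet-4", "spend": 0.02, "total_tokens": 500},
]
print(per_model_stats(rows))
```

For large tables, prefer running the aggregation in PostgreSQL as shown above; this is mainly useful for post-processing already-fetched result sets.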

Daily Aggregates

LiteLLM maintains pre-aggregated daily statistics:
-- User daily spend
SELECT 
  date,
  user_id,
  SUM(spend) as daily_spend,
  SUM(api_requests) as requests,
  SUM(successful_requests) as successful,
  SUM(failed_requests) as failed
FROM "LiteLLM_DailyUserSpend"
WHERE date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY date, user_id
ORDER BY date DESC, daily_spend DESC;

-- Team daily spend
SELECT 
  date,
  team_id,
  SUM(spend) as daily_spend
FROM "LiteLLM_DailyTeamSpend"
WHERE date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY date, team_id
ORDER BY date DESC;

Health Monitoring

Health Check Endpoints

# Basic health
curl http://localhost:4000/health

# Liveliness (container is running)
curl http://localhost:4000/health/liveliness

# Readiness (ready to serve traffic)
curl http://localhost:4000/health/readiness
Response format:
{
  "status": "healthy",
  "uptime": 3600,
  "models": {
    "gpt-4o": "healthy",
    "claude-sonnet-4": "healthy"
  },
  "database": "connected",
  "redis": "connected"
}
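A readiness probe in a load balancer or orchestrator should gate on more than the top-level status. A minimal sketch of interpreting a response with the shape shown above (the exact fields may vary by LiteLLM version):

```python
def is_ready(health: dict) -> bool:
    """Treat the proxy as ready only if the service, its database,
    and every listed model report healthy/connected."""
    return (
        health.get("status") == "healthy"
        and health.get("database") == "connected"
        and all(v == "healthy" for v in health.get("models", {}).values())
    )

resp = {
    "status": "healthy",
    "models": {"gpt-4o": "healthy", "claude-sonnet-4": "unhealthy"},
    "database": "connected",
}
print(is_ready(resp))  # one unhealthy model → False
```

Whether one unhealthy model should fail readiness is a policy choice; with routing fallbacks configured, you may prefer to require only that at least one model is healthy.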

Model Health Checks

LiteLLM automatically monitors model health:
config.yaml
general_settings:
  health_check: true
  health_check_interval: 300  # Check every 5 minutes

router_settings:
  allowed_fails: 3
  cooldown_time: 30  # Seconds before retry
  retry_after: 10
View health status:
SELECT 
  model_name,
  status,
  healthy_count,
  unhealthy_count,
  response_time_ms,
  checked_at
FROM "LiteLLM_HealthCheckTable"
ORDER BY checked_at DESC;

Alerting Rules

Prometheus alerting rules:
alerts.yml
groups:
  - name: litellm
    interval: 30s
    rules:
      - alert: HighErrorRate
        expr: |
          (sum(rate(litellm_requests_total{status!="success"}[5m])) /
           sum(rate(litellm_requests_total[5m]))) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value | humanizePercentage }}"
      
      - alert: HighLatency
        expr: |
          histogram_quantile(0.95,
            rate(litellm_request_duration_seconds_bucket[5m])
          ) > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High P95 latency detected"
          description: "P95 latency is {{ $value }}s"
      
      - alert: ModelUnhealthy
        expr: litellm_model_health_status{status="unhealthy"} == 1
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Model {{ $labels.model }} is unhealthy"
      
      - alert: HighCost
        expr: |
          increase(litellm_spend_total[1h]) > 100
        labels:
          severity: warning
        annotations:
          summary: "High spend detected"
          description: "Spend in last hour: ${{ $value }}"
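The HighErrorRate expression compares the failing-request rate to the total request rate. The same arithmetic, spelled out in Python for clarity (the 5% threshold matches the rule above):

```python
def error_rate(success_rps: float, total_rps: float) -> float:
    """Fraction of failing requests, matching the HighErrorRate expression above."""
    if total_rps == 0:
        return 0.0
    return (total_rps - success_rps) / total_rps

def should_alert(success_rps: float, total_rps: float, threshold: float = 0.05) -> bool:
    return error_rate(success_rps, total_rps) > threshold

print(should_alert(success_rps=9.3, total_rps=10.0))  # 7% errors → True
```

Note that `for: 5m` in the rule means the condition must hold continuously for five minutes before firing, which suppresses alerts on brief error spikes.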

Admin Dashboard

LiteLLM includes a built-in admin UI at /ui.
Features:
  • Real-time request logs
  • Cost analytics and spend tracking
  • Model performance metrics
  • Team and user management
  • API key management
  • Health status overview
Access: http://localhost:4000/ui
Use LITELLM_MASTER_KEY to authenticate to the admin dashboard.

Best Practices

1. Enable Multiple Backends

Don’t rely on a single monitoring solution:
litellm_settings:
  success_callback: ["langfuse", "prometheus"]
  failure_callback: ["langfuse", "sentry"]
2. Set Up Alerts

Configure alerts for:
  • High error rates (>5%)
  • High latency (P95 >5s)
  • Model failures
  • Cost spikes
  • Rate limit exhaustion
3. Retain Logs

Keep logs for compliance and debugging:
# Prometheus retention
--storage.tsdb.retention.time=90d

# Database cleanup (delete old logs; archive them first if needed for compliance)
DELETE FROM "LiteLLM_SpendLogs" 
WHERE "startTime" < NOW() - INTERVAL '90 days';
4. Tag Everything

Use metadata for filtering:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    extra_body={
        "metadata": {
            "user_id": "user-123",
            "session_id": "sess-456",
            "environment": "production",
            "feature": "chat"
        }
    }
)

Next Steps

Performance

Optimize latency and throughput

Security

Secure your deployment

Troubleshooting

Debug common issues

High Availability

Deploy for production at scale
