Overview
LiteLLM provides built-in observability through:
Prometheus metrics - Request rates, latency, errors
Logging integrations - Langfuse, Datadog, OpenTelemetry
Database tracking - Spend logs, usage analytics
Health checks - Service and model health monitoring
Prometheus Metrics
Enable Metrics Endpoint
LiteLLM exposes Prometheus metrics at /metrics:
curl http://localhost:4000/metrics
Available metrics:
# Request metrics
litellm_requests_total{model="gpt-4o",status="success"}
litellm_request_duration_seconds{model="gpt-4o"}
# Token metrics
litellm_tokens_total{model="gpt-4o",type="prompt"}
litellm_tokens_total{model="gpt-4o",type="completion"}
# Cost metrics
litellm_spend_total{model="gpt-4o",team="engineering"}
# Model health
litellm_model_health_status{model="gpt-4o",status="healthy"}
# Rate limiting
litellm_rate_limit_remaining{key="sk-123",limit_type="rpm"}
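The `/metrics` endpoint returns the standard Prometheus text exposition format; in a real scrape each series above is followed by a sample value. As a rough illustration, a minimal Python parser for such lines (simplified, and not part of LiteLLM itself) might look like:

```python
import re

# Parse one line of Prometheus text exposition format into
# (metric_name, {label: value}, sample_value). Comments/blank lines return None.
LINE_RE = re.compile(r'^(\w+)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_metric_line(line: str):
    line = line.strip()
    if not line or line.startswith('#'):
        return None
    m = LINE_RE.match(line)
    if not m:
        return None
    name, raw_labels, value = m.groups()
    labels = dict(LABEL_RE.findall(raw_labels or ''))
    return name, labels, float(value)

sample = 'litellm_requests_total{model="gpt-4o",status="success"} 42'
print(parse_metric_line(sample))
# ('litellm_requests_total', {'model': 'gpt-4o', 'status': 'success'}, 42.0)
```

In practice you would feed this the body of `curl http://localhost:4000/metrics` and filter for names beginning with `litellm_`.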
Prometheus Configuration
Create prometheus.yml:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'litellm-production'

scrape_configs:
  - job_name: 'litellm'
    static_configs:
      - targets: ['litellm:4000']
    metrics_path: '/metrics'
    scrape_interval: 15s
    scrape_timeout: 10s
Docker Compose with Prometheus
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-stable
    ports:
      - "4000:4000"
    environment:
      DATABASE_URL: postgresql://llmproxy:password@db:5432/litellm
      LITELLM_MASTER_KEY: sk-1234
    depends_on:
      - db
      - prometheus

  db:
    image: postgres:16
    environment:
      POSTGRES_DB: litellm
      POSTGRES_USER: llmproxy
      POSTGRES_PASSWORD: password
    volumes:
      - postgres_data:/var/lib/postgresql/data

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=15d'
      - '--web.enable-lifecycle'

volumes:
  postgres_data:
  prometheus_data:
Start the stack:
docker compose up -d
Access Prometheus UI: http://localhost:9090
Grafana Dashboards
Setup Grafana
Add to docker-compose.yml:
grafana:
  image: grafana/grafana:latest
  ports:
    - "3000:3000"
  environment:
    GF_SECURITY_ADMIN_PASSWORD: admin
    GF_USERS_ALLOW_SIGN_UP: "false"
  volumes:
    - grafana_data:/var/lib/grafana
    - ./grafana/dashboards:/etc/grafana/provisioning/dashboards
    - ./grafana/datasources:/etc/grafana/provisioning/datasources
  depends_on:
    - prometheus
Create grafana/datasources/prometheus.yml:
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: true
Create Dashboard
Key panels to include:

Request Rate
# Requests per second
rate(litellm_requests_total[5m])
# By model
sum(rate(litellm_requests_total[5m])) by (model)
# Success rate
sum(rate(litellm_requests_total{status="success"}[5m])) /
  sum(rate(litellm_requests_total[5m])) * 100

Latency
# P50 latency
histogram_quantile(0.5, rate(litellm_request_duration_seconds_bucket[5m]))
# P95 latency
histogram_quantile(0.95, rate(litellm_request_duration_seconds_bucket[5m]))
# P99 latency
histogram_quantile(0.99, rate(litellm_request_duration_seconds_bucket[5m]))

Cost Tracking
# Total spend (last hour)
increase(litellm_spend_total[1h])
# Spend by model
sum(increase(litellm_spend_total[1h])) by (model)
# Spend by team
sum(increase(litellm_spend_total[1h])) by (team)
# Cost per 1K tokens
increase(litellm_spend_total[1h]) /
  (increase(litellm_tokens_total[1h]) / 1000)

Tokens
# Total tokens per second
rate(litellm_tokens_total[5m])
# Prompt vs completion tokens
sum(rate(litellm_tokens_total{type="prompt"}[5m]))
sum(rate(litellm_tokens_total{type="completion"}[5m]))
# Tokens by model
sum(rate(litellm_tokens_total[5m])) by (model)
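The latency panels rely on `histogram_quantile`, which estimates a quantile by linear interpolation inside cumulative histogram buckets. A small Python sketch of that interpolation (bucket bounds and counts below are made-up numbers, not LiteLLM output) makes the computation concrete:

```python
# Approximate PromQL's histogram_quantile: find the cumulative bucket
# containing the target rank and interpolate linearly within it.
def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """buckets: sorted (upper_bound_le, cumulative_count) pairs, ending with +Inf."""
    total = buckets[-1][1]
    target = q * total
    prev_le, prev_count = 0.0, 0.0
    for le, count in buckets:
        if count >= target:
            if le == float('inf'):
                return prev_le  # Prometheus caps at the last finite bound
            return prev_le + (le - prev_le) * (target - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return prev_le

# Example: request-duration buckets (seconds) with cumulative counts.
buckets = [(0.1, 10), (0.5, 60), (1.0, 90), (float('inf'), 100)]
print(histogram_quantile(0.5, buckets))   # ~0.42 (P50 estimate)
```

This is also why the accuracy of P95/P99 panels depends on how finely the histogram buckets are spaced around your typical latencies.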
Import Pre-built Dashboard
LiteLLM provides a Grafana dashboard JSON:
Download from LiteLLM repository
In Grafana: Dashboards → Import → Upload JSON
Select Prometheus data source
Logging Integrations
Langfuse
Langfuse provides detailed LLM observability with traces, costs, and user analytics.
Setup:
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: gpt-4o
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
Environment variables:
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com
Features:
Request/response traces
Token usage and cost tracking
User session analytics
Model performance comparison
Custom metadata tags
Datadog
Enable Datadog tracing:
# Environment variables
USE_DDTRACE=true
DD_API_KEY=your-datadog-api-key
DD_SITE=datadoghq.com
DD_SERVICE=litellm-proxy
DD_ENV=production
DD_VERSION=1.0.0
DD_TRACE_OPENAI_ENABLED=false
Docker with Datadog Agent:
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-stable
    environment:
      USE_DDTRACE: "true"
      DD_AGENT_HOST: datadog-agent
      DD_TRACE_AGENT_PORT: "8126"
    depends_on:
      - datadog-agent

  datadog-agent:
    image: datadog/agent:latest
    environment:
      DD_API_KEY: ${DD_API_KEY}
      DD_SITE: datadoghq.com
      DD_APM_ENABLED: "true"
      DD_APM_NON_LOCAL_TRAFFIC: "true"
    ports:
      - "8126:8126"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /proc/:/host/proc/:ro
      - /sys/fs/cgroup/:/host/sys/fs/cgroup:ro
OpenTelemetry
Configure OTEL export:
general_settings:
  otel: true
  otel_exporter: otlp_http
  otel_endpoint: http://otel-collector:4318
  otel_headers:
    Authorization: Bearer your-token

litellm_settings:
  success_callback: ["otel"]
Docker with OTEL Collector:
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-stable
    environment:
      OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4318
      OTEL_SERVICE_NAME: litellm-proxy

  otel-collector:
    image: otel/opentelemetry-collector:latest
    ports:
      - "4318:4318"
      - "4317:4317"
    volumes:
      - ./otel-collector-config.yml:/etc/otel-collector-config.yml
    command: ["--config=/etc/otel-collector-config.yml"]
OTEL Collector config:
# otel-collector-config.yml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 10s

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus, debug]
Database Analytics
Spend Logs Table
LiteLLM stores detailed request logs in PostgreSQL:
-- View recent requests
SELECT
  request_id,
  model,
  "user",
  team_id,
  spend,
  total_tokens,
  "startTime",
  "endTime",
  request_duration_ms
FROM "LiteLLM_SpendLogs"
ORDER BY "startTime" DESC
LIMIT 100;
Analytics Queries

Cost by Model
SELECT
  model,
  COUNT(*) AS request_count,
  SUM(spend) AS total_spend,
  AVG(spend) AS avg_spend_per_request,
  SUM(total_tokens) AS total_tokens
FROM "LiteLLM_SpendLogs"
WHERE "startTime" >= NOW() - INTERVAL '24 hours'
GROUP BY model
ORDER BY total_spend DESC;

Team Usage
SELECT
  team_id,
  COUNT(*) AS requests,
  SUM(spend) AS total_spend,
  SUM(total_tokens) AS tokens_used,
  AVG(request_duration_ms) AS avg_latency_ms
FROM "LiteLLM_SpendLogs"
WHERE "startTime" >= NOW() - INTERVAL '7 days'
GROUP BY team_id
ORDER BY total_spend DESC;

Hourly Stats
SELECT
  DATE_TRUNC('hour', "startTime") AS hour,
  COUNT(*) AS requests,
  SUM(spend) AS spend,
  AVG(request_duration_ms) AS avg_latency,
  SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END)::float /
    COUNT(*) * 100 AS success_rate
FROM "LiteLLM_SpendLogs"
WHERE "startTime" >= NOW() - INTERVAL '24 hours'
GROUP BY hour
ORDER BY hour;

Error Analysis
SELECT
  model,
  COUNT(*) AS error_count,
  SUM(spend) AS wasted_spend
FROM "LiteLLM_SpendLogs"
WHERE status != 'success'
  AND "startTime" >= NOW() - INTERVAL '24 hours'
GROUP BY model
ORDER BY error_count DESC;
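These are plain SQL aggregations, so their shape can be sanity-checked locally before running against production. A sketch using an in-memory SQLite stand-in for the spend-logs table (fabricated rows, simplified schema, and the 24-hour time filter dropped for brevity) mirrors the Cost by Model query:

```python
import sqlite3

# In-memory stand-in for LiteLLM_SpendLogs with the columns the
# cost-by-model aggregation needs.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE spend_logs (model TEXT, spend REAL, total_tokens INTEGER)')
conn.executemany(
    'INSERT INTO spend_logs VALUES (?, ?, ?)',
    [('gpt-4o', 0.12, 900), ('gpt-4o', 0.08, 600), ('claude-sonnet-4', 0.05, 400)],
)
rows = conn.execute(
    '''SELECT model,
              COUNT(*)          AS request_count,
              SUM(spend)        AS total_spend,
              AVG(spend)        AS avg_spend_per_request,
              SUM(total_tokens) AS total_tokens
       FROM spend_logs
       GROUP BY model
       ORDER BY total_spend DESC'''
).fetchall()
for row in rows:
    print(row)
```

The same GROUP BY / ORDER BY pattern carries over directly to the Postgres versions above, which add only the `"startTime"` window filter.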
Daily Aggregates
LiteLLM maintains pre-aggregated daily statistics:
-- User daily spend
SELECT
  date,
  user_id,
  SUM(spend) AS daily_spend,
  SUM(api_requests) AS requests,
  SUM(successful_requests) AS successful,
  SUM(failed_requests) AS failed
FROM "LiteLLM_DailyUserSpend"
WHERE date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY date, user_id
ORDER BY date DESC, daily_spend DESC;

-- Team daily spend
SELECT
  date,
  team_id,
  SUM(spend) AS daily_spend
FROM "LiteLLM_DailyTeamSpend"
WHERE date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY date, team_id
ORDER BY date DESC;
Health Monitoring
Health Check Endpoints
# Basic health
curl http://localhost:4000/health
# Liveliness (container is running)
curl http://localhost:4000/health/liveliness
# Readiness (ready to serve traffic)
curl http://localhost:4000/health/readiness
Response format:
{
  "status": "healthy",
  "uptime": 3600,
  "models": {
    "gpt-4o": "healthy",
    "claude-sonnet-4": "healthy"
  },
  "database": "connected",
  "redis": "connected"
}
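When wiring these endpoints into an orchestrator or a custom probe, the JSON above can be turned into a traffic-gating decision in a few lines. The field names below follow the sample response and should be treated as illustrative, since the exact payload shape can vary by version:

```python
# Decide whether the proxy should receive traffic, given a parsed
# /health-style response like the sample above.
def is_ready(health: dict) -> bool:
    if health.get('status') != 'healthy':
        return False
    if health.get('database') != 'connected':
        return False
    # At least one model must be healthy to serve requests.
    models = health.get('models', {})
    return any(state == 'healthy' for state in models.values())

sample = {
    'status': 'healthy',
    'models': {'gpt-4o': 'healthy', 'claude-sonnet-4': 'unhealthy'},
    'database': 'connected',
}
print(is_ready(sample))  # True
```

A Kubernetes readiness probe pointed at `/health/readiness` applies essentially the same logic server-side.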
Model Health Checks
LiteLLM automatically monitors model health:
general_settings:
  health_check: true
  health_check_interval: 300  # Check every 5 minutes

router_settings:
  allowed_fails: 3
  cooldown_time: 30  # Seconds before retry
  retry_after: 10
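The `allowed_fails` / `cooldown_time` settings amount to a small per-model state machine: after a run of failures the model is pulled from rotation until the cooldown elapses. A toy sketch of those semantics (not LiteLLM's actual router implementation) looks like:

```python
import time

# Simplified cooldown tracker mirroring allowed_fails / cooldown_time:
# after `allowed_fails` failures, skip the model until `cooldown_time`
# seconds have elapsed.
class CooldownTracker:
    def __init__(self, allowed_fails: int = 3, cooldown_time: float = 30.0,
                 clock=time.monotonic):
        self.allowed_fails = allowed_fails
        self.cooldown_time = cooldown_time
        self.clock = clock  # injectable for testing
        self.fails: dict[str, int] = {}
        self.cooldown_until: dict[str, float] = {}

    def record_failure(self, model: str) -> None:
        self.fails[model] = self.fails.get(model, 0) + 1
        if self.fails[model] >= self.allowed_fails:
            self.cooldown_until[model] = self.clock() + self.cooldown_time
            self.fails[model] = 0  # reset once the model enters cooldown

    def record_success(self, model: str) -> None:
        self.fails[model] = 0

    def is_available(self, model: str) -> bool:
        return self.clock() >= self.cooldown_until.get(model, 0.0)
```

With the values from the config above, three consecutive failures sideline a model for 30 seconds before the router retries it.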
View health status:
SELECT
  model_name,
  status,
  healthy_count,
  unhealthy_count,
  response_time_ms,
  checked_at
FROM "LiteLLM_HealthCheckTable"
ORDER BY checked_at DESC;
Alerting Rules
Prometheus alerting rules:
groups:
  - name: litellm
    interval: 30s
    rules:
      - alert: HighErrorRate
        expr: |
          (sum(rate(litellm_requests_total{status!="success"}[5m])) /
           sum(rate(litellm_requests_total[5m]))) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value | humanizePercentage }}"

      - alert: HighLatency
        expr: |
          histogram_quantile(0.95,
            rate(litellm_request_duration_seconds_bucket[5m])
          ) > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High P95 latency detected"
          description: "P95 latency is {{ $value }}s"

      - alert: ModelUnhealthy
        expr: litellm_model_health_status{status="unhealthy"} == 1
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Model {{ $labels.model }} is unhealthy"

      - alert: HighCost
        expr: increase(litellm_spend_total[1h]) > 100
        labels:
          severity: warning
        annotations:
          summary: "High spend detected"
          description: "Spend in last hour: ${{ $value }}"
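The HighErrorRate expression divides the non-success request rate by the total request rate. The same check can be reproduced outside Prometheus, for example in a smoke test against scraped counters (the per-status counts below are made up for illustration):

```python
# Reproduce the HighErrorRate rule: alert when the share of
# non-success requests over the evaluation window exceeds 5%.
def error_rate(counters: dict[str, float]) -> float:
    total = sum(counters.values())
    failures = sum(v for status, v in counters.items() if status != 'success')
    return failures / total if total else 0.0

# Per-status request counts over the window.
window = {'success': 930, 'failure': 60, 'timeout': 10}
rate = error_rate(window)
print(f"{rate:.1%}", rate > 0.05)  # 7.0% True
```

Keeping the threshold in one place (here, `0.05`, matching the alert rule) avoids drift between dashboards, alerts, and test suites.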
Admin Dashboard
LiteLLM includes a built-in admin UI at /ui:
Features:
Real-time request logs
Cost analytics and spend tracking
Model performance metrics
Team and user management
API key management
Health status overview
Access: http://localhost:4000/ui
Use LITELLM_MASTER_KEY to authenticate to the admin dashboard.
Best Practices
Enable Multiple Backends
Don't rely on a single monitoring solution:

litellm_settings:
  success_callback: ["langfuse", "prometheus"]
  failure_callback: ["langfuse", "sentry"]
Set Up Alerts
Configure alerts for:
High error rates (>5%)
High latency (P95 >5s)
Model failures
Cost spikes
Rate limit exhaustion
Retain Logs
Keep logs for compliance and debugging:

# Prometheus retention
--storage.tsdb.retention.time=90d

-- Database cleanup (archive old logs first)
DELETE FROM "LiteLLM_SpendLogs"
WHERE "startTime" < NOW() - INTERVAL '90 days';
Tag Everything
Use metadata for filtering:

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    extra_body={
        "metadata": {
            "user_id": "user-123",
            "session_id": "sess-456",
            "environment": "production",
            "feature": "chat",
        }
    },
)
Next Steps
Performance - Optimize latency and throughput
Security - Secure your deployment
Troubleshooting - Debug common issues
High Availability - Deploy for production at scale