Grafana provides powerful visualization and dashboarding capabilities for monitoring your LLM Gateway deployment. The gateway includes automatic Grafana provisioning with Prometheus as a pre-configured data source.

Quick Start

1. Start the stack

Launch all services including Grafana:
docker-compose up -d
2. Access Grafana

Open your browser to:
http://localhost:3000
Anonymous access is enabled by default for development. You can also log in with:
  • Username: admin
  • Password: admin
3. Verify data source

Prometheus should already be configured as the default data source. Verify by navigating to Configuration → Data Sources → Prometheus. You should see the status: “Data source is working”
4. Create your first dashboard

Click the + icon → Dashboard → Add new panel to start visualizing metrics.

Automatic Provisioning

Grafana is automatically configured through provisioning files mounted from the repository.

Data Source Configuration

The Prometheus data source is provisioned via grafana/provisioning/datasources/datasource.yml:
datasource.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
  • name (string): Display name for the data source in Grafana
  • type (string): Data source type (prometheus, graphite, influxdb, etc.)
  • access (string): proxy means the Grafana server queries Prometheus (recommended for Docker)
  • url (string): Prometheus server URL (uses the Docker service name prometheus:9090)
  • isDefault (boolean): Makes this the default data source for new panels
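Provisioned data sources can also carry optional query hints under jsonData. For example, Grafana's timeInterval option tells panels the minimum query step, which is usually set to match the Prometheus scrape interval. A sketch (the 5s value is an assumption; match it to your prometheus.yml):

```yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    jsonData:
      timeInterval: "5s"  # assumed scrape_interval; match your prometheus.yml
```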

Dashboard Provisioning

Dashboard auto-loading is configured via grafana/provisioning/dashboards/dashboards.yml:
dashboards.yml
apiVersion: 1
providers:
  - name: 'Standard Dashboards'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    editable: true
    options:
      path: /etc/grafana/provisioning/dashboards
Place dashboard JSON files in grafana/provisioning/dashboards/ to auto-load them on Grafana startup.
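For reference, a minimal dashboard JSON skeleton that the file provider will load looks roughly like this (illustrative only; a real export contains many more fields, and the uid shown is an arbitrary example):

```json
{
  "uid": "llm-gateway-overview",
  "title": "LLM Gateway Overview",
  "schemaVersion": 39,
  "time": { "from": "now-6h", "to": "now" },
  "panels": []
}
```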

Docker Configuration

Grafana runs as a Docker service configured in docker-compose.yml:
docker-compose.yml
grafana:
  image: grafana/grafana:latest
  ports:
    - "3000:3000"
  volumes:
    - ./grafana/provisioning:/etc/grafana/provisioning
  environment:
    - GF_SECURITY_ADMIN_PASSWORD=admin
    - GF_AUTH_ANONYMOUS_ENABLED=true
    - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
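As written, Grafana's internal state (users, dashboards saved through the UI) lives only inside the container and is lost when the container is recreated. To persist it, a named volume can be added; a sketch (the grafana_data name is arbitrary):

```yaml
grafana:
  image: grafana/grafana:latest
  volumes:
    - ./grafana/provisioning:/etc/grafana/provisioning
    - grafana_data:/var/lib/grafana  # persist Grafana's internal database

volumes:
  grafana_data:
```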

Environment Variables

  • GF_SECURITY_ADMIN_PASSWORD (string): Admin user password (default: admin)
  • GF_AUTH_ANONYMOUS_ENABLED (boolean): Enables anonymous access without login
  • GF_AUTH_ANONYMOUS_ORG_ROLE (string): Role for anonymous users (Viewer, Editor, or Admin)
Production Security: Disable anonymous access and change the default admin password:
environment:
  - GF_SECURITY_ADMIN_PASSWORD=your_secure_password
  - GF_AUTH_ANONYMOUS_ENABLED=false

Creating Dashboards

Dashboard Example: Gateway Overview

Create a comprehensive dashboard to monitor gateway health:
1. Create new dashboard

Click + → Dashboard → Add new panel
2. Add request rate panel

Panel 1: Request Rate
  • Query: rate(gateway_requests_total[1m])
  • Panel type: Graph
  • Title: “Requests per Second”
  • Y-axis label: “req/s”
3. Add latency panel

Panel 2: Request Latency (P95)
  • Query: histogram_quantile(0.95, rate(gateway_request_latency_seconds_bucket[5m]))
  • Panel type: Graph
  • Title: “Request Latency (95th percentile)”
  • Y-axis label: “seconds”
4. Add active requests panel

Panel 3: Active Requests
  • Query: gateway_active_requests
  • Panel type: Stat
  • Title: “Active Requests”
5. Add cache hit rate panel

Panel 4: Cache Hit Rate
  • Query: rate(cache_hits_total[5m]) / (rate(cache_hits_total[5m]) + rate(cache_misses_total[5m])) * 100
  • Panel type: Gauge
  • Title: “Cache Hit Rate (%)”
  • Thresholds: Red < 50%, Yellow 50-80%, Green > 80%
6. Save dashboard

Click Save → Enter name “LLM Gateway Overview” → Save
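The cache hit rate query in Panel 4 is plain ratio arithmetic over the two counters, and the gauge thresholds map the result to a color. A small Python sketch of the same math (function names are illustrative, not part of the gateway):

```python
def cache_hit_rate(hits_per_sec: float, misses_per_sec: float) -> float:
    """Mirror of rate(cache_hits_total) / (rate(cache_hits_total)
    + rate(cache_misses_total)) * 100."""
    total = hits_per_sec + misses_per_sec
    if total == 0:
        return 0.0  # no traffic; the PromQL expression would return no data
    return hits_per_sec / total * 100

def threshold_color(rate_pct: float) -> str:
    """Gauge thresholds from the panel: red < 50, yellow 50-80, green > 80."""
    if rate_pct < 50:
        return "red"
    if rate_pct <= 80:
        return "yellow"
    return "green"

# 85 hits/s and 15 misses/s -> 85.0% hit rate, shown green
```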

Dashboard Example: Provider Performance

Monitor individual provider performance:
Panel 1: Calls by Provider
sum by (provider) (rate(provider_calls_total[5m]))
  • Panel type: Bar gauge
  • Legend: {{provider}}
Panel 2: Provider Latency
histogram_quantile(0.95, sum by (provider, le) (rate(provider_call_latency_seconds_bucket[5m])))
  • Panel type: Time series
  • Legend: {{provider}} - P95
Panel 3: Provider Error Rate
rate(provider_failures_total[5m]) / rate(provider_calls_total[5m]) * 100
  • Panel type: Time series
  • Legend: {{provider}}
  • Y-axis: Percentage
Panel 4: Provider Success Rate
sum by (provider) (provider_calls_total - provider_failures_total)
  • Panel type: Stat
  • Calculation: Total

Essential Panels

Recommended panels for monitoring LLM Gateway:

Request Rate

rate(gateway_requests_total[1m])
Shows requests per second over 1-minute windows.

P50/P95/P99 Latency

histogram_quantile(0.50, rate(gateway_request_latency_seconds_bucket[5m]))
histogram_quantile(0.95, rate(gateway_request_latency_seconds_bucket[5m]))
histogram_quantile(0.99, rate(gateway_request_latency_seconds_bucket[5m]))
Track latency percentiles to identify performance degradation.
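histogram_quantile estimates a percentile from cumulative bucket counts by linear interpolation inside the bucket that contains the target rank. A simplified Python sketch of that estimate (classic buckets only; the real Prometheus function handles more edge cases):

```python
import math

def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Approximate PromQL histogram_quantile over classic (cumulative)
    buckets: (upper_bound, cumulative_count) pairs, sorted by bound and
    ending with the +Inf bucket."""
    total = buckets[-1][1]
    rank = q * total
    lower, prev_count = 0.0, 0.0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper):
                return lower  # quantile falls in +Inf: return last finite bound
            if count == prev_count:
                return upper
            # linear interpolation within the bucket, as Prometheus does
            return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)
        lower, prev_count = upper, count
    return lower

# P95 over buckets le={0.1: 50, 0.5: 90, 1.0: 100}: the rank 95 falls in
# the (0.5, 1.0] bucket -> 0.5 + 0.5 * (95 - 90) / 10 = 0.75 seconds
```

Because the result is interpolated, its accuracy depends on the bucket boundaries; buckets that bracket your latency SLO thresholds give the most meaningful percentiles.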

Active Connections

gateway_active_requests
Monitor concurrent request processing.

Cache Hit Rate

rate(cache_hits_total[5m]) / (rate(cache_hits_total[5m]) + rate(cache_misses_total[5m])) * 100
Percentage of requests served from cache.

Cache Hits vs Misses

rate(cache_hits_total[5m])
rate(cache_misses_total[5m])
Visualize both metrics on the same graph.

Cache Savings

rate(cache_hits_total[5m])
Number of provider calls avoided per second.

Provider Calls by Type

sum by (provider) (rate(provider_calls_total[5m]))
Distribution of traffic across providers.

Provider Error Rate

sum by (provider) (rate(provider_failures_total[5m])) / sum by (provider) (rate(provider_calls_total[5m])) * 100
Error percentage by provider.

Provider Latency Comparison

histogram_quantile(0.95, sum by (provider, le) (rate(provider_call_latency_seconds_bucket[5m])))
Compare P95 latency across providers.

Allowed vs Blocked Requests

rate(rate_limit_allowed_total[5m])
rate(rate_limit_blocked_total[5m])
Visualize rate limiter activity.

Block Rate Percentage

rate(rate_limit_blocked_total[5m]) / (rate(rate_limit_allowed_total[5m]) + rate(rate_limit_blocked_total[5m])) * 100
Percentage of requests blocked.

Total Blocked Requests

rate_limit_blocked_total
Cumulative count of blocked requests.

Alert Configuration

Set up alerts to notify you of issues:
1. Navigate to alerting

Go to Alerting → Alert rules → New alert rule
2. Configure conditions

Example: High error rate alert
rate(provider_failures_total[5m]) / rate(provider_calls_total[5m]) > 0.1
Triggers when error rate exceeds 10%.
3. Set evaluation interval

  • Evaluate every: 1m
  • For: 5m
Alert fires if condition is true for 5 consecutive minutes.
4. Configure notification channels

Set up notification channels (Slack, email, PagerDuty, etc.) in Alerting → Contact points → New contact point.

High Error Rate

rate(provider_failures_total[5m]) / rate(provider_calls_total[5m]) > 0.05
Alert when >5% of provider calls fail

High Latency

histogram_quantile(0.95, rate(gateway_request_latency_seconds_bucket[5m])) > 2
Alert when P95 latency exceeds 2 seconds

Cache Hit Rate Drop

rate(cache_hits_total[5m]) / (rate(cache_hits_total[5m]) + rate(cache_misses_total[5m])) < 0.5
Alert when cache hit rate drops below 50%

High Rate Limit Blocks

rate(rate_limit_blocked_total[5m]) > 10
Alert when blocking >10 requests/second

Dashboard Variables

Add variables to make dashboards dynamic:
1. Open dashboard settings

Click Dashboard settings (gear icon) → Variables → Add variable
2. Create provider variable

  • Name: provider
  • Type: Query
  • Query: label_values(provider_calls_total, provider)
  • Multi-value: Enabled
  • Include All option: Enabled
3. Use variable in queries

Reference the variable in panel queries:
rate(provider_calls_total{provider=~"$provider"}[5m])
Variables allow filtering dashboards by provider, time range, or any label value without creating multiple dashboards.

Exporting and Sharing

Export Dashboard JSON

1. Open dashboard

Navigate to the dashboard you want to export
2. Access share menu

Click Share (icon at top) → Export → Save to file
3. Save to provisioning directory

Save the JSON file to:
grafana/provisioning/dashboards/llm-gateway.json
It will auto-load on next Grafana restart.
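One common snag: exported JSON embeds an instance-specific numeric id, which is meaningless (and can conflict) once the file is provisioned into another Grafana instance. A small helper sketch to null it out before committing (the function name is illustrative):

```python
import json

def prepare_for_provisioning(dashboard_json: str) -> str:
    """Null the instance-local ``id`` so the exported dashboard can be
    provisioned into any Grafana instance."""
    dash = json.loads(dashboard_json)
    dash["id"] = None  # Grafana assigns a fresh id when it loads the file
    return json.dumps(dash, indent=2)
```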
Generate shareable links:
  1. Click Share → Link
  2. Configure options:
    • Lock time range: Preserves current time selection
    • Shorten URL: Creates shorter link
  3. Copy and share the URL
Ensure recipients have access to your Grafana instance. For external sharing, consider using Snapshot feature instead.

Best Practices

  • Group by concern: Create separate dashboards for Performance, Providers, Cache, etc.
  • Use folders: Organize dashboards into folders (“LLM Gateway”, “Infrastructure”)
  • Consistent naming: Use clear, descriptive dashboard names
  • Add descriptions: Include dashboard purpose and key metrics in description
  • Descriptive titles: Make panel purpose immediately clear
  • Appropriate visualizations: Time series for trends, gauges for ratios, stats for current values
  • Set units: Configure Y-axis units (seconds, percentage, requests/sec)
  • Use legends: Enable legends with {{label}} syntax for multi-series
  • Color coding: Use consistent colors (green = good, yellow = warning, red = critical)
  • Use appropriate intervals: Match rate() interval to your needs (1m for real-time, 5m for trends)
  • Limit time range: Don’t query more data than necessary
  • Use recording rules: Pre-compute expensive queries in Prometheus
  • Minimize resolution: Adjust Min interval in panel settings
  • Limit panels per dashboard: 10-15 panels max for fast loading
  • Use template variables: Filter data without multiple dashboards
  • Set refresh intervals: Balance freshness vs load (30s-1m for most cases)
  • Cache query results: Enable query caching in data source settings

Troubleshooting

Dashboard panels show no data

Possible causes:
  1. Prometheus not scraping: Check Prometheus UI at http://localhost:9090/targets
  2. Wrong time range: Adjust time picker to include data
  3. Incorrect query: Validate PromQL in Prometheus UI first
  4. Gateway not running: Ensure gateway is up and exposing metrics
Debug steps:
# Check gateway metrics endpoint
curl http://localhost:8000/api/v1/metrics

# Check Prometheus targets
curl http://localhost:9090/api/v1/targets

# Verify data in Prometheus
curl 'http://localhost:9090/api/v1/query?query=gateway_requests_total'
Error: “Bad Gateway” or “Service Unavailable”

Fix:
  1. Verify Prometheus is running:
    docker-compose ps prometheus
    
  2. Check data source URL uses Docker service name:
    url: http://prometheus:9090  # NOT localhost:9090
    
  3. Restart Grafana:
    docker-compose restart grafana
    
Error: “Dashboard save failed”

Cause: Provisioned dashboards are read-only by default.

Fix:
  1. Save as new dashboard with different name
  2. Or modify provisioning config to allow edits:
    disableDeletion: false
    editable: true
    
Alerts not firing

Check:
  1. Alert rule is active (not paused)
  2. Notification channel is configured
  3. Evaluation interval allows condition to persist
  4. Query returns expected values in Explore tab
Test:
  • Use Test button in alert rule editor
  • Check Alerting → Alert rules for evaluation history

Next Steps

Metrics Reference

Learn about all available metrics and their meanings

Observability Overview

Understand the complete observability architecture
