Grafana provides powerful visualization and dashboarding capabilities for monitoring your LLM Gateway deployment. The gateway includes automatic Grafana provisioning with Prometheus as a pre-configured data source.

Quick Start

1. Start the stack

Launch all services including Grafana:
docker-compose up -d
2. Access Grafana

Open your browser to:
http://localhost:3000
Anonymous access is enabled by default for development. You can also log in with:
  • Username: admin
  • Password: admin
3. Verify data source

Prometheus should already be configured as the default data source. Verify by navigating to Configuration → Data Sources → Prometheus. You should see the status: “Data source is working”
4. Create your first dashboard

Click the + icon → Dashboard → Add new panel to start visualizing metrics.

Automatic Provisioning

Grafana is automatically configured through provisioning files mounted from the repository.

Data Source Configuration

The Prometheus data source is provisioned via grafana/provisioning/datasources/datasource.yml:
datasource.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
  • name (string): Display name for the data source in Grafana
  • type (string): Data source type (prometheus, graphite, influxdb, etc.)
  • access (string): proxy means the Grafana server queries Prometheus (recommended for Docker)
  • url (string): Prometheus server URL (uses the Docker service name prometheus:9090)
  • isDefault (boolean): Makes this the default data source for new panels
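Provisioned data sources can also carry optional query hints under jsonData. For example, Grafana's timeInterval option tells panels the minimum query step, which is usually set to match the Prometheus scrape interval. A sketch (the 5s value is an assumption; match it to your prometheus.yml):

```yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    jsonData:
      timeInterval: "5s"  # assumed scrape_interval; match your prometheus.yml
```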

Dashboard Provisioning

Dashboard auto-loading is configured via grafana/provisioning/dashboards/dashboards.yml:
dashboards.yml
apiVersion: 1
providers:
  - name: 'Standard Dashboards'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    editable: true
    options:
      path: /etc/grafana/provisioning/dashboards
Place dashboard JSON files in grafana/provisioning/dashboards/ to auto-load them on Grafana startup.
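For reference, a minimal dashboard JSON skeleton that the file provider will load looks roughly like this (illustrative only; a real export contains many more fields, and the uid shown is an arbitrary example):

```json
{
  "uid": "llm-gateway-overview",
  "title": "LLM Gateway Overview",
  "schemaVersion": 39,
  "time": { "from": "now-6h", "to": "now" },
  "panels": []
}
```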

Docker Configuration

Grafana runs as a Docker service configured in docker-compose.yml:
docker-compose.yml
grafana:
  image: grafana/grafana:latest
  ports:
    - "3000:3000"
  volumes:
    - ./grafana/provisioning:/etc/grafana/provisioning
  environment:
    - GF_SECURITY_ADMIN_PASSWORD=admin
    - GF_AUTH_ANONYMOUS_ENABLED=true
    - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
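As written, Grafana's internal state (users, dashboards saved through the UI) lives only inside the container and is lost when the container is recreated. To persist it, a named volume can be added; a sketch (the grafana_data name is arbitrary):

```yaml
grafana:
  image: grafana/grafana:latest
  volumes:
    - ./grafana/provisioning:/etc/grafana/provisioning
    - grafana_data:/var/lib/grafana  # persist Grafana's internal database

volumes:
  grafana_data:
```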

Environment Variables

  • GF_SECURITY_ADMIN_PASSWORD (string): Admin user password (default: admin)
  • GF_AUTH_ANONYMOUS_ENABLED (boolean): Enables anonymous access without login
  • GF_AUTH_ANONYMOUS_ORG_ROLE (string): Role for anonymous users (Viewer, Editor, or Admin)
Production Security: Disable anonymous access and change the default admin password:
environment:
  - GF_SECURITY_ADMIN_PASSWORD=your_secure_password
  - GF_AUTH_ANONYMOUS_ENABLED=false

Creating Dashboards

Dashboard Example: Gateway Overview

Create a comprehensive dashboard to monitor gateway health:
1. Create new dashboard

Click + → Dashboard → Add new panel
2. Add request rate panel

Panel 1: Request Rate
  • Query: rate(gateway_requests_total[1m])
  • Panel type: Graph
  • Title: “Requests per Second”
  • Y-axis label: “req/s”
3. Add latency panel

Panel 2: Request Latency (P95)
  • Query: histogram_quantile(0.95, rate(gateway_request_latency_seconds_bucket[5m]))
  • Panel type: Graph
  • Title: “Request Latency (95th percentile)”
  • Y-axis label: “seconds”
4. Add active requests panel

Panel 3: Active Requests
  • Query: gateway_active_requests
  • Panel type: Stat
  • Title: “Active Requests”
5. Add cache hit rate panel

Panel 4: Cache Hit Rate
  • Query: rate(cache_hits_total[5m]) / (rate(cache_hits_total[5m]) + rate(cache_misses_total[5m])) * 100
  • Panel type: Gauge
  • Title: “Cache Hit Rate (%)”
  • Thresholds: Red < 50%, Yellow 50-80%, Green > 80%
6. Save dashboard

Click Save → Enter name “LLM Gateway Overview” → Save
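The cache hit rate query in Panel 4 is plain ratio arithmetic over the two counters, and the gauge thresholds map the result to a color. A small Python sketch of the same math (function names are illustrative, not part of the gateway):

```python
def cache_hit_rate(hits_per_sec: float, misses_per_sec: float) -> float:
    """Mirror of rate(cache_hits_total) / (rate(cache_hits_total)
    + rate(cache_misses_total)) * 100."""
    total = hits_per_sec + misses_per_sec
    if total == 0:
        return 0.0  # no traffic; the PromQL expression would return no data
    return hits_per_sec / total * 100

def threshold_color(rate_pct: float) -> str:
    """Gauge thresholds from the panel: red < 50, yellow 50-80, green > 80."""
    if rate_pct < 50:
        return "red"
    if rate_pct <= 80:
        return "yellow"
    return "green"

# 85 hits/s and 15 misses/s -> 85.0% hit rate, shown green
```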

Dashboard Example: Provider Performance

Monitor individual provider performance:
Panel 1: Calls by Provider
sum by (provider) (rate(provider_calls_total[5m]))
  • Panel type: Bar gauge
  • Legend: {{provider}}
Panel 2: Provider Latency
histogram_quantile(0.95, sum by (provider, le) (rate(provider_call_latency_seconds_bucket[5m])))
  • Panel type: Time series
  • Legend: {{provider}} - P95
Panel 3: Provider Error Rate
rate(provider_failures_total[5m]) / rate(provider_calls_total[5m]) * 100
  • Panel type: Time series
  • Legend: {{provider}}
  • Y-axis: Percentage
Panel 4: Provider Success Rate
sum by (provider) (provider_calls_total - provider_failures_total)
  • Panel type: Stat
  • Calculation: Total

Essential Panels

Recommended panels for monitoring LLM Gateway:

Request Rate

rate(gateway_requests_total[1m])
Shows requests per second over 1-minute windows.

P50/P95/P99 Latency

histogram_quantile(0.50, rate(gateway_request_latency_seconds_bucket[5m]))
histogram_quantile(0.95, rate(gateway_request_latency_seconds_bucket[5m]))
histogram_quantile(0.99, rate(gateway_request_latency_seconds_bucket[5m]))
Track latency percentiles to identify performance degradation.
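histogram_quantile estimates a percentile from cumulative bucket counts by linear interpolation inside the bucket that contains the target rank. A simplified Python sketch of that estimate (classic buckets only; the real Prometheus function handles more edge cases):

```python
import math

def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Approximate PromQL histogram_quantile over classic (cumulative)
    buckets: (upper_bound, cumulative_count) pairs, sorted by bound and
    ending with the +Inf bucket."""
    total = buckets[-1][1]
    rank = q * total
    lower, prev_count = 0.0, 0.0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper):
                return lower  # quantile falls in +Inf: return last finite bound
            if count == prev_count:
                return upper
            # linear interpolation within the bucket, as Prometheus does
            return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)
        lower, prev_count = upper, count
    return lower

# P95 over buckets le={0.1: 50, 0.5: 90, 1.0: 100}: the rank 95 falls in
# the (0.5, 1.0] bucket -> 0.5 + 0.5 * (95 - 90) / 10 = 0.75 seconds
```

Because the result is interpolated, its accuracy depends on the bucket boundaries; buckets that bracket your latency SLO thresholds give the most meaningful percentiles.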

Active Connections

gateway_active_requests
Monitor concurrent request processing.

Cache Hit Rate

rate(cache_hits_total[5m]) / (rate(cache_hits_total[5m]) + rate(cache_misses_total[5m])) * 100
Percentage of requests served from cache.

Cache Hits vs Misses

rate(cache_hits_total[5m])
rate(cache_misses_total[5m])
Visualize both metrics on the same graph.

Cache Savings

rate(cache_hits_total[5m])
Number of provider calls avoided per second.

Provider Calls by Type

sum by (provider) (rate(provider_calls_total[5m]))
Distribution of traffic across providers.

Provider Error Rate

sum by (provider) (rate(provider_failures_total[5m])) / sum by (provider) (rate(provider_calls_total[5m])) * 100
Error percentage by provider.

Provider Latency Comparison

histogram_quantile(0.95, sum by (provider, le) (rate(provider_call_latency_seconds_bucket[5m])))
Compare P95 latency across providers.

Allowed vs Blocked Requests

rate(rate_limit_allowed_total[5m])
rate(rate_limit_blocked_total[5m])
Visualize rate limiter activity.

Block Rate Percentage

rate(rate_limit_blocked_total[5m]) / (rate(rate_limit_allowed_total[5m]) + rate(rate_limit_blocked_total[5m])) * 100
Percentage of requests blocked.

Total Blocked Requests

rate_limit_blocked_total
Cumulative count of blocked requests.

Alert Configuration

Set up alerts to notify you of issues:
1. Navigate to alerting

Go to Alerting → Alert rules → New alert rule
2. Configure conditions

Example: High error rate alert
rate(provider_failures_total[5m]) / rate(provider_calls_total[5m]) > 0.1
Triggers when error rate exceeds 10%.
3. Set evaluation interval

  • Evaluate every: 1m
  • For: 5m
Alert fires if condition is true for 5 consecutive minutes.
4. Configure notification channels

Set up notification channels (Slack, email, PagerDuty, etc.) in Alerting → Contact points → New contact point.

High Error Rate

rate(provider_failures_total[5m]) / rate(provider_calls_total[5m]) > 0.05
Alert when >5% of provider calls fail

High Latency

histogram_quantile(0.95, rate(gateway_request_latency_seconds_bucket[5m])) > 2
Alert when P95 latency exceeds 2 seconds

Cache Hit Rate Drop

rate(cache_hits_total[5m]) / (rate(cache_hits_total[5m]) + rate(cache_misses_total[5m])) < 0.5
Alert when cache hit rate drops below 50%

High Rate Limit Blocks

rate(rate_limit_blocked_total[5m]) > 10
Alert when blocking >10 requests/second

Dashboard Variables

Add variables to make dashboards dynamic:
1. Open dashboard settings

Click Dashboard settings (gear icon) → Variables → Add variable
2. Create provider variable

  • Name: provider
  • Type: Query
  • Query: label_values(provider_calls_total, provider)
  • Multi-value: Enabled
  • Include All option: Enabled
3. Use variable in queries

Reference the variable in panel queries:
rate(provider_calls_total{provider=~"$provider"}[5m])
Variables allow filtering dashboards by provider, time range, or any label value without creating multiple dashboards.

Exporting and Sharing

Export Dashboard JSON

1. Open dashboard

Navigate to the dashboard you want to export
2. Access share menu

Click Share (icon at top) → Export → Save to file
3. Save to provisioning directory

Save the JSON file to:
grafana/provisioning/dashboards/llm-gateway.json
It will auto-load on next Grafana restart.
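One common snag: exported JSON embeds an instance-specific numeric id, which is meaningless (and can conflict) once the file is provisioned into another Grafana instance. A small helper sketch to null it out before committing (the function name is illustrative):

```python
import json

def prepare_for_provisioning(dashboard_json: str) -> str:
    """Null the instance-local ``id`` so the exported dashboard can be
    provisioned into any Grafana instance."""
    dash = json.loads(dashboard_json)
    dash["id"] = None  # Grafana assigns a fresh id when it loads the file
    return json.dumps(dash, indent=2)
```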
Generate shareable links:
  1. Click Share → Link
  2. Configure options:
    • Lock time range: Preserves current time selection
    • Shorten URL: Creates shorter link
  3. Copy and share the URL
Ensure recipients have access to your Grafana instance. For external sharing, consider using Snapshot feature instead.

Best Practices

  • Group by concern: Create separate dashboards for Performance, Providers, Cache, etc.
  • Use folders: Organize dashboards into folders (“LLM Gateway”, “Infrastructure”)
  • Consistent naming: Use clear, descriptive dashboard names
  • Add descriptions: Include dashboard purpose and key metrics in description
  • Descriptive titles: Make panel purpose immediately clear
  • Appropriate visualizations: Time series for trends, gauges for ratios, stats for current values
  • Set units: Configure Y-axis units (seconds, percentage, requests/sec)
  • Use legends: Enable legends with {{label}} syntax for multi-series
  • Color coding: Use consistent colors (green = good, yellow = warning, red = critical)
  • Use appropriate intervals: Match rate() interval to your needs (1m for real-time, 5m for trends)
  • Limit time range: Don’t query more data than necessary
  • Use recording rules: Pre-compute expensive queries in Prometheus
  • Minimize resolution: Adjust Min interval in panel settings
  • Limit panels per dashboard: 10-15 panels max for fast loading
  • Use template variables: Filter data without multiple dashboards
  • Set refresh intervals: Balance freshness vs load (30s-1m for most cases)
  • Cache query results: Enable query caching in data source settings

Troubleshooting

Dashboard panels show no data

Possible causes:
  1. Prometheus not scraping: Check Prometheus UI at http://localhost:9090/targets
  2. Wrong time range: Adjust time picker to include data
  3. Incorrect query: Validate PromQL in Prometheus UI first
  4. Gateway not running: Ensure gateway is up and exposing metrics
Debug steps:
# Check gateway metrics endpoint
curl http://localhost:8000/api/v1/metrics

# Check Prometheus targets
curl http://localhost:9090/api/v1/targets

# Verify data in Prometheus
curl 'http://localhost:9090/api/v1/query?query=gateway_requests_total'
Error: “Bad Gateway” or “Service Unavailable”

Fix:
  1. Verify Prometheus is running:
    docker-compose ps prometheus
    
  2. Check data source URL uses Docker service name:
    url: http://prometheus:9090  # NOT localhost:9090
    
  3. Restart Grafana:
    docker-compose restart grafana
    
Error: “Dashboard save failed”

Cause: Provisioned dashboards are read-only by default.

Fix:
  1. Save as new dashboard with different name
  2. Or modify provisioning config to allow edits:
    disableDeletion: false
    editable: true
    
Alerts not firing

Check:
  1. Alert rule is active (not paused)
  2. Notification channel is configured
  3. Evaluation interval allows condition to persist
  4. Query returns expected values in Explore tab
Test:
  • Use Test button in alert rule editor
  • Check Alerting → Alert rules for evaluation history

Next Steps

Metrics Reference

Learn about all available metrics and their meanings

Observability Overview

Understand the complete observability architecture
