Overview
SGLang provides comprehensive monitoring capabilities through Prometheus metrics and Grafana dashboards. This allows you to track performance, resource usage, and request patterns in real time.
Quick Start
Enable Metrics
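To enable metrics collection, start your SGLang server with the --enable-metrics flag, for example:
python -m sglang.launch_server --model-path <your-model-path> --enable-metrics
With the default port, metrics are then available at http://localhost:30000/metrics.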
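As a quick sanity check of that endpoint, the Python sketch below downloads the exposition text and keeps only the SGLang samples. It assumes the default URL shown above and that SGLang metric names start with sglang: (as in sglang:process_cpu_seconds_total). The function names are illustrative and not part of SGLang.

```python
from urllib.request import urlopen

# Default endpoint from this guide; change it if your server uses another host or port.
METRICS_URL = "http://localhost:30000/metrics"


def fetch_metrics(url: str = METRICS_URL, timeout: float = 5.0) -> str:
    """Download the raw Prometheus text exposition from the server."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def sglang_samples(text: str, prefix: str = "sglang:") -> list[str]:
    """Return only the sample lines whose metric name starts with `prefix`.

    Comment lines (# HELP, # TYPE) start with '#', so the prefix test
    also filters them out.
    """
    return [line for line in text.splitlines() if line.startswith(prefix)]
```

Calling sglang_samples(fetch_metrics()) against a running server should return a non-empty list once metrics are enabled. An empty list suggests the flag was not set or the prefix assumption does not hold for your version.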
Verify Metrics
You can verify that metrics are being collected by querying the metrics endpoint, for example with curl http://localhost:30000/metrics.
Docker-based Monitoring Stack
SGLang includes a pre-configured monitoring stack with Prometheus and Grafana in the examples/monitoring directory.
Prerequisites
- Docker and Docker Compose installed
- SGLang server running with metrics enabled
Setup Steps
- Start your SGLang server with metrics enabled (the --enable-metrics flag).
- Navigate to the monitoring directory, examples/monitoring.
- Start the monitoring stack with Docker Compose, for example docker compose up -d.
- Access the interfaces:
  - Grafana: http://localhost:3000
  - Prometheus: http://localhost:9090
- Log in to Grafana:
  - Default Username: admin
  - Default Password: admin
  - You will be prompted to change the password on first login
- View the Dashboard: navigate to Dashboards → Browse → SGLang Monitoring → SGLang Dashboard
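After the steps above, a short Python sketch can confirm that both services answer before you open a browser. It relies on Grafana's /api/health endpoint and Prometheus's /-/ready endpoint, and assumes the default ports listed in this guide. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen

# Default addresses from this guide; adjust if you remapped the ports.
GRAFANA_HEALTH = "http://localhost:3000/api/health"
PROMETHEUS_READY = "http://localhost:9090/-/ready"


def http_get(url: str, timeout: float = 5.0) -> tuple[int, str]:
    """Return (status, body), or (0, "") if the request fails for any reason."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except OSError:  # covers URLError, HTTPError (non-2xx), and timeouts
        return 0, ""


def stack_status(get=http_get) -> dict[str, bool]:
    """Report whether Grafana and Prometheus look healthy."""
    g_status, g_body = get(GRAFANA_HEALTH)
    p_status, _ = get(PROMETHEUS_READY)
    grafana_ok = False
    if g_status == 200:
        try:
            # Grafana reports "database": "ok" when it is healthy.
            grafana_ok = json.loads(g_body).get("database") == "ok"
        except (ValueError, AttributeError):
            grafana_ok = False
    return {"grafana": grafana_ok, "prometheus": p_status == 200}
```

The get parameter exists only so the check can be exercised without a live stack. In normal use, call stack_status() with no arguments.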
Configuration Files
The monitoring setup is defined by these files in examples/monitoring:
- docker-compose.yaml: Defines Prometheus and Grafana services
- prometheus.yaml: Prometheus configuration, including scrape targets
- grafana/datasources/datasource.yaml: Configures Prometheus as a data source
- grafana/dashboards/config/dashboard.yaml: Tells Grafana where to load dashboards
- grafana/dashboards/json/sglang-dashboard.json: The Grafana dashboard definition
Customizing Prometheus Scrape Configuration
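If your SGLang server runs on a different host or port, update the scrape target in the prometheus.yaml file.
Because Prometheus runs inside a Docker container, localhost there refers to the container itself. If the SGLang server runs on the Docker host, use host.docker.internal (Docker Desktop) or your machine’s network IP instead of localhost.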
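To make the shape of a scrape target concrete, the Python sketch below renders a scrape_configs entry using standard Prometheus keys. The helper is hypothetical, and the job name sglang and the 5s interval are assumptions, so keep whatever values the shipped prometheus.yaml already uses. Targets are written as host:port without a scheme, and Prometheus scrapes the /metrics path by default.

```python
def render_scrape_config(targets, job_name="sglang", interval="5s"):
    """Render a minimal Prometheus scrape_configs section as YAML text.

    `job_name` and `interval` are illustrative defaults, not values
    taken from the SGLang repository.
    """
    if not targets:
        raise ValueError("at least one target is required")
    for target in targets:
        # Prometheus static targets are host:port; the scheme is configured separately.
        if "://" in target:
            raise ValueError(f"target must be host:port, got {target!r}")
    lines = [
        "scrape_configs:",
        f"  - job_name: {job_name}",
        f"    scrape_interval: {interval}",
        "    static_configs:",
        "      - targets:",
    ]
    lines += [f"          - '{target}'" for target in targets]
    return "\n".join(lines) + "\n"
```

For example, render_scrape_config(["host.docker.internal:30000"]) produces an entry that points at an SGLang server running on the Docker host.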
Troubleshooting
Port Conflicts
If ports 3000 or 9090 are already in use:
Option 1: Change the Grafana port with an environment variable:
No Data on Dashboard
- Generate traffic to produce metrics, for example by sending a few requests to the server.
- Verify Prometheus is scraping the SGLang endpoint:
  - Go to Prometheus UI: http://localhost:9090
  - Check Status → Targets
  - Ensure the SGLang endpoint shows as “UP”
- Check label matching:
  - Verify model_name and instance labels in Prometheus match dashboard variables
  - You may need to adjust Grafana dashboard variables
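The target check can also be done without the UI. The Python sketch below queries Prometheus's /api/v1/targets HTTP API and lists any active target that is not reported as up. It assumes Prometheus is reachable at the default address from this guide, and the function names are illustrative.

```python
import json
from urllib.request import urlopen

PROMETHEUS_URL = "http://localhost:9090"  # default address from this guide


def fetch_targets(base_url: str = PROMETHEUS_URL, timeout: float = 5.0) -> dict:
    """Fetch the target list from the Prometheus HTTP API."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)


def unhealthy_targets(payload: dict) -> list[tuple[str, str]]:
    """Return (scrapeUrl, lastError) for each active target whose health is not 'up'."""
    active = payload.get("data", {}).get("activeTargets", [])
    return [
        (target.get("scrapeUrl", "?"), target.get("lastError", ""))
        for target in active
        if target.get("health") != "up"
    ]
```

An empty result from unhealthy_targets(fetch_targets()) means every configured target is being scraped. Otherwise the lastError text usually points at the cause, such as a refused connection.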
Connection Issues
- Verify containers are running, for example with docker ps.
- Check Prometheus data source in Grafana:
  - Go to Connections → Data sources → Prometheus
  - URL should be http://prometheus:9090
- Test metrics endpoint accessibility from inside the Prometheus container.
Advanced Configuration
Extra Metric Labels
Add custom labels to all metrics using the --extra-metric-labels flag.
Multiprocess Metrics
For multi-GPU or distributed setups, SGLang automatically handles multiprocess metrics collection. Each process exports metrics with appropriate labels:
- tp_rank: Tensor parallel rank
- pp_rank: Pipeline parallel rank
- dp_rank: Data parallel rank (if applicable)
- moe_ep_rank: MoE expert parallel rank
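To illustrate how these rank labels can be used, the Python sketch below parses Prometheus exposition text and sums one metric's samples per value of a chosen label, such as tp_rank. The parser is a simplified one written for this example, not SGLang's or Prometheus's own. Which metrics actually carry which rank labels is an assumption you should check against your server's output.

```python
import re
from collections import defaultdict

# name{label="value",...} value   (the label block is optional)
SAMPLE_RE = re.compile(r'^(?P<name>[^{\s]+)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def sum_by_label(text: str, metric: str, label: str) -> dict[str, float]:
    """Sum the samples of `metric`, grouped by the value of `label`.

    Samples that do not carry `label` are skipped.
    """
    totals: dict[str, float] = defaultdict(float)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        if label in labels:
            totals[labels[label]] += float(match.group("value"))
    return dict(totals)
```

Grouping by tp_rank collapses the other labels, so the result gives one total per tensor parallel rank across all pipeline and data parallel ranks.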
CPU Monitoring
SGLang includes CPU usage monitoring via the sglang:process_cpu_seconds_total metric, which tracks total CPU time (user + system) consumed by each process component.
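Because this metric is a cumulative counter of CPU seconds, utilization comes from the difference between two readings divided by the wall-clock time between them. The Python sketch below shows that arithmetic. The function name is illustrative, and the reset handling is a simplification of what Prometheus's rate() does.

```python
def cpu_utilization(prev_seconds: float, curr_seconds: float, elapsed_seconds: float) -> float:
    """Average CPU cores used between two counter readings.

    A result of 1.0 means one core was fully busy over the interval.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta = curr_seconds - prev_seconds
    if delta < 0:
        # The counter went backwards, so the process restarted.
        # Count only the CPU time accumulated since the restart.
        delta = curr_seconds
    return delta / elapsed_seconds
```

For example, readings of 10.0 and 13.0 CPU seconds taken 6 seconds apart give 3.0 / 6.0 = 0.5, meaning half a core on average.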
Grafana Dashboard
The pre-configured dashboard provides visualization for:
- Request Metrics: Throughput, latency distributions (TTFT, TPOT, E2E)
- Token Metrics: Prompt tokens, generation tokens, cache hit rates
- Resource Utilization: Token usage, queue sizes, running requests
- Performance: Generation throughput, inter-token latency
- Speculative Decoding: Acceptance rates and lengths (if enabled)
- PD Disaggregation: KV transfer speeds, queue depths (if using prefill-decode separation)
Next Steps
- Learn about available Prometheus metrics
- Set up request tracing with OpenTelemetry
- Run benchmarks to test performance
