Prerequisites
- Docker and Docker Compose installed
- Access to the vLLM /metrics endpoint
- Authentication token for remote vLLM endpoints (if applicable)
Environment Configuration
The .env file is gitignored and never committed. Keep your credentials secure.
Starting the Stack
Quick Start
The simplest way to launch the entire monitoring stack does the following:
- Stops any running containers
- Generates configuration files from templates
- Builds Docker images
- Starts containers in the background
- Streams real-time logs
Individual Commands
For more granular control, use these Make targets:

| Command | Description |
|---|---|
| make prometheus-conf | Generates prometheus.yml from template |
| make grafana-conf | Generates dashboard JSON files from templates |
| make docker-build | Builds the Docker images |
| make docker-run | Starts the containers in the background |
| make docker-stop | Stops containers and deletes generated config files |
| make docker-logs | Streams real-time logs |
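As a rough sketch, the targets in the table can be chained in the order the Quick Start list describes. The function name `run_stack` and the `MAKE` override are hypothetical helpers introduced here for illustration, and the ordering is an assumption drawn from that list rather than from the Makefile itself.

```shell
# Run the documented Make targets in the order the Quick Start list describes.
# MAKE defaults to "make"; set MAKE=echo to preview the sequence without running it.
run_stack() {
  for target in docker-stop prometheus-conf grafana-conf docker-build docker-run docker-logs; do
    "${MAKE:-make}" "$target" || return 1
  done
}
```

Calling `run_stack` from the repository root would run each target in turn and stop at the first failure. `MAKE=echo run_stack` only prints the target names, which is a safe way to review the sequence first.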
Step-by-Step Deployment
Generate configurations
- prometheus/prometheus.yml
- grafana/provisioning/dashboards/user_metrics_overview.json
- grafana/provisioning/dashboards/machine_metrics_overview.json
- grafana/provisioning/dashboards/vllm_tokens.json
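One way to confirm that generation succeeded is to check that each of the files listed above exists. The sketch below assumes the paths are relative to the repository root; `check_generated` is a hypothetical helper name, not part of the project.

```shell
# Report any of the documented generated files that are missing under a root directory.
# Returns 0 when all four files exist, 1 otherwise.
check_generated() {
  root="${1:-.}"
  missing=0
  for f in \
    prometheus/prometheus.yml \
    grafana/provisioning/dashboards/user_metrics_overview.json \
    grafana/provisioning/dashboards/machine_metrics_overview.json \
    grafana/provisioning/dashboards/vllm_tokens.json
  do
    if [ ! -f "$root/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_generated` with no argument checks the current directory; pass a path to check a different checkout.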
Accessing Grafana
Once the stack is running:
- Open your browser to http://localhost:4000 (or your configured GRAFANA_PORT)
- Log in with the credentials from your .env file:
  - Username: ADMIN_USER
  - Password: ADMIN_PASSWORD
- The default home dashboard is User Metrics Overview
First-time setup: Grafana automatically provisions the Prometheus data source and all dashboards on startup.
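Before opening the browser, it can help to confirm that Grafana is answering on the expected port. The sketch below uses Grafana's `/api/health` endpoint and assumes Grafana is reachable on localhost; `grafana_url` and `grafana_health` are hypothetical helper names.

```shell
# Build the Grafana base URL from GRAFANA_PORT, falling back to the documented default 4000.
grafana_url() {
  echo "http://localhost:${GRAFANA_PORT:-4000}"
}

# Query Grafana's health endpoint; curl -f turns HTTP errors into a non-zero exit status.
grafana_health() {
  curl -fsS "$(grafana_url)/api/health"
}
```

Calling `grafana_health` after startup should print a small JSON document if Grafana is up, and fail with a curl error otherwise.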
Stopping the Stack
Stopping the stack cleans up all resources:
- Stops and removes containers
- Deletes generated configuration files
- Preserves persistent Prometheus data (if configured)
Troubleshooting
Prometheus can’t reach vLLM
Check your .env configuration:
- Verify VLLM_TARGET is correct
- Ensure SCHEME matches your vLLM deployment (http/https)
- For remote endpoints, confirm VLLM_METRICS_AUTH_TOKEN is valid
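The three settings above can be tested directly from the host, outside Prometheus. This is a minimal sketch under several assumptions: VLLM_TARGET holds a host:port value, SCHEME defaults to http when unset, and the token is sent as a Bearer header. None of these are confirmed by this guide, and `metrics_url` and `probe_vllm` are hypothetical helper names.

```shell
# Build the metrics URL from the .env variables named above.
# Assumes VLLM_TARGET is a host:port pair.
metrics_url() {
  echo "${SCHEME:-http}://${VLLM_TARGET}/metrics"
}

# Request the metrics endpoint and discard the body; the exit status is non-zero
# if the endpoint is unreachable or returns an HTTP error.
# Sending the token as a Bearer header is an assumption; it is only added when a token is set.
probe_vllm() {
  if [ -n "${VLLM_METRICS_AUTH_TOKEN:-}" ]; then
    curl -fsS -o /dev/null -H "Authorization: Bearer $VLLM_METRICS_AUTH_TOKEN" "$(metrics_url)"
  else
    curl -fsS -o /dev/null "$(metrics_url)"
  fi
}
```

If the .env file uses plain KEY=value lines, it can be loaded into the current shell with `set -a; . ./.env; set +a` before calling `probe_vllm`. A failure here points at the target, scheme, or token rather than at Prometheus.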
Grafana shows “No Data”
- Check that Prometheus is scraping successfully
- Verify the Prometheus data source in Grafana:
  - Settings > Data Sources > Prometheus
  - Click “Test” to verify connectivity
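For the scraping check, Prometheus exposes target status through its HTTP API at `/api/v1/targets`. The sketch below counts targets reported as healthy in that response; `count_up_targets` is a hypothetical helper, and it relies on the compact JSON that Prometheus returns.

```shell
# Count scrape targets reported as healthy in a /api/v1/targets JSON response read from stdin.
count_up_targets() {
  grep -o '"health":"up"' | wc -l | tr -d ' '
}
```

A possible invocation is `curl -fsS http://localhost:9090/api/v1/targets | count_up_targets`. Port 9090 is the Prometheus default; whether this stack publishes it on the host is an assumption. A result of 0 suggests the vLLM target is down or misconfigured.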
Port conflicts
If port 4000 is already in use, change GRAFANA_PORT in .env and restart the stack.
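As an illustration, the .env entry might look like the line below. The value 4001 is an arbitrary example; any free port should work.

```shell
# .env (excerpt): move Grafana off the default port 4000
GRAFANA_PORT=4001
```

After the restart, Grafana would be served at http://localhost:4001 instead.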
Advanced Configuration
Custom retention period
Modify STORAGE_TSDB_RETENTION_TIME in .env to change how long Prometheus keeps data.
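A hedged example, assuming the variable is passed through to Prometheus's `--storage.tsdb.retention.time` flag (the name suggests this, but this guide does not confirm it). That flag accepts durations such as 12h, 15d, or 4w; 30d below is only an illustrative value.

```shell
# .env (excerpt): keep roughly 30 days of metrics
STORAGE_TSDB_RETENTION_TIME=30d
```

Longer retention increases disk usage for the Prometheus data volume, so size the storage accordingly.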
