Monitoring Overview

Architecture

Umbra’s monitoring stack provides real-time visibility into vLLM model performance using a containerized Prometheus and Grafana deployment.

Components

Prometheus

Scrapes the vLLM /metrics endpoint and stores the results as time-series data

Grafana

Visualizes metrics through pre-configured dashboards

vLLM

Exposes detailed runtime metrics at the /metrics endpoint

Docker Network

Internal network for service communication

Data Flow

vLLM exposes a /metrics endpoint with detailed runtime metrics about the model
Prometheus continuously scrapes this endpoint and stores the data as structured time series
Grafana displays these metrics using pre-configured dashboards
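
To make the first two steps concrete, here is a minimal sketch of reading the Prometheus text exposition format that a /metrics endpoint serves. The sample payload, its values, and the `parse_metrics` helper are illustrative assumptions, not Umbra code. A real Prometheus server handles much more (timestamps, histograms, staleness).

```python
import re

# Illustrative sample of a /metrics response; the values are made up.
SAMPLE_METRICS = """\
# HELP vllm:num_requests_running Number of requests currently running.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="demo-model"} 3.0
# HELP vllm:num_requests_waiting Number of requests waiting in the queue.
# TYPE vllm:num_requests_waiting gauge
vllm:num_requests_waiting{model_name="demo-model"} 1.0
"""

# metric_name{label="value",...} value
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE comments and blank lines carry no sample
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser does not understand
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


samples = parse_metrics(SAMPLE_METRICS)
```

Each tuple corresponds to one point that Prometheus would store in a time series keyed by metric name and labels.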

Available Dashboards

The monitoring stack includes three pre-configured dashboards:

User Metrics Overview

Tracks user-facing performance metrics:

TTFT (Time to First Token)
End-to-end latency
Queue waiting time
Number of running requests

Machine Metrics Overview

Monitors hardware resource utilization:

GPU usage and memory
CPU workload
Running and waiting requests
System resource consumption

vLLM Tokens Dashboard

Provides token-level metrics for throughput analysis.

Configuration Strategy

The monitoring stack uses a secure two-step configuration process:

Environment Variables

Sensitive credentials and endpoints are stored in a .env file (never committed to Git)

Template Processing

Configuration files are generated from .template files with whitelisted variable substitution

Lifecycle Management

Generated configs are created on build and automatically deleted on stop

Why Templates? This approach protects internal Grafana and Prometheus variables (like $job or $datasource) from being accidentally replaced while allowing safe injection of secrets.

Whitelisted Variables

Prometheus (prometheus.yml.template):

${SCHEME} - HTTP or HTTPS protocol
${VLLM_TARGET} - vLLM endpoint address
${VLLM_METRICS_AUTH_TOKEN} - Bearer token for metrics access

Grafana (dashboard JSON templates):

${VLLM_SCRAPE_JOB_NAME} - Prometheus job name
${GRAFANA_DATASOURCE_UID} - Data source identifier
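
The whitelisted substitution can be sketched as follows. This page does not say which tool Umbra uses for this step, so treat the sketch as one possible approach. The `render_template` function and the example template string are hypothetical. Only the variable names come from the lists above.

```python
import re

# Whitelists copied from the lists above.
PROMETHEUS_VARS = {"SCHEME", "VLLM_TARGET", "VLLM_METRICS_AUTH_TOKEN"}
GRAFANA_VARS = {"VLLM_SCRAPE_JOB_NAME", "GRAFANA_DATASOURCE_UID"}

# Matches ${NAME}; bare $name references (such as $job) never match.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def render_template(template, env, whitelist):
    """Replace only whitelisted ${NAME} placeholders; leave all others as-is."""

    def replace(match):
        name = match.group(1)
        if name not in whitelist:
            return match.group(0)  # not ours: keep the original text untouched
        if name not in env:
            raise KeyError(f"missing value for whitelisted variable {name}")
        return env[name]

    return _PLACEHOLDER.sub(replace, template)


# Hypothetical template fragment mixing a whitelisted variable with
# Grafana's own dashboard variables.
template = "uid=${GRAFANA_DATASOURCE_UID} job=$job ds=${datasource}"
rendered = render_template(template, {"GRAFANA_DATASOURCE_UID": "prom-uid"}, GRAFANA_VARS)
```

Here `rendered` keeps `$job` and `${datasource}` intact for Grafana to resolve at query time, while the whitelisted identifier is filled in from the environment.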

Access

Once running, the monitoring interfaces are available at:

Grafana: http://localhost:4000 (or custom GRAFANA_PORT)
Prometheus: Internal Docker network only (not exposed publicly)

Prometheus is intentionally not exposed publicly. All queries should be performed through Grafana dashboards.
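
As a small illustration of the port override, the sketch below resolves the Grafana address from the environment. The `grafana_url` helper is hypothetical. Only the GRAFANA_PORT variable and the default port 4000 come from the text above.

```python
import os

DEFAULT_GRAFANA_PORT = "4000"  # default stated in the Access section


def grafana_url(env=None):
    """Build the local Grafana URL, honoring GRAFANA_PORT when it is set."""
    env = os.environ if env is None else env
    port = env.get("GRAFANA_PORT") or DEFAULT_GRAFANA_PORT
    return f"http://localhost:{port}"
```

For example, `grafana_url({"GRAFANA_PORT": "3001"})` returns `http://localhost:3001`, and an empty mapping falls back to port 4000.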
