NetBird components expose Prometheus metrics and health endpoints for comprehensive monitoring of your mesh network infrastructure.

Metrics Endpoint

All NetBird server components expose metrics in Prometheus format at the /metrics endpoint.

Management Server Metrics

The management server exposes metrics on a configurable port (default: 9090):
curl http://localhost:9090/metrics

Enable Metrics

Metrics are enabled by default. To change the port, pass --metrics-port when starting the management server:
netbird-mgmt --metrics-port 9090
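Metrics are produced with OpenTelemetry and exported in Prometheus format. The dotted names listed on this page appear with underscores in Prometheus queries: for example, management.grpc.connected.streams is queried as management_grpc_connected_streams.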
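Because the endpoint returns plain text, a scrape can be inspected without a full Prometheus setup. The following is a minimal Python sketch of reading that text format. The sample values, the help text, and the label names (endpoint, method) are invented for illustration and are not taken from a real NetBird server.

```python
from typing import Dict

# Illustrative sample of the Prometheus text exposition format.
# Values, help text, and label names are made up for this sketch.
SAMPLE = """\
# HELP management_grpc_connected_streams Active peer connections
# TYPE management_grpc_connected_streams gauge
management_grpc_connected_streams 42
management_http_request_counter{endpoint="/api/peers",method="GET"} 7
"""


def parse_metrics(text: str) -> Dict[str, float]:
    """Map each series (metric name plus optional labels) to its sample value."""
    samples: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # Label values may contain spaces, so split after the closing brace.
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:].split()
        else:
            parts = line.split()
            series, rest = parts[0], parts[1:]
        if rest:
            # rest[0] is the value; rest[1], if present, is a timestamp.
            samples[series] = float(rest[0])
    return samples


metrics = parse_metrics(SAMPLE)
print(metrics["management_grpc_connected_streams"])
```

To try it against a live server, the body returned by the curl command above can be passed to parse_metrics in place of SAMPLE.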

Key Metrics to Monitor

gRPC API Metrics

Monitor peer connections and authentication:
| Metric | Type | Description |
| --- | --- | --- |
| management.grpc.sync.request.counter | Counter | Number of sync requests from peers |
| management.grpc.login.request.counter | Counter | Number of login requests |
| management.grpc.login.request.blocked.counter | Counter | Blocked login attempts |
| management.grpc.connected.streams | Gauge | Active peer connections |
| management.grpc.login.request.duration.ms | Histogram | Login request latency |
| management.grpc.sync.request.duration.ms | Histogram | Sync request latency |
| management.grpc.updatechannel.queue | Histogram | Update channel queue depth |
Monitor management.grpc.login.request.high.latency.counter. It counts login requests that took longer than 7s, so a steadily rising value indicates performance issues.
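Since this is a cumulative counter, what matters is how much it grows between scrapes rather than its absolute value. A minimal Python sketch of that calculation follows. The function name is hypothetical, and the reset handling mirrors the common convention that a counter dropping in value means the process restarted.

```python
def counter_increase(previous: float, current: float) -> float:
    """Increase of a cumulative counter between two scrapes.

    If the current value is lower than the previous one, the counter is
    assumed to have reset to zero (for example after a restart), so the
    whole current value counts as new.
    """
    if current < previous:
        return current
    return current - previous


# Example: two scrapes taken a few minutes apart (illustrative values).
slow_logins = counter_increase(previous=120, current=134)
print(slow_logins)
```

Comparing this increase against a threshold avoids alerting forever on slow logins that happened long ago.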

HTTP API Metrics

Track REST API performance:
| Metric | Type | Description |
| --- | --- | --- |
| management.http.request.counter | Counter | HTTP requests by endpoint and method |
| management.http.response.counter | Counter | HTTP responses by status code |
| management.http.request.duration.ms | Histogram | Request duration by endpoint |
| management.http.request.counter.total | Counter | Total HTTP requests |
| management.http.response.code.total | Counter | Total responses by status code |

Store Metrics

Monitor database and persistence performance:
| Metric | Type | Description |
| --- | --- | --- |
| management.store.global.lock.acquisition.duration.micro | Histogram | Time to acquire store lock (µs) |
| management.store.global.lock.acquisition.duration.ms | Histogram | Time holding store lock (ms) |
| management.store.persistence.duration.micro | Histogram | Save/delete operation duration (µs) |
| management.store.transaction.duration.ms | Histogram | Transaction duration (ms) |

Account Manager Metrics

Track network map calculation performance:
| Metric | Type | Description |
| --- | --- | --- |
| management.account.update.account.peers.duration.ms | Histogram | Peer update preparation time |
| management.account.get.peer.network.map.duration.ms | Histogram | Network map calculation time |
| management.account.network.map.object.count | Histogram | Objects in network map |
| management.account.peer.meta.update.counter | Counter | Peer metadata updates |

IDP Metrics

Monitor identity provider integration:
| Metric | Type | Description |
| --- | --- | --- |
| management.idp.authenticate.request.counter | Counter | IDP authentication requests |
| management.idp.get.user.by.email.counter | Counter | User lookup requests |
| management.idp.request.error.counter | Counter | IDP request errors |
| management.idp.request.status.error.counter | Counter | Non-2xx responses from IDP |

Health Checks

Relay Server Health

Relay servers expose a health check endpoint at /health:
curl http://localhost:8080/health
Response (healthy):
{
  "status": "healthy",
  "timestamp": "2026-03-04T10:30:00Z",
  "listeners": ["ws", "wss"],
  "certificate_valid": true
}
Response (unhealthy):
{
  "status": "unhealthy",
  "timestamp": "2026-03-04T10:30:00Z",
  "listeners": [],
  "certificate_valid": false
}
Health check responses are cached for 3 seconds to reduce overhead.
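A probe can turn this response into a simple pass or fail. The Python sketch below relies only on the fields shown in the example responses above. Requiring at least one listener and a valid certificate in addition to the status field is a deliberately strict choice of this sketch, not documented relay behavior, and the function name is hypothetical.

```python
import json


def relay_is_healthy(body: str) -> bool:
    """Return True only if the relay reports healthy, lists at least one
    listener, and reports a valid certificate."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False  # not JSON at all: treat as unhealthy
    if not isinstance(data, dict):
        return False
    return (
        data.get("status") == "healthy"
        and bool(data.get("listeners"))
        and data.get("certificate_valid") is True
    )


# The healthy example response from above.
healthy_body = """
{
  "status": "healthy",
  "timestamp": "2026-03-04T10:30:00Z",
  "listeners": ["ws", "wss"],
  "certificate_valid": true
}
"""
print(relay_is_healthy(healthy_body))
```

Given the 3-second cache, polling more often than that should not yield fresher results.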

Management Server Health

Check management server availability:
# gRPC API check
grpcurl -plaintext localhost:33073 list

# HTTP API check (-k skips TLS certificate verification; drop it when the certificate is trusted)
curl -k https://localhost:443/api/peers

Prometheus Configuration

Add NetBird components to your prometheus.yml:
scrape_configs:
  - job_name: 'netbird-management'
    static_configs:
      - targets: ['management:9090']
    scrape_interval: 30s

  - job_name: 'netbird-signal'
    static_configs:
      - targets: ['signal:9090']
    scrape_interval: 30s

  - job_name: 'netbird-relay'
    static_configs:
      - targets: ['relay:9090']
    scrape_interval: 30s

Grafana Dashboards

NetBird provides pre-built Grafana dashboards in the source repository:
infrastructure_files/observability/grafana/dashboards/
├── management.json
├── signal.json
└── relay.json
Import these dashboards to visualize:
  • Active peer connections
  • Login and sync latencies
  • Network map distribution times
  • Store operation performance
  • Error rates and blocked requests

Docker Compose Monitoring Stack

Example monitoring setup:
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    ports:
      - 9090:9090

  grafana:
    image: grafana/grafana:latest
    volumes:
      - grafana-data:/var/lib/grafana
      - ./dashboards:/etc/grafana/provisioning/dashboards
    ports:
      - 3000:3000
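    environment:
      # Change this default password before exposing Grafana
      - GF_SECURITY_ADMIN_PASSWORD=admin

# Named volumes referenced by the services above must be declared
volumes:
  prometheus-data:
  grafana-data: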

Alerting Rules

groups:
  - name: netbird
    interval: 30s
    rules:
      - alert: HighLoginLatency
        expr: increase(management_grpc_login_request_high_latency_counter[5m]) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High login latency detected"
          description: "{{ $value }} login requests exceeded the 7s threshold in the last 5 minutes"
      - alert: NoActiveStreams
        expr: management_grpc_connected_streams == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "No active peer connections"
          description: "Management server has no connected peers"
      - alert: StoreLockContention
        expr: histogram_quantile(0.95, sum by (le) (rate(management_store_global_lock_acquisition_duration_ms_bucket[5m]))) > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Store lock contention detected"
          description: "95th percentile lock acquisition time exceeds 1s"
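The StoreLockContention rule depends on estimating a percentile from histogram buckets. As a rough illustration of how such an estimate works, the Python sketch below interpolates linearly inside the bucket that contains the requested rank, which is the general approach Prometheus takes for classic histograms. The bucket boundaries and counts are invented for the example and do not reflect NetBird's actual bucket layout.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    The list is expected to include the +Inf bucket, whose count is the total.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the overflow bucket: report the highest finite bound.
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical lock-acquisition buckets in milliseconds.
example = [(100.0, 50.0), (500.0, 90.0), (1000.0, 98.0), (math.inf, 100.0)]
p95 = histogram_quantile(0.95, example)
print(p95 > 1000)  # would the 1s threshold be crossed?
```

In a real rule the counts come from rate() over the bucket series rather than raw cumulative totals, so the estimate reflects recent traffic only.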

What to Monitor

1. Peer Connectivity

Monitor management.grpc.connected.streams to track active peers. Sudden drops indicate connectivity issues.

2. API Performance

Watch login and sync duration histograms. P95 latency should be under 1 second for healthy operations.

3. Store Performance

Track lock acquisition and persistence metrics. High values indicate database bottlenecks.

4. Error Rates

Monitor blocked requests and IDP errors. Spikes indicate authentication or authorization problems.

5. Network Map Size

Track management.account.network.map.object.count. Large values (>5000 objects) may impact performance.
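The numeric thresholds in this checklist can be applied mechanically to values pulled from a dashboard or query. The Python sketch below does that for the three items that state a concrete number or condition. The function and parameter names are hypothetical, and the inputs are assumed to be precomputed elsewhere (for example, P95 values from the duration histograms).

```python
from typing import List

# Thresholds taken from the checklist above.
P95_LATENCY_LIMIT_MS = 1000.0      # P95 latency should be under 1 second
NETWORK_MAP_OBJECT_LIMIT = 5000.0  # more than 5000 objects may impact performance


def review(
    connected_streams: float,
    login_p95_ms: float,
    sync_p95_ms: float,
    network_map_objects: float,
) -> List[str]:
    """Return a list of human-readable findings; an empty list means no finding."""
    findings: List[str] = []
    if connected_streams == 0:
        findings.append("no active peer streams")
    if login_p95_ms >= P95_LATENCY_LIMIT_MS:
        findings.append("login P95 latency at or above 1s")
    if sync_p95_ms >= P95_LATENCY_LIMIT_MS:
        findings.append("sync P95 latency at or above 1s")
    if network_map_objects > NETWORK_MAP_OBJECT_LIMIT:
        findings.append("network map larger than 5000 objects")
    return findings


# Illustrative values only.
print(review(connected_streams=12, login_p95_ms=300, sync_p95_ms=450, network_map_objects=1200))
```

Store performance and error rates are left out because the checklist gives no fixed number for them; baselines from your own deployment are a better guide there.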

Client Monitoring

Clients don’t expose metrics endpoints, but you can monitor client status:
# Check client status
netbird status

# Get detailed peer information
netbird status --detail

# Get JSON output for automation
netbird status --json
Use netbird status --json to integrate client health checks into your monitoring system.
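One way to wire this into a monitoring system is a small script that runs the command and inspects the result. The Python sketch below assumes the JSON output contains management and signal objects, each with a boolean connected field. Those field names are an assumption of this example and should be checked against the output of your client version. The function names are hypothetical.

```python
import json
import subprocess
from typing import Any, Dict, List


def read_client_status() -> Dict[str, Any]:
    """Run `netbird status --json` and return the parsed output."""
    result = subprocess.run(
        ["netbird", "status", "--json"],
        capture_output=True,
        text=True,
        check=True,  # raise if the client exits with an error
    )
    return json.loads(result.stdout)


def find_problems(status: Dict[str, Any]) -> List[str]:
    """List components that do not report connected.

    Assumes (unverified) that status["management"]["connected"] and
    status["signal"]["connected"] exist and are booleans.
    """
    problems: List[str] = []
    for component in ("management", "signal"):
        section = status.get(component)
        connected = isinstance(section, dict) and section.get("connected") is True
        if not connected:
            problems.append(f"{component} not connected")
    return problems


# Hypothetical parsed output, used here instead of calling the real client.
sample_status = {"management": {"connected": True}, "signal": {"connected": False}}
print(find_problems(sample_status))
```

On a host with the client installed, find_problems(read_client_status()) would perform the same check against live data.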
