
Overview

Monitoring helps you keep your Headscale deployment healthy and performant. This guide covers health checks, Prometheus metrics, log management, resource monitoring, alerting, and performance benchmarking.

Health Checks

Headscale Health Endpoint

Headscale exposes a health check endpoint for monitoring service status:
curl http://localhost:8000/health

Example response:
{
  "status": "pass"
}
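
For scripting, the status field can be pulled out of the response body. The following is a minimal sketch, assuming the body has the shape shown above; `health_status` is a hypothetical helper name, not part of Headscale.

```shell
# Sketch: extract the "status" field from a health response body on stdin.
# health_status is a hypothetical helper, not a Headscale command.
health_status() {
    # Join the lines, then print the quoted value that follows "status":
    tr -d '\n' | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Sample body copied from the example response above
sample='{
  "status": "pass"
}'
printf '%s' "$sample" | health_status   # prints: pass
```

Against a running instance this could be used as `curl -sf http://localhost:8000/health | health_status`.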

Container Health Status

All services include Docker health checks:
# Check all container health
docker compose ps

# Detailed health status
docker inspect --format='{{.State.Health.Status}}' headscale
docker inspect --format='{{.State.Health.Status}}' headscale-db
docker inspect --format='{{.State.Health.Status}}' nginx

Health checks run automatically:
  • Headscale: Every 30s (command: headscale health)
  • PostgreSQL: Every 10s (command: pg_isready)
  • nginx: Every 30s (HTTP check to /health)

Health Check Configuration

From docker-compose.yml:
headscale:
  healthcheck:
    test: [CMD, headscale, health]
    interval: 30s
    timeout: 10s
    retries: 3
    start_period: 10s

postgres:
  healthcheck:
    test: [CMD-SHELL, "pg_isready -U headscale"]
    interval: 10s
    timeout: 5s
    retries: 5

nginx:
  healthcheck:
    test: [CMD, wget, --quiet, --tries=1, --spider, http://localhost:8080/health]
    interval: 30s
    timeout: 5s
    retries: 3
    start_period: 10s
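
These settings determine how quickly a failure shows up. As a rough sketch, assuming every failing check runs until its timeout and the next check starts one interval later, the worst-case delay before Docker marks a container unhealthy is about retries × (interval + timeout). `max_seconds_to_unhealthy` is a hypothetical helper name.

```shell
# Sketch: rough upper bound, in seconds, before a container is reported
# "unhealthy" once its checks start failing. Assumes each failing check
# takes the full timeout. max_seconds_to_unhealthy is a hypothetical helper.
max_seconds_to_unhealthy() {
    interval=$1 timeout=$2 retries=$3
    echo $(( retries * (interval + timeout) ))
}

max_seconds_to_unhealthy 30 10 3   # headscale: prints 120
max_seconds_to_unhealthy 10 5 5    # postgres:  prints 75
max_seconds_to_unhealthy 30 5 3    # nginx:     prints 105
```

Alert thresholds that wait on container health should allow at least this much time.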

Prometheus Metrics

Headscale exposes Prometheus-compatible metrics for detailed monitoring.

Metrics Endpoint

Access metrics on port 9090 (localhost only for security):
# View all metrics
curl http://localhost:9090/metrics

# Filter specific metrics
curl http://localhost:9090/metrics | grep headscale_

Key Metrics

# Total registered nodes
headscale_nodes_total

# Nodes by state
headscale_nodes_registered
headscale_nodes_online
headscale_nodes_offline

# Node registration rate
rate(headscale_node_registrations_total[5m])

# Active connections
headscale_derp_connections_active

# Data transfer
headscale_network_bytes_sent_total
headscale_network_bytes_received_total

# Connection quality
headscale_connection_latency_seconds

# Request rate
rate(headscale_http_requests_total[1m])

# Request duration
headscale_http_request_duration_seconds

# Error rate
rate(headscale_http_requests_total{code=~"5.."}[5m])

# Database connections
headscale_db_connections_open
headscale_db_connections_idle

# Query duration
headscale_db_query_duration_seconds

# Connection pool
headscale_db_max_open_connections
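
To read a single value from the metrics endpoint without Prometheus, the text output can be filtered with awk. This is a minimal sketch that only handles metrics exposed without labels; `metric_value` is a hypothetical helper, and the sample values are made up for illustration.

```shell
# Sketch: print the value of an unlabelled metric from Prometheus text
# format on stdin. metric_value is a hypothetical helper.
metric_value() {
    # $1 = exact metric name; HELP/TYPE comment lines never match
    # because their first field is "#".
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# Illustrative sample only; these are not real measurements.
sample='# TYPE headscale_nodes_online gauge
headscale_nodes_online 12
headscale_nodes_offline 3'
printf '%s\n' "$sample" | metric_value headscale_nodes_online   # prints: 12
```

With a live endpoint: `curl -s http://localhost:9090/metrics | metric_value headscale_nodes_online`.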

Metrics Configuration

From config/config.yaml:
listen_addr: 0.0.0.0:8080
metrics_listen_addr: 0.0.0.0:9090

Headscale binds metrics to 0.0.0.0:9090 inside the container, but the port mapping publishes them only on 127.0.0.1:9090 on the host. Never expose metrics publicly without authentication.
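
Whether a Compose port mapping is limited to the loopback interface depends on its host-IP prefix. The sketch below classifies mapping strings; `is_loopback_mapping` is a hypothetical helper, and `127.0.0.1:9090:9090` is an assumed spelling of the metrics mapping described above.

```shell
# Sketch: succeed if a Compose "ports" entry binds only to loopback.
# is_loopback_mapping is a hypothetical helper.
is_loopback_mapping() {
    case "$1" in
        127.0.0.1:*|"[::1]:"*) return 0 ;;   # explicit loopback host IP
        *) return 1 ;;                        # no host IP: all interfaces
    esac
}

for mapping in "127.0.0.1:9090:9090" "127.0.0.1:9091:9090" "3002:3000"; do
    if is_loopback_mapping "$mapping"; then
        echo "$mapping: loopback only"
    else
        echo "$mapping: published on all interfaces"
    fi
done
```

A mapping without a host IP, such as `3002:3000`, is reachable from other machines unless a firewall blocks it.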

Log Management

Viewing Logs

# All service logs
docker compose logs -f

# Specific service
docker compose logs -f headscale
docker compose logs -f postgres
docker compose logs -f nginx

# Last N lines
docker compose logs --tail 100 headscale

# With timestamps
docker compose logs -f --timestamps headscale

# Since specific time
docker compose logs --since 30m headscale

Log Levels

Configure logging in config/config.yaml:
log:
  format: text  # or: json
  level: info   # debug, info, warn, error

Use JSON format for easier parsing by log aggregators:
log:
  format: json
  level: info

Log Analysis

# Search for errors
docker compose logs headscale | grep -i error

# Count error occurrences
docker compose logs --since 24h headscale | grep -i error | wc -l

# Monitor for failed authentication
docker compose logs -f headscale | grep "authentication failed"

# Track node registrations
docker compose logs headscale | grep "node registered"

Log Rotation

Configure Docker log rotation in /etc/docker/daemon.json:
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
# Apply configuration
sudo systemctl restart docker
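
With these settings each container keeps at most max-file files of max-size each. A small sketch of the arithmetic, using a hypothetical helper `max_log_mb`:

```shell
# Sketch: upper bound on json-file log disk usage in MB.
# max_log_mb is a hypothetical helper.
max_log_mb() {
    max_size_mb=$1 max_file=$2 containers=$3
    echo $(( max_size_mb * max_file * containers ))
}

max_log_mb 10 3 1   # one container: prints 30
max_log_mb 10 3 3   # headscale, headscale-db, nginx: prints 90
```

Note that daemon-level log options apply only to containers created after the change, so existing containers need to be recreated to pick them up.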

Resource Monitoring

Container Resource Usage

# Real-time resource stats
docker stats

# Specific containers
docker stats headscale headscale-db nginx

# Single snapshot
docker stats --no-stream

Example output:
NAME            CPU %   MEM USAGE / LIMIT   MEM %   NET I/O         BLOCK I/O
headscale       0.05%   41MB / 970MB        4%      1.6MB / 1.7MB   512KB / 0B
headscale-db    0.01%   25MB / 970MB        2%      800KB / 850KB   1MB / 2MB
nginx           0.00%   30MB / 970MB        3%      140KB / 148KB   0B / 0B
headplane       0.00%   180MB / 970MB       18%     7.6MB / 3.9MB   0B / 0B
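
To flag containers above a memory threshold, the stats output can be reduced to name and percentage and filtered. This is a sketch under the assumption that input lines look like `NAME PERCENT%`, which is what `docker stats --no-stream --format '{{.Name}} {{.MemPerc}}'` produces; `over_mem_threshold` is a hypothetical helper.

```shell
# Sketch: print names of containers whose memory percentage exceeds a limit.
# over_mem_threshold is a hypothetical helper.
over_mem_threshold() {
    # $1 = threshold in percent; stdin = "NAME PERCENT%" lines
    awk -v limit="$1" '{
        pct = $2
        sub(/%$/, "", pct)                 # strip the trailing percent sign
        if (pct + 0 > limit + 0) print $1
    }'
}

# Sample values taken from the example output above
sample='headscale 4%
headscale-db 2%
nginx 3%
headplane 18%'
printf '%s\n' "$sample" | over_mem_threshold 10   # prints: headplane
```

Live usage: `docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' | over_mem_threshold 80`.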

System Resources

# Disk usage
df -h
du -sh data/ config/ backups/

# Docker disk usage
docker system df

# Detailed breakdown
docker system df -v

# Memory usage
free -h

# CPU load
uptime

Database Monitoring

# Connection count
docker exec headscale-db psql -U headscale -c "SELECT count(*) FROM pg_stat_activity;"

# Database size
docker exec headscale-db psql -U headscale -c "SELECT pg_size_pretty(pg_database_size('headscale'));"

# Active queries
docker exec headscale-db psql -U headscale -c "SELECT pid, age(clock_timestamp(), query_start), query FROM pg_stat_activity WHERE state != 'idle';"

# Table sizes
docker exec headscale-db psql -U headscale -c "SELECT relname, pg_size_pretty(pg_total_relation_size(relid)) FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC;"

Monitoring Stack Setup

Prometheus + Grafana

Add monitoring services to your stack:
docker-compose.monitoring.yml
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
    ports:
      - "127.0.0.1:9091:9090"
    networks:
      - headscale-network
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=30d'

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3002:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    networks:
      - headscale-network
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=changeme
      - GF_USERS_ALLOW_SIGN_UP=false

volumes:
  prometheus-data:
  grafana-data:

Prometheus Configuration

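Create prometheus.yml. The postgres and cadvisor jobs assume that postgres-exporter and cAdvisor containers are also running on headscale-network; the compose file above does not define them: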
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'headscale'
    static_configs:
      - targets: ['headscale:9090']
        labels:
          service: 'headscale'

  - job_name: 'postgres'
    static_configs:
      - targets: ['postgres-exporter:9187']
        labels:
          service: 'postgres'

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
        labels:
          service: 'docker'

Alerting

Basic Alert Script

monitor-headscale.sh
#!/bin/bash

# Alert recipient (placeholder address; replace with your own)
ALERT_EMAIL="admin@example.com"

# Health check
if ! curl -sf http://localhost:8000/health > /dev/null; then
    echo "ALERT: Headscale health check failed" | mail -s "Headscale Down" "$ALERT_EMAIL"
fi

# Disk space check
DISK_USAGE=$(df -h / | awk 'NR==2 {print $5}' | sed 's/%//')
if [ "$DISK_USAGE" -gt 90 ]; then
    echo "ALERT: Disk usage at ${DISK_USAGE}%" | mail -s "Disk Space Critical" "$ALERT_EMAIL"
fi

# Database connection check
if ! docker exec headscale-db pg_isready -U headscale > /dev/null; then
    echo "ALERT: Database connection failed" | mail -s "Database Down" "$ALERT_EMAIL"
fi

Schedule with cron:
# Every 5 minutes
*/5 * * * * /path/to/monitor-headscale.sh
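
The disk check in the script can be exercised without touching the real filesystem by separating parsing from measurement. The sketch below assumes `df -P` output (POSIX format, which keeps each filesystem on one line) rather than `df -h`; `disk_usage_percent` is a hypothetical helper and the sample numbers are illustrative.

```shell
# Sketch: parse the usage percentage from `df -P <path>` output on stdin.
# disk_usage_percent is a hypothetical helper.
disk_usage_percent() {
    # Row 2 is the first filesystem; column 5 is the capacity, e.g. "93%"
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Illustrative sample; not a real measurement.
sample='Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/sda1 41152736 38272044 2880692 93% /'

usage=$(printf '%s\n' "$sample" | disk_usage_percent)
if [ "$usage" -gt 90 ]; then
    echo "ALERT: Disk usage at ${usage}%"   # prints: ALERT: Disk usage at 93%
fi
```

In the cron script the same function would be fed by `df -P / | disk_usage_percent`.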

Prometheus Alertmanager

Create alertmanager.yml:
route:
  receiver: 'email'
  group_by: ['alertname', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

receivers:
  - name: 'email'
    email_configs:
      - to: 'admin@example.com'                    # placeholder address
        from: 'alertmanager@example.com'           # placeholder address
        smarthost: 'smtp.example.com:587'
        auth_username: 'alertmanager@example.com'  # placeholder address
        auth_password: 'password'
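
Define alert rules in alerts.yml. Prometheus evaluates them only if the file is listed under rule_files in prometheus.yml, and the HighMemoryUsage rule relies on container metrics from cAdvisor: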
groups:
  - name: headscale
    rules:
      - alert: HeadscaleDown
        expr: up{job="headscale"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Headscale is down"

      - alert: HighMemoryUsage
        expr: container_memory_usage_bytes{name="headscale"} / container_spec_memory_limit_bytes{name="headscale"} > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Headscale memory usage above 90%"

      - alert: DatabaseConnectionsFull
        expr: headscale_db_connections_open >= headscale_db_max_open_connections
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Database connection pool exhausted"

Performance Monitoring

Key Performance Indicators

Response Time

  • /health endpoint: < 10ms
  • API endpoints: < 50ms
  • Node registration: < 500ms

Throughput

  • API requests: 100+ req/s
  • WebSocket connections: 1000+ concurrent
  • DERP relay: 100+ Mbps

Resource Usage

  • CPU: < 10% average
  • Memory: < 512MB typical
  • Disk I/O: < 10 MB/s

Availability

  • Uptime: 99.9%+
  • Health checks: 100% pass
  • Database: < 1s query time
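
An availability target translates into a concrete downtime budget. The sketch below works through that arithmetic for the 99.9% figure; `downtime_budget_minutes` is a hypothetical helper.

```shell
# Sketch: allowed downtime in minutes for an availability target.
# downtime_budget_minutes is a hypothetical helper.
downtime_budget_minutes() {
    # $1 = availability target in percent, $2 = window length in days
    awk -v target="$1" -v days="$2" \
        'BEGIN { printf "%.1f\n", (100 - target) / 100 * days * 24 * 60 }'
}

downtime_budget_minutes 99.9 30    # prints 43.2
downtime_budget_minutes 99.9 365   # prints 525.6
```

At 99.9%, a single Docker daemon restart or host reboot can use a noticeable share of the monthly budget.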

Benchmarking

# API response time
time curl http://localhost:8000/health

# Load testing
ab -n 1000 -c 10 http://localhost:8000/health

# Database query performance
docker exec headscale-db psql -U headscale -c "EXPLAIN ANALYZE SELECT * FROM nodes;"
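
The ab result can be compared against the throughput target automatically. This is a sketch that relies on the `Requests per second:` line that ab prints in its summary; both helper names are hypothetical, and the sample figure is illustrative rather than a measured result.

```shell
# Sketch: extract requests per second from ab output and compare to a target.
# ab_requests_per_second and meets_throughput are hypothetical helpers.
ab_requests_per_second() {
    # The text after the colon looks like "    1523.40 [#/sec] (mean)"
    awk -F: '/^Requests per second/ { split($2, parts, " "); print parts[1]; exit }'
}

meets_throughput() {
    # $1 = measured req/s, $2 = target req/s; exit status 0 if target is met
    awk -v got="$1" -v want="$2" 'BEGIN { exit !(got + 0 >= want + 0) }'
}

# Illustrative sample line; not a real benchmark result.
sample='Requests per second:    1523.40 [#/sec] (mean)'
rps=$(printf '%s\n' "$sample" | ab_requests_per_second)

if meets_throughput "$rps" 100; then
    echo "OK: $rps req/s"
else
    echo "BELOW TARGET: $rps req/s"
fi
```

With a real run: `ab -n 1000 -c 10 http://localhost:8000/health | ab_requests_per_second`. The /health endpoint is cheap, so treat the figure as an upper bound rather than typical API throughput.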

Status Page

Create a simple status page:
status.html
<!DOCTYPE html>
<html>
<head>
    <title>Headscale Status</title>
    <meta http-equiv="refresh" content="30">
</head>
<body>
    <h1>Headscale Status</h1>
    <div id="status"></div>
    
    <script>
        fetch('http://localhost:8000/health')
            .then(r => r.json())
            .then(d => {
                document.getElementById('status').innerHTML = 
                    `Status: ${d.status}<br>Last checked: ${new Date()}`;
            })
            .catch(e => {
                document.getElementById('status').innerHTML = 
                    `Status: Error - ${e.message}`;
            });
    </script>
</body>
</html>

Troubleshooting

# Check port binding
docker compose ps | grep headscale

# Verify metrics configuration
grep metrics_listen_addr config/config.yaml

# Test from inside container
docker exec headscale curl http://localhost:9090/metrics

# Check for memory leaks
docker stats --no-stream headscale

# Review database connection pool
grep max_open_conns config/config.yaml

# Restart service
docker compose restart headscale

# Check current log size (the log file is usually readable only by root)
docker inspect --format='{{.LogPath}}' headscale
sudo du -h "$(docker inspect --format='{{.LogPath}}' headscale)"

# Configure log rotation
sudo nano /etc/docker/daemon.json
# Add log rotation settings

# Restart Docker
sudo systemctl restart docker
