Overview
Effective monitoring ensures your Headscale deployment remains healthy and performant. This guide covers health checks, metrics collection, log analysis, and alerting strategies.
Health Checks
Headscale Health Endpoint
Headscale exposes a health check endpoint for monitoring service status:
curl http://localhost:8000/health
Healthy Response
Unhealthy Response
Container Health Status
All services include Docker health checks:
# Check all container health
docker compose ps
# Detailed health status
docker inspect --format='{{.State.Health.Status}}' headscale
docker inspect --format='{{.State.Health.Status}}' headscale-db
docker inspect --format='{{.State.Health.Status}}' nginx
Health checks run automatically:
Headscale: every 30s (command: headscale health)
PostgreSQL: every 10s (command: pg_isready)
nginx: every 30s (HTTP check to /health)
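The per-container status checks above can be folded into a single pass/fail helper for deploy or cron scripts. This is a minimal sketch: the function name `all_healthy` is hypothetical, and the wiring in the comment assumes the three container names used in this guide.

```shell
# all_healthy: succeed only if every status passed as an argument is "healthy".
# Docker reports "starting", "healthy", or "unhealthy" for containers
# that define a health check.
all_healthy() {
  [ "$#" -gt 0 ] || return 1          # no statuses supplied: treat as failure
  for status in "$@"; do
    [ "$status" = "healthy" ] || return 1
  done
  return 0
}

# Example wiring (assumes the container names from this guide):
#   statuses=$(docker inspect --format='{{.State.Health.Status}}' headscale headscale-db nginx)
#   all_healthy $statuses && echo "all containers healthy"
```

Keeping the comparison separate from the `docker inspect` call makes the logic easy to test without a running stack.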
Health Check Configuration
From docker-compose.yml:
headscale:
  healthcheck:
    test: ["CMD", "headscale", "health"]
    interval: 30s
    timeout: 10s
    retries: 3
    start_period: 10s
postgres:
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U headscale"]
    interval: 10s
    timeout: 5s
    retries: 5
nginx:
  healthcheck:
    test: ["CMD", "wget", "--quiet", "--tries=1", "--spider", "http://localhost:8080/health"]
    interval: 30s
    timeout: 5s
    retries: 3
    start_period: 10s
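The interval, timeout, and retries values together determine how long a failed service can go unnoticed. As a rough upper bound, assuming every failing probe runs to its full timeout and Docker waits one interval after each probe completes, a container is marked unhealthy after about retries × (interval + timeout) seconds. The helper name `detection_window` is hypothetical.

```shell
# detection_window INTERVAL TIMEOUT RETRIES (whole seconds)
# Prints a rough worst-case time, in seconds, before Docker marks a
# container unhealthy: RETRIES consecutive failures, each costing up to
# INTERVAL + TIMEOUT. This is an estimate, not a guarantee.
detection_window() {
  echo $(( $3 * ($1 + $2) ))
}

detection_window 30 10 3   # headscale: prints 120
detection_window 10 5 5    # postgres:  prints 75
detection_window 30 5 3    # nginx:     prints 105
```

With the values above, a hung Headscale process could take up to about two minutes to show as unhealthy, which is worth remembering when choosing the `for:` duration of an alert rule.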
Prometheus Metrics
Headscale exposes Prometheus-compatible metrics for detailed monitoring.
Metrics Endpoint
Access metrics on port 9090 (localhost only for security):
# View all metrics
curl http://localhost:9090/metrics
# Filter specific metrics
curl http://localhost:9090/metrics | grep headscale_
Key Metrics
# Total registered nodes
headscale_nodes_total
# Nodes by state
headscale_nodes_registered
headscale_nodes_online
headscale_nodes_offline
# Node registration rate
rate(headscale_node_registrations_total[5m])
# Active connections
headscale_derp_connections_active
# Data transfer
headscale_network_bytes_sent_total
headscale_network_bytes_received_total
# Connection quality
headscale_connection_latency_seconds
# Request rate
rate(headscale_http_requests_total[1m])
# Request duration
headscale_http_request_duration_seconds
# Error rate
rate(headscale_http_requests_total{code=~"5.."}[5m])
# Database connections
headscale_db_connections_open
headscale_db_connections_idle
# Query duration
headscale_db_query_duration_seconds
# Connection pool
headscale_db_max_open_connections
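For quick checks without a Prometheus server, a single metric can be read straight from the text exposition output. The sketch below sums every sample of one metric name. It assumes samples carry no trailing timestamp, and the metric names are the ones listed above, which may differ between Headscale versions. The function name `metric_sum` is hypothetical.

```shell
# metric_sum NAME: read Prometheus text-format metrics on stdin and print
# the sum of all samples whose metric name is exactly NAME.
# Exits non-zero if the metric is not present at all.
metric_sum() {
  awk -v name="$1" '
    /^#/ { next }                      # skip HELP and TYPE comment lines
    {
      split($1, parts, "{")            # drop any {label="..."} suffix
      if (parts[1] == name) { sum += $NF; found = 1 }
    }
    END { if (found) print sum; else exit 1 }
  '
}

# Example wiring (assumes the metrics port from this guide):
#   curl -s http://localhost:9090/metrics | metric_sum headscale_nodes_total
```

The non-zero exit on a missing metric lets a script distinguish "value is 0" from "metric is not exported".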
Metrics Configuration
From config/config.yaml:
listen_addr: 0.0.0.0:8080
metrics_listen_addr: 0.0.0.0:9090
Inside the container, metrics are bound to 0.0.0.0:9090, but the port mapping publishes them only on 127.0.0.1:9090 on the host. Never expose metrics publicly without authentication.
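One way to confirm that the metrics port really is published on loopback only is to inspect the host bindings that `docker port` reports. This is a sketch under the assumption that the container is named `headscale`; the function name `loopback_only` is hypothetical.

```shell
# loopback_only: read host bindings on stdin, one per line (for example
# "127.0.0.1:9090"), and succeed only if there is at least one binding
# and every binding is a loopback address.
loopback_only() {
  seen=0
  while IFS= read -r binding; do
    [ -n "$binding" ] || continue
    seen=1
    case "$binding" in
      127.*:*|'[::1]':*) ;;            # IPv4 or IPv6 loopback
      *) return 1 ;;                   # anything else is reachable externally
    esac
  done
  [ "$seen" -eq 1 ]
}

# Example wiring:
#   docker port headscale 9090 | loopback_only || echo "WARNING: metrics port is exposed"
```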
Log Management
Viewing Logs
# All service logs
docker compose logs -f
# Specific service
docker compose logs -f headscale
docker compose logs -f postgres
docker compose logs -f nginx
# Last N lines
docker compose logs --tail 100 headscale
# With timestamps
docker compose logs -f --timestamps headscale
# Since specific time
docker compose logs --since 30m headscale
Log Levels
Configure logging in config/config.yaml:
log:
  format: text  # or: json
  level: info   # debug, info, warn, error
log:
  format: json
  level: info
Use JSON format for easier parsing by log aggregators.
log:
  format: text
  level: debug
Use text format and debug level for troubleshooting.
Log Analysis
# Search for errors
docker compose logs headscale | grep -i error
# Count error occurrences
docker compose logs --since 24h headscale | grep -i error | wc -l
# Monitor for failed authentication
docker compose logs -f headscale | grep "authentication failed"
# Track node registrations
docker compose logs headscale | grep "node registered"
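The grep pipelines above can be wrapped in small helpers so that a scheduled job can compare the error count against a limit. This is a sketch: the function names and any limit you pass are assumptions to tune for your deployment.

```shell
# count_matches PATTERN: print the number of stdin lines matching PATTERN,
# case-insensitively. grep -c exits 1 when nothing matches, so the status
# is masked to keep callers running under `set -e`; "0" is still printed.
count_matches() {
  grep -ci -- "$1" || true
}

# error_budget_exceeded COUNT LIMIT: succeed when COUNT is above LIMIT.
error_budget_exceeded() {
  [ "$1" -gt "$2" ]
}

# Example wiring (the limit of 50 is an arbitrary placeholder):
#   errors=$(docker compose logs --since 24h headscale | count_matches error)
#   error_budget_exceeded "$errors" 50 && echo "ALERT: $errors errors in 24h"
```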
Log Rotation
Configure Docker log rotation in /etc/docker/daemon.json:
{
"log-driver" : "json-file" ,
"log-opts" : {
"max-size" : "10m" ,
"max-file" : "3"
}
}
# Apply configuration
sudo systemctl restart docker
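With the json-file driver, each container keeps at most max-file files of at most max-size each, so the settings above cap log usage at roughly 30 MB per container. The short calculation below makes that explicit; the helper name `log_cap_mb` is hypothetical.

```shell
# log_cap_mb MAX_SIZE_MB MAX_FILE CONTAINERS
# Prints the approximate upper bound, in MB, on json-file log usage:
# max-size x max-file per container, times the number of containers.
log_cap_mb() {
  echo $(( $1 * $2 * $3 ))
}

log_cap_mb 10 3 1   # one container:   prints 30
log_cap_mb 10 3 4   # four containers: prints 120
```

Note that daemon-level logging options apply only to containers created after the change, so existing containers need to be recreated (for example with `docker compose up -d --force-recreate`) to pick up rotation.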
Resource Monitoring
Container Resource Usage
# Real-time resource stats
docker stats
# Specific containers
docker stats headscale headscale-db nginx
# Single snapshot
docker stats --no-stream
NAME           CPU %   MEM USAGE / LIMIT   MEM %   NET I/O           BLOCK I/O
headscale      0.05%   41MB / 970MB        4%      1.6MB / 1.7MB     512KB / 0B
headscale-db   0.01%   25MB / 970MB        2%      800KB / 850KB     1MB / 2MB
nginx          0.00%   30MB / 970MB        3%      140KB / 148KB     0B / 0B
headplane      0.00%   180MB / 970MB       18%     7.6MB / 3.9MB     0B / 0B
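Output like this can also be filtered in a script to flag containers above a memory threshold. The sketch below expects two columns per line, name and memory percentage, which `docker stats` can produce with its `--format` option. The function name `over_mem_threshold` and the thresholds in the examples are assumptions.

```shell
# over_mem_threshold LIMIT: read "<name> <mem%>" lines on stdin and print
# the names of containers whose memory percentage is above LIMIT.
over_mem_threshold() {
  awk -v limit="$1" '
    {
      pct = $2
      sub(/%$/, "", pct)               # "18.00%" -> "18.00"
      if (pct + 0 > limit + 0) print $1
    }
  '
}

# Example wiring:
#   docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' | over_mem_threshold 80
```

Applied to the sample above with a limit of 10, only headplane (18%) would be reported.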
System Resources
# Disk usage
df -h
du -sh data/ config/ backups/
# Docker disk usage
docker system df
# Detailed breakdown
docker system df -v
# Memory usage
free -h
# CPU load
uptime
Database Monitoring
# PostgreSQL: connection count
docker exec headscale-db psql -U headscale -c "SELECT count(*) FROM pg_stat_activity;"
# Database size
docker exec headscale-db psql -U headscale -c "SELECT pg_size_pretty(pg_database_size('headscale'));"
# Active queries
docker exec headscale-db psql -U headscale -c "SELECT pid, age(clock_timestamp(), query_start), query FROM pg_stat_activity WHERE state != 'idle';"
# Table sizes
docker exec headscale-db psql -U headscale -c "SELECT relname, pg_size_pretty(pg_total_relation_size(relid)) FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC;"
# SQLite: database file size
ls -lh data/db.sqlite
du -h data/db.sqlite
# Check integrity
docker exec headscale sqlite3 /var/lib/headscale/db.sqlite "PRAGMA integrity_check;"
# View tables
docker exec headscale sqlite3 /var/lib/headscale/db.sqlite ".tables"
# Row counts
docker exec headscale sqlite3 /var/lib/headscale/db.sqlite "SELECT 'nodes', COUNT(*) FROM nodes UNION SELECT 'users', COUNT(*) FROM users;"
Monitoring Stack Setup
Prometheus + Grafana
Add monitoring services to your stack:
docker-compose.monitoring.yml
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
    ports:
      - "127.0.0.1:9091:9090"
    networks:
      - headscale-network
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=30d'
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3002:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    networks:
      - headscale-network
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=changeme
      - GF_USERS_ALLOW_SIGN_UP=false
volumes:
  prometheus-data:
  grafana-data:
Prometheus Configuration
Create prometheus.yml:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'headscale'
    static_configs:
      - targets: ['headscale:9090']
        labels:
          service: 'headscale'
  - job_name: 'postgres'
    static_configs:
      - targets: ['postgres-exporter:9187']
        labels:
          service: 'postgres'
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
        labels:
          service: 'docker'
Alerting
Basic Alert Script
#!/bin/bash
# Health check: -s silences progress output, -f makes curl fail on HTTP errors
if ! curl -sf http://localhost:8000/health > /dev/null; then
  echo "ALERT: Headscale health check failed" | mail -s "Headscale Down" [email protected]
fi
# Disk space check: take the Use% column for / and strip the percent sign
DISK_USAGE=$(df -h / | awk 'NR==2 {print $5}' | sed 's/%//')
if [ "$DISK_USAGE" -gt 90 ]; then
  echo "ALERT: Disk usage at ${DISK_USAGE}%" | mail -s "Disk Space Critical" [email protected]
fi
# Database connection check
if ! docker exec headscale-db pg_isready -U headscale > /dev/null; then
  echo "ALERT: Database connection failed" | mail -s "Database Down" [email protected]
fi
Schedule with cron:
# Every 5 minutes
*/5 * * * * /path/to/monitor-headscale.sh
Prometheus Alertmanager
Create alertmanager.yml:
route:
  receiver: 'email'
  group_by: ['alertname', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
receivers:
  - name: 'email'
    email_configs:
      - to: '[email protected]'
        from: '[email protected]'
        smarthost: 'smtp.example.com:587'
        auth_username: '[email protected]'
        auth_password: 'password'
Define alert rules in alerts.yml:
groups:
  - name: headscale
    rules:
      - alert: HeadscaleDown
        expr: up{job="headscale"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Headscale is down"
      - alert: HighMemoryUsage
        expr: container_memory_usage_bytes{name="headscale"} / container_spec_memory_limit_bytes{name="headscale"} > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Headscale memory usage above 90%"
      - alert: DatabaseConnectionsFull
        expr: headscale_db_connections_open >= headscale_db_max_open_connections
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Database connection pool exhausted"
Response Time
/health endpoint: < 10ms
API endpoints: < 50ms
Node registration: < 500ms
Throughput
API requests: 100+ req/s
WebSocket connections: 1000+ concurrent
DERP relay: 100+ Mbps
Resource Usage
CPU: < 10% average
Memory: < 512MB typical
Disk I/O: < 10 MB/s
Availability
Uptime: 99.9%+
Health checks: 100% pass
Database: < 1s query time
Benchmarking
# API response time
time curl http://localhost:8000/health
# Load testing
ab -n 1000 -c 10 http://localhost:8000/health
# Database query performance
docker exec headscale-db psql -U headscale -c "EXPLAIN ANALYZE SELECT * FROM nodes;"
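To compare a measured response time against the baselines above (for example, under 10ms for /health), curl can print the total request time in seconds via its `-w '%{time_total}'` option. The helper below does the floating-point comparison in awk, since shell arithmetic is integer-only. The function name `slower_than_ms` is hypothetical.

```shell
# slower_than_ms SECONDS LIMIT_MS: succeed when a duration given in seconds
# (the unit curl's %{time_total} uses) exceeds LIMIT_MS milliseconds.
slower_than_ms() {
  awk -v s="$1" -v limit="$2" 'BEGIN { exit !(s * 1000 > limit) }'
}

# Example wiring (assumes the health endpoint from this guide):
#   t=$(curl -o /dev/null -s -w '%{time_total}' http://localhost:8000/health)
#   slower_than_ms "$t" 10 && echo "health endpoint took ${t}s, above the 10ms baseline"
```

A single request is a noisy measurement, so treat one slow result as a prompt to rerun the check or use the `ab` load test, not as proof of a regression.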
Status Page
Create a simple status page:
<!DOCTYPE html>
<html>
<head>
  <title>Headscale Status</title>
  <meta http-equiv="refresh" content="30">
</head>
<body>
  <h1>Headscale Status</h1>
  <div id="status"></div>
  <script>
    fetch('http://localhost:8000/health')
      .then(r => r.json())
      .then(d => {
        document.getElementById('status').innerHTML =
          `Status: ${d.status}<br>Last checked: ${new Date()}`;
      })
      .catch(e => {
        document.getElementById('status').innerHTML =
          `Status: Error - ${e.message}`;
      });
  </script>
</body>
</html>
Troubleshooting
Metrics endpoint not accessible
# Check port binding
docker compose ps | grep headscale
# Verify metrics configuration
grep metrics_listen_addr config/config.yaml
# Test from inside container
docker exec headscale curl http://localhost:9090/metrics
High memory usage
# Check for memory leaks
docker stats --no-stream headscale
# Review database connection pool
grep max_open_conns config/config.yaml
# Restart service
docker compose restart headscale
Logs consuming disk space
# Check current log size (the log file is usually readable only by root)
docker inspect --format='{{.LogPath}}' headscale
sudo du -h "$(docker inspect --format='{{.LogPath}}' headscale)"
# Configure log rotation
sudo nano /etc/docker/daemon.json
# Add log rotation settings
# Restart Docker
sudo systemctl restart docker