Monitoring Stack Overview

Homelab v3 uses a lightweight, distributed monitoring approach:
  • Beszel: Host and container metrics (CPU, RAM, disk, network)
  • Uptime Kuma: Service availability and uptime tracking
  • Healthchecks.io: Backup job heartbeat monitoring
  • Proxmox Built-in: Hypervisor and VM resource metrics
  • Unraid Built-in: Array health, disk temperatures, SMART data

Beszel - Host & Container Metrics

Architecture: The hub (server) runs on pi-prod-01; agents run on all hosts and VMs.
Dashboard: https://beszel.giohosted.com (internal)

Installing Beszel Server (Hub)

The Beszel hub runs on the Raspberry Pi (pi-prod-01) so that monitoring survives Proxmox node failures.

1. Deploy Beszel Hub

SSH to pi-prod-01:
mkdir -p /opt/beszel
cd /opt/beszel

cat > docker-compose.yaml <<EOF
services:
  beszel:
    image: henrygd/beszel:latest
    container_name: beszel-hub
    restart: unless-stopped
    ports:
      - '8090:8090'
    volumes:
      - ./data:/beszel_data
    environment:
      - TZ=America/Chicago
EOF

docker compose up -d

2. Initial Setup

Navigate to http://192.168.10.20:8090
  • Create admin account
  • Set organization name
  • Configure alert channels (optional)

3. Configure Traefik (Optional)

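Add a DNS rewrite in AdGuard so beszel.giohosted.com resolves to Traefik, then add a Traefik route that forwards beszel.giohosted.com to the hub at 192.168.10.20:8090. AdGuard rewrites return an IP address only, so the port belongs in the Traefik route, not in the rewrite.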

Installing Beszel Agent

Agents must be installed on:
  • nas-prod-01 (Unraid)
  • pve-prod-01 (Proxmox host)
  • pve-prod-02 (Proxmox host)
  • docker-prod-01 (VM)
  • auth-prod-01 (VM)
  • immich-prod-01 (VM)
  • pbs-prod-01 (VM)

1. Generate Agent Key

From the Beszel hub UI: Systems → Add System
  • Name: hostname (e.g., docker-prod-01)
  • Copy the generated agent key

2. Install Agent (Debian/Ubuntu)

SSH to target host:
curl -sL https://raw.githubusercontent.com/henrygd/beszel/main/supplemental/scripts/install-agent.sh -o install-agent.sh
chmod +x install-agent.sh
sudo ./install-agent.sh
When prompted:
  • Port: 45876 (default)
  • Hub URL: http://192.168.10.20:8090
  • Agent Key: (paste from hub)
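Before checking the hub, it can help to confirm on the target host that something is actually listening on the agent port, since the hub connects to the agent on that port. A minimal sketch, assuming the `ss` utility from iproute2 is installed (it is on stock Debian/Ubuntu); the `is_listening` helper is hypothetical:

```bash
# Hypothetical helper: reads `ss -tln` output on stdin and succeeds (exit 0)
# if any listening socket's local address ends in ":<port>".
is_listening() {
  local port="$1"
  awk -v p=":${port}" '
    NR > 1 {
      addr = $4
      # Compare only the tail of the local address, e.g. "*:45876" or "[::]:45876".
      # The leading colon in p prevents ":90" from matching "0.0.0.0:8090".
      if (substr(addr, length(addr) - length(p) + 1) == p) found = 1
    }
    END { exit !found }
  '
}
```

Usage: `ss -tln | is_listening 45876 && echo "agent listening" || echo "nothing on 45876"`. If the port is open locally but the hub still shows the host as down, the problem is more likely a firewall or a wrong key than the agent itself.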

3. Install Agent (Unraid)

From Unraid Community Apps:
  • Search “Beszel Agent”
  • Install and configure:
    • Port: 45876
    • Agent Key: (from hub)

4. Verify Connection

Return to the Beszel hub UI → Systems. The host should appear with a green status and live metrics.

Configuring Beszel Alerts

1. Add Alert Channel

Beszel Hub → Settings → Alerts → Add Channel
Options:
  • Discord webhook (recommended)
  • Email
  • Custom webhook

2. Create Alert Rules

Systems → [Select Host] → Alerts → Add Alert
Example rules:
High CPU Alert:
  • Metric: CPU Usage
  • Condition: > 80%
  • Duration: 5 minutes
  • Channel: Discord
Disk Space Alert:
  • Metric: Disk Usage
  • Condition: > 85%
  • Duration: 1 minute
  • Channel: Discord
Memory Pressure:
  • Metric: Memory Usage
  • Condition: > 90%
  • Duration: 10 minutes
  • Channel: Discord
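Each rule combines a threshold with a duration. One way to read that combination, sketched here under the assumption that the samples in the window are averaged (Beszel's own evaluation may differ in detail), is: fire when the mean of the readings collected over the duration exceeds the threshold. The `exceeds_average` helper and the one-sample-per-minute input are hypothetical:

```bash
# Hypothetical helper: succeeds if the mean of the samples (e.g. one CPU %
# reading per minute across the alert duration) is above the threshold.
# Compares sum > threshold * count, which avoids integer-division rounding.
exceeds_average() {
  local threshold="$1" sum=0 count=0 sample
  shift
  for sample in "$@"; do
    sum=$(( sum + sample ))
    count=$(( count + 1 ))
  done
  (( count > 0 && sum > threshold * count ))
}
```

For the High CPU Alert (80% over 5 minutes), five readings of 85, 90, 82, 88 and 79 average 84.8%, so `exceeds_average 80 85 90 82 88 79` succeeds even though one reading is below 80. A duration-based rule therefore ignores short spikes but still catches sustained load.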

3. Test Alerts

Trigger a test alert to verify the channel configuration: Alerts → [Select Alert] → Test
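If a test alert never arrives, testing the Discord webhook on its own separates a Beszel problem from a Discord one. Discord webhooks accept an HTTP POST with a JSON body containing a `content` field. A minimal sketch; `DISCORD_WEBHOOK_URL` is a placeholder for the URL copied from Discord, and `json_escape` and `discord_test` are hypothetical helpers:

```bash
# Escape backslashes, double quotes and newlines so a message can sit inside a
# JSON string. Enough for plain-text test messages; not a full JSON encoder.
json_escape() {
  local s="$1" bs='\' dq='"' nl=$'\n'
  s=${s//"$bs"/"$bs$bs"}   # backslashes first, so later escapes are not doubled
  s=${s//"$dq"/"$bs$dq"}
  s=${s//"$nl"/"${bs}n"}
  printf '%s' "$s"
}

# POST a plain-text message to a Discord webhook; -f makes HTTP errors fail.
discord_test() {
  local url="$1" msg="$2"
  curl -fsS -m 10 -H 'Content-Type: application/json' \
    -d "{\"content\":\"$(json_escape "$msg")\"}" "$url"
}
```

Usage: `discord_test "$DISCORD_WEBHOOK_URL" "Beszel channel test from $(hostname)"`. If this message arrives but Beszel's test does not, the webhook is fine and the issue is in the Beszel channel settings.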
OIDC Requirement: Beszel requires a custom Authentik scope with email_verified: true to function correctly. See Identity & Access documentation for configuration details.

Uptime Kuma - Service Availability

Location: pi-prod-01 (survives Proxmox failures)
Dashboard: https://uptime.giohosted.com (internal only)

Deploying Uptime Kuma

1. Deploy Container

SSH to pi-prod-01:
mkdir -p /opt/uptime-kuma
cd /opt/uptime-kuma

cat > docker-compose.yaml <<EOF
services:
  uptime-kuma:
    image: louislam/uptime-kuma:latest
    container_name: uptime-kuma
    restart: unless-stopped
    ports:
      - '3001:3001'
    volumes:
      - ./data:/app/data
EOF

docker compose up -d

2. Initial Setup

Navigate to http://192.168.10.20:3001
  • Create admin account
  • Configure notification methods (Discord, email, etc.)

Adding Service Monitors

1. Add HTTP Monitor

Add New Monitor
Example - Traefik:
  • Monitor Type: HTTP(s)
  • Friendly Name: Traefik Dashboard
  • URL: https://traefik.giohosted.com
  • Heartbeat Interval: 60 seconds
  • Retries: 3
  • Accepted Status Codes: 200
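When a monitor flaps, reproducing the HTTP check from pi-prod-01 shows what Uptime Kuma is seeing: fetch the URL, read the status code, and compare it with the accepted codes or ranges. A sketch; `status_accepted` and `http_probe` are hypothetical helpers, and Uptime Kuma's own check additionally applies the retries and timeout configured in the UI:

```bash
# Succeeds if HTTP status $1 matches any accepted spec: a single code ("200")
# or an inclusive range ("200-299").
status_accepted() {
  local code="$1" spec
  shift
  for spec in "$@"; do
    if [[ $spec == *-* ]]; then
      (( code >= ${spec%-*} && code <= ${spec#*-} )) && return 0
    elif (( code == spec )); then
      return 0
    fi
  done
  return 1
}

# Fetch a URL once and check its status. curl reports 000 when no response
# arrives (DNS failure, refused connection, timeout).
http_probe() {
  local url="$1" code
  shift
  code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "$url" || true)
  echo "$url -> $code"
  status_accepted "$code" "$@"
}
```

Usage: `http_probe https://traefik.giohosted.com 200-299`. A `000` result points at DNS, Traefik or the network; a 5xx code means the service answered but is unhealthy.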

2. Add Ping Monitor

Example - Proxmox Node:
  • Monitor Type: Ping
  • Friendly Name: pve-prod-01
  • Hostname: 192.168.10.11
  • Heartbeat Interval: 60 seconds
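A ping monitor only tests ICMP reachability. When one flaps, running ping from pi-prod-01 and reading the packet-loss figure helps tell a lossy link from a host that is actually down. A sketch; `loss_pct` is a hypothetical helper that parses the summary line printed by Linux iputils ping:

```bash
# Print the packet-loss percentage from ping's summary line, e.g.
# "3 packets transmitted, 3 received, 0% packet loss, time 2003ms"
loss_pct() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}
```

Usage: `ping -c 5 -W 2 192.168.10.11 | loss_pct`. Partial loss (for example 20% or 40%) suggests a network problem; 100% means the host, or ICMP to it, is down.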

3. Add DNS Monitor

Example - AdGuard:
  • Monitor Type: DNS
  • Friendly Name: AdGuard DNS (Primary)
  • Hostname: giohosted.com
  • Resolver Server: 192.168.30.10
  • Expected Result: Matches DNS rewrite IP
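The expected result can be checked by hand by querying the AdGuard resolver directly instead of the system resolver. A sketch, assuming `dig` (from dnsutils/bind-utils) is installed; `REWRITE_IP` is a placeholder for the IP the AdGuard rewrite should return, and `answer_contains` and `check_rewrite` are hypothetical helpers:

```bash
# Succeeds if any line of `dig +short` output on stdin exactly equals the
# expected answer. +short can list CNAME targets before the final A record,
# so every line is checked rather than only the first.
answer_contains() {
  grep -qxF -- "$1"
}

# Query one resolver directly (bypassing /etc/resolv.conf) and compare.
check_rewrite() {
  local server="$1" name="$2" expected="$3"
  dig +short "@$server" "$name" A | answer_contains "$expected"
}
```

Usage: `check_rewrite 192.168.30.10 giohosted.com "$REWRITE_IP" && echo "rewrite OK"`. Running the same check against 192.168.30.15 confirms that both AdGuard instances serve the same rewrites.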

Monitored Services

Service                  Type   URL/Host                           Interval
Infrastructure
  Traefik                HTTP   https://traefik.giohosted.com      60s
  Authentik              HTTP   https://auth.giohosted.com         60s
  AdGuard (dns-prod-01)  Ping   192.168.30.10                      60s
  AdGuard (dns-prod-02)  Ping   192.168.30.15                      60s
  PBS                    HTTP   https://192.168.30.12:8007         300s
Media Services
  Plex                   HTTP   https://plex.giohosted.com         120s
  Sonarr                 HTTP   https://sonarr.giohosted.com       120s
  Radarr                 HTTP   https://radarr.giohosted.com       120s
  Prowlarr               HTTP   https://prowlarr.giohosted.com     120s
  qBittorrent            HTTP   https://qbit.giohosted.com         120s
Books & Photos
  Audiobookshelf         HTTP   https://audiobooks.giohosted.com   120s
  Immich                 HTTP   https://photos.giohosted.com       120s
Proxmox
  pve-prod-01            Ping   192.168.10.11                      60s
  pve-prod-02            Ping   192.168.10.12                      60s
NAS
  nas-prod-01            Ping   192.168.10.10                      60s

Healthchecks.io - Backup Job Monitoring

Platform: SaaS (hosted)
Dashboard: https://healthchecks.io

Creating New Healthcheck

1. Add Check

Healthchecks.io → Add Check
  • Name: backup-job-name
  • Tags: backups, critical
  • Period: 1 day (for daily jobs)
  • Grace Time: 1 hour

2. Configure Integrations

Integrations → Add Integration
  • Type: Discord, Email, or Slack
  • Configure webhook/email

3. Add Ping to Script

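Copy the check's unique ping URL and add it to the end of the backup script:
#!/bin/bash
# Backup script
# Stop at the first failing command, so the ping below runs only if every step succeeded.
# (Testing $? afterwards only reflects the last command, which can mask earlier failures.)
set -euo pipefail

# ... backup commands ...

# All steps succeeded: ping Healthchecks (-m 10 limits each attempt to 10 seconds)
curl -fsS -m 10 --retry 3 https://hc-ping.com/YOUR-UNIQUE-UUID > /dev/null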
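Healthchecks.io can also receive a start signal and the job's exit status, which turns a failed run into an immediate alert instead of waiting for the grace period to expire. Appending `/start` to the ping URL marks the job as started, and appending a numeric exit status reports success (0) or failure (non-zero). A sketch; `hc_url` and `run_with_healthcheck` are hypothetical helpers, and the UUID and script path are placeholders:

```bash
# Base URL for pings; the UUID comes from the check's ping URL.
HC_BASE="${HC_BASE:-https://hc-ping.com}"

# Build a ping URL: no suffix = success, "start" = job started,
# a number = exit status (0 success, non-zero failure).
hc_url() {
  local uuid="$1" suffix="${2:-}"
  if [[ -n $suffix ]]; then
    printf '%s/%s/%s\n' "$HC_BASE" "$uuid" "$suffix"
  else
    printf '%s/%s\n' "$HC_BASE" "$uuid"
  fi
}

# Run a command between a start ping and an exit-status ping.
# Pings are best effort; the function returns the command's own exit status.
run_with_healthcheck() {
  local uuid="$1" rc
  shift
  curl -fsS -m 10 --retry 3 -o /dev/null "$(hc_url "$uuid" start)" || true
  # Running the command as an if-condition keeps `set -e` from aborting
  # before the failure is reported.
  if "$@"; then rc=0; else rc=$?; fi
  curl -fsS -m 10 --retry 3 -o /dev/null "$(hc_url "$uuid" "$rc")" || true
  return "$rc"
}

# Example (hypothetical script path):
# run_with_healthcheck YOUR-UNIQUE-UUID /usr/local/bin/docker-appdata-backup.sh
```

The start ping also lets Healthchecks.io display how long each run took, which makes a slowly growing backup window visible before it collides with the next job.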

Monitored Backup Jobs

Job                      Schedule       Grace Period
docker-appdata-backup    Daily 04:00    1 hour
plex-db-backup           Daily 03:00    1 hour
pbs-backup-pve-prod-01   Daily 01:00    2 hours
pbs-backup-pve-prod-02   Daily 01:30    2 hours
synology-abb-pull        Nightly 02:00  3 hours
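The checks in this table can also be created from the command line through the Healthchecks.io management API (`POST /api/v3/checks/` with an `X-Api-Key` header). Passing `"unique": ["name"]` asks the API to reuse an existing check with the same name instead of creating a duplicate, so the script can be re-run safely. A sketch, assuming a read-write project API key in `HC_API_KEY`, cron schedules matching the table, and the America/Chicago time zone used by the hub; `check_payload` and `create_check` are hypothetical helpers:

```bash
HC_API="https://healthchecks.io/api/v3/checks/"

# Build the JSON body for one check: name, cron schedule, grace in seconds.
# Names in the table contain only letters, digits and hyphens, so no escaping is needed.
check_payload() {
  local name="$1" schedule="$2" grace="$3"
  printf '{"name":"%s","tags":"backups critical","schedule":"%s","tz":"America/Chicago","grace":%d,"unique":["name"]}' \
    "$name" "$schedule" "$grace"
}

# Create (or reuse) a check and print the API's JSON response.
create_check() {
  curl -fsS -m 10 -X POST -H "X-Api-Key: ${HC_API_KEY:?set HC_API_KEY}" \
    -H 'Content-Type: application/json' --data "$(check_payload "$@")" "$HC_API"
}

# The table above, expressed as API calls:
# create_check docker-appdata-backup  '0 4 * * *'  3600
# create_check plex-db-backup         '0 3 * * *'  3600
# create_check pbs-backup-pve-prod-01 '0 1 * * *'  7200
# create_check pbs-backup-pve-prod-02 '30 1 * * *' 7200
# create_check synology-abb-pull      '0 2 * * *'  10800
```

Using cron schedules rather than a fixed 1-day period ties each check to the job's actual start time, so the grace period in the table is measured from when the job should have run.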

Proxmox Monitoring

Proxmox VE includes built-in monitoring for nodes, VMs, and LXCs.

Viewing Metrics

1. Node-Level Metrics

Proxmox UI → [Select Node] → Summary
View:
  • CPU usage (current + historical)
  • Memory usage
  • Disk I/O
  • Network traffic

2. VM/LXC Metrics

Proxmox UI → [Select VM/LXC] → Summary
View real-time resource consumption for individual guests.

3. Storage Usage

Proxmox UI → Datacenter → Storage
Monitor:
  • Local storage usage
  • NFS mount health
  • PBS datastore capacity
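The same storage figures are available from each node's shell through `pvesm status`, whose last column is the usage percentage. A sketch that flags storages above the 85% disk-space warning threshold from the escalation policy; `storage_over` is a hypothetical helper, and it assumes the default `pvesm status` layout (Name, Type, Status, Total, Used, Available, %):

```bash
# Reads `pvesm status` output on stdin and prints "name pct" for every storage
# whose usage percentage (last column) is above the threshold given as $1.
storage_over() {
  awk -v t="$1" 'NR > 1 {
    pct = $NF
    sub(/%$/, "", pct)          # "90.00%" -> "90.00"
    if (pct + 0 > t) print $1, $NF
  }'
}
```

Usage: `pvesm status | storage_over 85`. An empty result means every storage on that node is at or below the threshold.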

Configuring Email Alerts

1. Configure SMTP Relay

Proxmox UI → Datacenter → Notifications → Notification Targets → Add → SMTP
  • SMTP Server: (your mail server)
  • Port: 587 (STARTTLS)
  • Username/Password: (credentials)
  • From Address: (sender address)
  • Recipient: Your email address

2. Create Notification Matcher

Notifications → Notification Matchers → Add
  • Match Severity: warning, error
  • Targets to notify: the SMTP target from step 1

3. Test Notification

Notifications → [Select Target] → Test

Unraid Monitoring

Unraid includes comprehensive monitoring for array health and disk status.

Dashboard Metrics

Unraid Main Dashboard shows:
  • Array status (healthy, rebuilding, degraded)
  • Parity check status and history
  • Individual disk temperatures
  • Network throughput
  • Docker container status

Configuring Notifications

1. Enable Discord Notifications

Unraid UI → Settings → Notifications
  • Discord Webhook URL: (from Discord server)
  • Notification Types:
    • Array errors (critical)
    • Disk temperature warnings (> 50°C)
    • Parity check completion
    • Docker container crashes

2. Test Notification

Settings → Notifications → Test
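Unraid also ships a command-line notifier that sends through the same configured agents (including Discord), which is useful for custom scripts. A sketch, assuming the stock script at `/usr/local/emhttp/webGui/scripts/notify` with its `-e` (event), `-s` (subject), `-d` (description) and `-i` (importance: normal, warning or alert) options; verify these on your Unraid release. The helpers and the mapping from this page's severity levels to Unraid importance are hypothetical:

```bash
NOTIFY=/usr/local/emhttp/webGui/scripts/notify

# Map this page's severity levels to Unraid notification importance.
unraid_importance() {
  case "$1" in
    critical) echo alert ;;
    warning)  echo warning ;;
    *)        echo normal ;;
  esac
}

# Send a notification through Unraid's configured agents.
unraid_notify() {
  local severity="$1" subject="$2" description="$3"
  "$NOTIFY" -e "Homelab monitoring" -s "$subject" -d "$description" \
    -i "$(unraid_importance "$severity")"
}
```

Usage: `unraid_notify warning "CLI test" "Discord agent test from the Unraid terminal"`.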

SMART Monitoring

Unraid monitors SMART data automatically; self-tests can also be scheduled (see step 2).

1. View SMART Data

Unraid UI → Main → [Select Disk] → SMART Info
Key metrics:
  • Reallocated Sectors Count (should be 0)
  • Current Pending Sector Count (should be 0)
  • Power On Hours
  • Temperature
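The two counters that should stay at 0 can also be read from the Unraid terminal with smartctl, where the raw value is the tenth column of the `smartctl -A` attribute table (ID 5 is Reallocated_Sector_Ct, ID 197 is Current_Pending_Sector). A sketch; `smart_ok` is a hypothetical helper and `/dev/sdX` is a placeholder device:

```bash
# Reads `smartctl -A` output on stdin. Prints "name raw" for attributes 5 and
# 197 when their raw value is above 0, and exits non-zero if any was printed.
smart_ok() {
  awk '($1 == 5 || $1 == 197) && $10 + 0 > 0 { bad = 1; print $2, $10 }
       END { exit bad }'
}
```

Usage: `smartctl -A /dev/sdX | smart_ok || echo "/dev/sdX needs attention"`. A non-zero pending-sector count that later turns into reallocated sectors is a common early sign of a failing drive.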

2. Configure SMART Test Schedule

Settings → Disk Settings → SMART Testing
Recommended:
  • Short test: Weekly
  • Long test: Monthly

Alert Escalation Policy

Severity Levels

Critical (Immediate Action):
  • Proxmox node down
  • NAS array degraded
  • PBS backup failed 2+ consecutive days
  • Disk SMART failure
Warning (Review Within 24h):
  • High CPU/RAM usage (> 80% sustained)
  • Disk space > 85%
  • Service downtime < 5 minutes
  • Backup job late but not failed
Info (Review Weekly):
  • Routine maintenance notifications
  • Successful backup completions
  • Software updates available
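The backup rules in this policy can be encoded as a small classification function so that custom scripts report consistent severities. A sketch; the function name and its inputs (consecutive failed days, and whether the latest run is late) are hypothetical, and treating a single failed day as a warning is an assumption the policy does not state:

```bash
# Classify a backup job using the escalation policy above:
#   failed 2+ consecutive days        -> critical
#   failed once, or late (not failed) -> warning  (single failure: assumption)
#   otherwise (successful completion) -> info
backup_severity() {
  local failed_days="$1" late="$2"
  if (( failed_days >= 2 )); then
    echo critical
  elif (( failed_days == 1 )) || [[ $late == yes ]]; then
    echo warning
  else
    echo info
  fi
}
```

Usage: `backup_severity 2 no` prints `critical`, which could then select the channel used for the alert.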

Notification Channels by Severity

SeverityDiscordEmailSMS
Critical(optional)
Warning
Info

Monitoring Checklist

Daily:
  • Check Beszel dashboard for anomalies
  • Verify Uptime Kuma shows all services green
  • Confirm Healthchecks.io backup heartbeats
Weekly:
  • Review Unraid array health and temps
  • Check Proxmox storage usage
  • Review Beszel historical trends
Monthly:
  • Review PBS backup job logs
  • Check SMART data on all drives
  • Test restore from one backup tier
  • Update monitoring dashboards with new services
