Status: Planned
Category: Stabilization
Objective
Tighten operations, validate all backup tiers, implement monitoring and alerting, and document the final v3 state.
Entry Criteria
Phase 4 complete — all services running on v3 infrastructure
Tasks
Validate Backup Tier 0 (PBS)
- Validate all PBS backup jobs
- Test restore of docker-prod-01 VM to confirm backups are valid
- Verify backup retention policy configured correctly
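A restore test from the PVE shell might look like the commands below. The PBS storage ID, VMID, snapshot timestamp, and target storage are all assumptions; restoring to a spare VMID avoids clobbering the running VM.

```
# List PBS snapshots for the VM, then restore one to a spare VMID
pvesm list pbs-prod-01 --vmid 101
qmrestore pbs-prod-01:backup/vm/101/2025-01-01T02:00:00Z 9101 --storage local-zfs
```

Boot the restored VM on an isolated bridge, confirm the Docker stacks start, then destroy it.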
Validate Backup Tier 1 (Docker Appdata)
- Validate rsync backup script with mountpoint safety check and lockfile
- Validate Plex DB backup script with EXIT trap
- Test restore of appdata from NAS backup
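The hardened rsync script being validated could look roughly like the sketch below, showing only the lockfile and mountpoint-safety mechanics; all paths and the Healthchecks.io UUID are assumptions, not the real config.

```shell
#!/usr/bin/env bash
# Sketch of the hardened rsync appdata backup (backup-docker.sh).
# Paths and the Healthchecks UUID are assumptions, not the real config.
set -u

acquire_lock() {
  # mkdir is atomic, so two overlapping cron runs cannot both succeed here.
  mkdir "$1" 2>/dev/null
}

release_lock() { rmdir "$1"; }

backup_appdata() {
  local src="$1" dest="$2"
  # Mountpoint safety check: if the NAS share failed to mount, refuse to
  # run rsync --delete into the bare local directory underneath it.
  mountpoint -q "$dest" || { echo "ERROR: $dest is not mounted" >&2; return 1; }
  rsync -a --delete "$src/" "$dest/"
}

main() {
  local lock="/run/lock/backup-docker"
  acquire_lock "$lock" || { echo "another backup run holds the lock" >&2; exit 1; }
  trap "release_lock '$lock'" EXIT
  backup_appdata /opt/appdata /mnt/nas/backups/appdata
  # Report success to Healthchecks.io (UUID is a placeholder).
  curl -fsS -m 10 --retry 3 "https://hc-ping.com/<uuid>" >/dev/null
}

# main "$@"   # uncomment to run
```

The `mkdir` lock is used instead of a plain file because its create-and-check is a single atomic operation.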
Validate Backup Tier 2 (NAS Snapshots)
- Verify ZFS snapshots on backups + photos pool
- Test rollback to previous snapshot
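For the rollback test, a small helper that picks the newest snapshot of a dataset makes the step scriptable. The dataset names are assumptions, and `zfs rollback` discards everything written after the snapshot, so rehearse against a scratch dataset first.

```shell
# Pick the newest snapshot of a dataset from `zfs list` output.
# Assumes timestamped snapshot names, which sort oldest-to-newest.
latest_snapshot() {
  grep "^$1@" | tail -n 1
}

# Usage on the NAS (destructive -- rehearse on a scratch dataset first):
#   snap=$(zfs list -H -o name -t snapshot | latest_snapshot backups/photos)
#   zfs rollback "$snap"
```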
Validate Backup Tier 3 (Synology ABB)
- Verify Synology ABB is pulling from the NAS correctly using read-only credentials
- Verify file counts match
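The file-count comparison can be a one-liner run against both copies; the paths in the example are assumptions.

```shell
# Count regular files under a tree (run against the NAS share and the
# Synology ABB copy, then compare the two numbers).
count_files() { find "$1" -type f | wc -l; }

# Example (paths are assumptions):
#   [ "$(count_files /mnt/user/backups)" -eq "$(count_files /volume1/abb/backups)" ] \
#     && echo "file counts match"
```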
Verify Monitoring & Alerting
- Verify all Healthchecks.io heartbeats are firing
- Test alert: manually miss a heartbeat, confirm notification received
- Verify Beszel showing all hosts
- Verify Uptime Kuma monitoring all critical services
Audit Authentik
- Verify all OIDC integrations
- Verify session policies
- Verify MFA enforcement for admins
Document v3 Final State
- Write master runbook
- Document all service configs
- Document all firewall rules
- Document all DNS rewrites
Backup Tiers
| Tier | What | Tool | Destination |
|---|---|---|---|
| Tier 0 | VM/LXC snapshots | Proxmox Backup Server (PBS) | pbs-prod-01 VM → ZFS mirror share on NAS |
| Tier 1 | Docker appdata + stacks | Hardened rsync script + Healthchecks | NAS /backups share (ZFS mirror pool) |
| Tier 1 | Plex database | Dedicated backup script (stop/backup/start) | NAS /backups/plex/db |
| Tier 2 | NAS share snapshots | Unraid ZFS snapshot (backups + photos pool) | Local ZFS snapshots on NAS |
| Tier 3 | Off-box cold copy | Synology ABB (pull-based, read-only creds) | Synology NAS — nightly |
| Tier 4 | Cloud backup | Backblaze B2 (future) | Critical data offsite — Immich photos, backups |
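The Tier 1 Plex database job's stop/backup/start pattern could be sketched as below; the container name, database path, and tarball layout are assumptions, not the real config.

```shell
#!/usr/bin/env bash
# Sketch of backup-plex-db.sh (stop/backup/start). Container name and
# paths are assumptions.

stop_plex()  { docker stop plex; }
start_plex() { docker start plex; }

backup_plex_db() {
  local src="$1" dest="$2"
  stop_plex
  # EXIT trap: Plex comes back up even if the archive step fails.
  trap start_plex EXIT
  tar -czf "$dest/plex-db-$(date +%F).tar.gz" -C "$src" .
}

# backup_plex_db "/opt/appdata/plex/.../Databases" /mnt/nas/backups/plex/db
```

Stopping the container first matters because SQLite databases copied while Plex is writing can be inconsistent; the EXIT trap guarantees the restart is never skipped.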
Monitoring & Alerting
Beszel — Host/VM Metrics
- Beszel server runs on pi-prod-01
- Agents on all hosts: pve-prod-01, pve-prod-02, nas-prod-01, docker-prod-01
- Dashboards show CPU, RAM, disk, network for all hosts
Uptime Kuma — Service Uptime
- Uptime Kuma runs on pi-prod-01
- Monitors all critical services with HTTP checks
- Alerts via Discord on service down
Healthchecks.io — Backup Job Heartbeats
- Backup scripts ping Healthchecks.io on success
- Alerts via Discord if heartbeat missed
- Monitors: backup-docker.sh, backup-plex-db.sh, PBS backup jobs
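The heartbeat pattern the scripts use could be factored like this. The UUID is a placeholder, while `/start` and `/fail` are standard Healthchecks.io ping endpoints; pinging `/fail` by hand is also a quick way to run the manual alert test without waiting for a missed schedule.

```shell
# Build a Healthchecks.io ping URL (success by default, or /start, /fail).
hc_url() {
  local uuid="$1" signal="${2:-}"
  printf 'https://hc-ping.com/%s%s' "$uuid" "${signal:+/$signal}"
}

# Wrap a job so Healthchecks sees start, then success or failure.
run_with_heartbeat() {
  local uuid="$1"; shift
  curl -fsS -m 10 --retry 3 "$(hc_url "$uuid" start)" >/dev/null || true
  if "$@"; then
    curl -fsS -m 10 --retry 3 "$(hc_url "$uuid")" >/dev/null
  else
    curl -fsS -m 10 --retry 3 "$(hc_url "$uuid" fail)" >/dev/null
    return 1
  fi
}
```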
Security Hardening
Traefik Security Headers
- HSTS enabled
- X-Frame-Options: DENY
- X-Content-Type-Options: nosniff
- Referrer-Policy: strict-origin-when-cross-origin
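Expressed as a Traefik v2 dynamic-configuration middleware, the header set above might look like the fragment below; the middleware name is an assumption.

```yaml
http:
  middlewares:
    secure-headers:
      headers:
        stsSeconds: 31536000    # HSTS, one year (assumed duration)
        frameDeny: true         # X-Frame-Options: DENY
        contentTypeNosniff: true
        referrerPolicy: "strict-origin-when-cross-origin"
```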
Traefik Rate Limiting
- 100 requests per minute per IP for all services
- 10 requests per minute for auth endpoints
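The two limits map onto Traefik's `rateLimit` middleware, which keys on the client IP by default; the middleware names are assumptions, and the burst allowance would need tuning separately.

```yaml
http:
  middlewares:
    rate-limit:
      rateLimit:
        average: 100    # 100 requests per period, per source IP
        period: 1m
    rate-limit-auth:
      rateLimit:
        average: 10
        period: 1m
```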
Exit Criteria
All backup tiers validated with test restores
- PBS restore tested
- Docker appdata restore tested
- Plex DB restore tested
- ZFS snapshot rollback tested
- Synology ABB verified pulling correctly
All monitoring and alerting confirmed end-to-end
- Beszel showing all hosts
- Uptime Kuma monitoring all services
- Healthchecks.io heartbeats firing
- Test alert sent and received
v3 master runbook written and committed
- All service configs documented
- All firewall rules documented
- All DNS rewrites documented
- Restore procedures documented
Next Phase
Phase 6 — Kubernetes (k3s/Talos)
Future k3s deployment — sandbox first, production later