Status: Planned
Category: Stabilization

Objective

Tighten operations, validate all backup tiers, implement monitoring and alerting, and document the final v3 state.

Entry Criteria

Phase 4 complete — all services running on v3 infrastructure

Tasks

1

Validate Backup Tier 0 (PBS)

  • Validate all PBS backup jobs
  • Test restore of docker-prod-01 VM to confirm backups are valid
  • Verify backup retention policy configured correctly
2

Validate Backup Tier 1 (Docker Appdata)

  • Validate rsync backup script with mountpoint safety check and lockfile
  • Validate Plex DB backup script with EXIT trap
  • Test restore of appdata from NAS backup
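
The mountpoint safety check and lockfile named in the first task could look roughly like the sketch below. It is not the actual backup-docker.sh: the function names are illustrative, every path in the usage line is an assumption, and it relies on `mountpoint` and `flock` from util-linux being present on the Docker host.

```shell
#!/usr/bin/env bash
# Sketch only: function names and paths are assumptions, not the real script.

# Refuse to continue unless the destination is a real mount. Without this,
# rsync would quietly write to the local disk when the NAS share is absent.
require_mount() {
  local dest="$1"
  if ! mountpoint -q "$dest"; then
    echo "ERROR: $dest is not a mountpoint, aborting" >&2
    return 1
  fi
}

# Take a non-blocking exclusive lock on fd 9 so overlapping runs exit early.
acquire_lock() {
  local lockfile="$1"
  exec 9>"$lockfile" || return 1
  flock -n 9
}

run_backup() {
  local src="$1" dest="$2" lockfile="$3"
  acquire_lock "$lockfile" || { echo "ERROR: another run holds $lockfile" >&2; return 1; }
  require_mount "$dest" || return 1
  rsync -a --delete "$src"/ "$dest"/
}
```

A hypothetical invocation would be `run_backup /opt/appdata /mnt/nas/backups/appdata /var/lock/backup-docker.lock`. Validating the script then amounts to two negative tests: unmount the share and confirm the run aborts, and start a second run while the first is active and confirm it exits without touching the destination.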
3

Validate Backup Tier 2 (NAS Snapshots)

  • Verify ZFS snapshots on backups + photos pool
  • Test rollback to previous snapshot
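
One way to verify that snapshots are actually being taken is to check the age of the newest one. The sketch below is a hedged example: the helper name `snapshot_is_fresh` and the 24-hour threshold are assumptions, and `backups` is used as the dataset name only because the task mentions it. It parses the output of `zfs list -Hp`, which prints tab-separated fields with creation time as epoch seconds.

```shell
#!/usr/bin/env bash
# Reads "name<TAB>creation_epoch" lines on stdin, sorted oldest first, and
# succeeds only if the newest snapshot is at most max_age seconds old.
# $1 = max_age in seconds, $2 = optional "now" as epoch seconds (for testing).
snapshot_is_fresh() {
  local max_age="$1" now="${2:-$(date +%s)}" created
  created="$(tail -n 1 | awk -F'\t' '{print $2}')"
  [ -n "$created" ] || { echo "no snapshots found" >&2; return 1; }
  [ $(( now - created )) -le "$max_age" ]
}
```

Possible usage on the NAS: `zfs list -Hp -r -t snapshot -o name,creation -s creation backups | snapshot_is_fresh 86400`. For the rollback test itself, note that `zfs rollback` only targets the most recent snapshot unless `-r` is given, and `-r` destroys the snapshots taken after the target, so a scratch dataset is a safer place to rehearse it.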
4

Validate Backup Tier 3 (Synology ABB)

  • Verify Synology ABB pulling from NAS correctly with read-only credentials
  • Verify file counts match
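
The file count comparison can be scripted once both trees are reachable as ordinary directories. This is a minimal sketch under that assumption; how the Synology copy is exposed (restored folder, mounted share) is not specified here, and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Count regular files below a directory.
count_files() {
  find "$1" -type f | wc -l | tr -d ' '
}

# Print both counts and succeed only when they are equal.
counts_match() {
  local a b
  a="$(count_files "$1")"
  b="$(count_files "$2")"
  echo "source=$a copy=$b"
  [ "$a" -eq "$b" ]
}
```

Matching counts only show that nothing is obviously missing. They do not prove the contents are identical, so a spot check of a few restored files is still worthwhile.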
5

Verify Monitoring & Alerting

  • Verify all Healthchecks.io heartbeats are firing
  • Test alert: manually miss a heartbeat, confirm notification received
  • Verify Beszel showing all hosts
  • Verify Uptime Kuma monitoring all critical services
6

Evaluate Backblaze B2

Evaluate Backblaze B2 for Immich photos + backups (Tier 4 cloud backup)
7

Tighten Traefik Middleware

Review and tighten Traefik middleware: rate limiting, security headers
8

Audit Authentik

  • Verify all OIDC integrations
  • Verify session policies
  • Verify MFA enforcement for admins
9

Document v3 Final State

  • Write master runbook
  • Document all service configs
  • Document all firewall rules
  • Document all DNS rewrites
10

VLAN Penetration Test

Verify the VLAN firewall rules with a penetration test run from the IoT VLAN

Backup Tiers

Tier | What | Tool | Destination
--- | --- | --- | ---
Tier 0 | VM/LXC snapshots | Proxmox Backup Server (PBS) | pbs-prod-01 VM → ZFS mirror share on NAS
Tier 1 | Docker appdata + stacks | Hardened rsync script + Healthchecks | NAS /backups share (ZFS mirror pool)
Tier 1 | Plex database | Dedicated backup script (stop/backup/start) | NAS /backups/plex/db
Tier 2 | NAS share snapshots | Unraid ZFS snapshot (backups + photos pool) | Local ZFS snapshots on NAS
Tier 3 | Off-box cold copy | Synology ABB (pull-based, read-only creds) | Synology NAS, nightly
Tier 4 | Cloud backup | Backblaze B2 (future) | Critical data offsite: Immich photos, backups

PBS Does Not Back Up Application Data

PBS backs up VM disk images only, not application data inside VMs. Application-level backups (Docker appdata, Plex DB) remain essential and run independently of PBS.
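
The stop/backup/start pattern listed for the Plex database, combined with the EXIT trap mentioned in the Tier 1 task, might be structured as in this sketch. It is not the actual backup-plex-db.sh: the container name, database directory, and archive naming are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: container name, DB directory and destination are assumptions.

backup_plex_db() {
  local container="$1" db_dir="$2" dest="$3"
  (
    # The EXIT trap runs when this subshell exits, whether the steps
    # succeeded or failed, so the container is always started again.
    trap 'docker start "$container" >/dev/null' EXIT
    docker stop "$container" >/dev/null || exit 1
    tar -czf "$dest/plex-db-$(date +%F).tar.gz" -C "$db_dir" . || exit 1
  )
}
```

The point of the trap is that a failed archive step (full disk, missing destination) cannot leave Plex stopped. A useful validation is to point the destination at a path that does not exist and confirm the function returns non-zero while the container comes back up.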

Monitoring & Alerting

Beszel

  • Beszel server runs on pi-prod-01
  • Agents on all hosts: pve-prod-01, pve-prod-02, nas-prod-01, docker-prod-01
  • Dashboards show CPU, RAM, disk, network for all hosts

Uptime Kuma

  • Uptime Kuma runs on pi-prod-01
  • Monitors all critical services with HTTP checks
  • Alerts via Discord on service down

Healthchecks.io

  • Backup scripts ping Healthchecks.io on success
  • Alerts via Discord if heartbeat missed
  • Monitors: backup-docker.sh, backup-plex-db.sh, PBS backup jobs
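
The heartbeat behaviour described above can be wrapped around any backup command. The following is a hedged sketch: the wrapper name `run_with_heartbeat` and the placeholder UUID are assumptions. It pings the check's URL on success and, as an addition beyond what is listed here, appends `/fail` on error so Healthchecks.io alerts immediately instead of waiting for the heartbeat to be missed.

```shell
#!/usr/bin/env bash
# Base ping URL; override HC_BASE for a self-hosted Healthchecks instance.
HC_BASE="${HC_BASE:-https://hc-ping.com}"

# $1 = check UUID, $2 = optional suffix such as "/fail"
hc_ping() {
  curl -fsS -m 10 --retry 3 -o /dev/null "$HC_BASE/$1${2:-}"
}

# Run a command, then report success or failure for the given check.
run_with_heartbeat() {
  local uuid="$1"; shift
  if "$@"; then
    hc_ping "$uuid"
  else
    local rc=$?
    hc_ping "$uuid" /fail
    return "$rc"
  fi
}
```

Hypothetical usage from cron: `run_with_heartbeat <check-uuid> /usr/local/bin/backup-docker.sh`. To test the alert path, temporarily wrap `false` instead of the real script and confirm the Discord notification arrives.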

Security Hardening

1

Traefik Security Headers

  • HSTS enabled
  • X-Frame-Options: DENY
  • X-Content-Type-Options: nosniff
  • Referrer-Policy: strict-origin-when-cross-origin
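
The four headers above can be checked from the command line rather than by eye. This is a minimal sketch with a hypothetical helper name; it reads response headers on stdin, and it assumes each header is written with a single space after the colon, as Traefik normally emits it.

```shell
#!/usr/bin/env bash
# Reads HTTP response headers on stdin and reports any expected security
# header that is absent or has a different value. Matching is
# case-insensitive because the input is lowercased first.
check_security_headers() {
  local headers want missing=0
  headers="$(tr -d '\r' | tr '[:upper:]' '[:lower:]')"
  for want in \
    'strict-transport-security:' \
    'x-frame-options: deny' \
    'x-content-type-options: nosniff' \
    'referrer-policy: strict-origin-when-cross-origin'
  do
    if ! printf '%s\n' "$headers" | grep -qF -- "$want"; then
      echo "MISSING: $want" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Possible usage, with a placeholder hostname: `curl -sI https://service.example.com | check_security_headers`. Running it against each routed service catches routers that were never attached to the headers middleware.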
2

Traefik Rate Limiting

  • 100 requests per minute per IP for all services
  • 10 requests per minute for auth endpoints
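
A quick way to confirm the limits are enforced is to send a burst of requests and count the HTTP 429 responses, which is the status Traefik's rate limit middleware returns when a client exceeds its allowance. The helper below is a sketch with a hypothetical name, and the URL in the usage line is a placeholder.

```shell
#!/usr/bin/env bash
# Sends n sequential requests to url and prints how many returned HTTP 429.
count_rate_limited() {
  local url="$1" n="$2" i=0 limited=0 code
  while [ "$i" -lt "$n" ]; do
    code="$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$url")"
    if [ "$code" = "429" ]; then
      limited=$(( limited + 1 ))
    fi
    i=$(( i + 1 ))
  done
  echo "$limited"
}
```

For example, `count_rate_limited https://auth.example.com/ 30` should print a number greater than zero if the 10 requests per minute limit is active on that route. The exact count depends on the configured burst size, which is not specified here.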
3

Authentik MFA Enforcement

  • MFA required for all admin accounts
  • TOTP or WebAuthn supported
4

VLAN Firewall Audit

  • IoT VLAN cannot reach internal services
  • Services VLAN cannot reach Management VLAN
  • Services VLAN cannot initiate to Trusted VLAN
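
Each rule above can be spot-checked with a TCP connect probe run from a host inside the source VLAN. The sketch below assumes `nc` (netcat) is installed on that host; the helper name and any addresses passed to it are hypothetical. It is a reachability check only, not a substitute for the full penetration test.

```shell
#!/usr/bin/env bash
# Succeeds when a TCP connect to host:port fails within the timeout,
# meaning the path looks blocked from wherever this script runs.
expect_blocked() {
  local host="$1" port="$2"
  if nc -z -w 3 "$host" "$port" 2>/dev/null; then
    echo "OPEN (rule gap?): $host:$port" >&2
    return 1
  fi
  echo "blocked: $host:$port"
}
```

Run from an IoT VLAN device, a call such as `expect_blocked <internal-service-ip> 443` should print `blocked`. A closed port on a reachable host also fails to connect, so pick ports that are known to be listening on the target.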

Exit Criteria

  • PBS restore tested
  • Docker appdata restore tested
  • Plex DB restore tested
  • ZFS snapshot rollback tested
  • Synology ABB verified pulling correctly
  • Beszel showing all hosts
  • Uptime Kuma monitoring all services
  • Healthchecks.io heartbeats firing
  • Test alert sent and received
  • All service configs documented
  • All firewall rules documented
  • All DNS rewrites documented
  • Restore procedures documented

Next Phase

Phase 6 — Kubernetes (k3s/Talos)

Future k3s deployment — sandbox first, production later
