Design Philosophy
Homelab v3 is a clean-slate rebuild with institutional knowledge from v2. Every decision is deliberate, documented, and built toward future goals (Kubernetes, GitOps, HA). Where v2 evolved organically, v3 is designed intentionally.
Core Principles
| Principle | Applied In v2? | v3 Commitment |
|---|---|---|
| Bare-metal NAS separation | No (TrueNAS as VM) | Dedicated NAS host |
| Proxmox OS redundancy | No (single NVMe) | Mirrored NVMe boot |
| Proper VLAN segmentation with rules | Partial (no FW rules) | Full inter-VLAN firewall |
| UPS protection | No | Required before power-on |
| Kubernetes-ready architecture | No | Designed for future k3s |
Infrastructure Topology
Infrastructure Layers
Network Layer (VLAN 10 - Management)
The UniFi Dream Machine SE acts as the primary router, firewall, and VLAN gateway. The USW-Pro-Max-24 provides 2.5GbE connectivity to all hosts with full VLAN trunking. Four VLANs segregate traffic:
- VLAN 10 (Management): Infrastructure devices only
- VLAN 20 (Trusted): Personal devices
- VLAN 30 (Services): All VMs and LXCs
- VLAN 40 (IoT): Isolated smart devices, internet-only
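The segmentation above can be sketched as a small default-deny policy model. This is only an illustration, not the UDM-SE rule set: the `INTERNET` sentinel, the `ALLOWED_FLOWS` table, and the two Trusted rules are hypothetical. The only rule taken from the text is that IoT devices reach the internet and nothing else.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vlan:
    vlan_id: int
    name: str
    purpose: str


# The four VLANs as listed in the document.
VLANS = {
    10: Vlan(10, "Management", "Infrastructure devices only"),
    20: Vlan(20, "Trusted", "Personal devices"),
    30: Vlan(30, "Services", "All VMs and LXCs"),
    40: Vlan(40, "IoT", "Isolated smart devices, internet-only"),
}

INTERNET = 0  # hypothetical sentinel meaning "outside the LAN"

# Hypothetical allow-list of (source, destination) pairs.
# Only the IoT rule is stated in the text; the Trusted rules are placeholders.
ALLOWED_FLOWS = {
    (40, INTERNET),
    (20, INTERNET),
    (20, 30),
}


def is_allowed(src: int, dst: int) -> bool:
    """Default-deny check: same-VLAN traffic passes, anything else needs a rule."""
    if src == dst:
        return True
    return (src, dst) in ALLOWED_FLOWS
```

With this table, `is_allowed(40, INTERNET)` is true while `is_allowed(40, 30)` is false, which matches the "internet-only" description of the IoT VLAN.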
Compute Layer
Primary Node (pve-prod-01): Minisforum MS-A2 with Ryzen 9 7945HX (16C/32T) and 32GB DDR5 runs all production workloads:
- docker-prod-01: Media automation stack (ARR, Plex clients, books)
- auth-prod-01: Authentik identity provider (dedicated VM)
- immich-prod-01: Photo management with ML worker (dedicated VM)
- dns-prod-01: Primary AdGuard Home (LXC)
Secondary Node (Optiplex):
- pbs-prod-01: Proxmox Backup Server
- dns-prod-02: Secondary AdGuard Home (LXC)
HA is deliberately left out for now: it needs matched hardware and a quorum of three or more nodes to be meaningful. The Optiplex cannot absorb the MS-A2’s workload, so enabling HA would provide a false sense of security.
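The quorum point comes down to majority arithmetic. The sketch below assumes one vote per node and no external tie-breaker device; the function names are illustrative.

```python
def quorum_size(nodes: int) -> int:
    """Votes needed for a strict majority of `nodes` equal-weight voters."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can fail while the survivors still hold a majority."""
    return nodes - quorum_size(nodes)


for n in (2, 3, 5):
    print(f"{n} nodes: quorum={quorum_size(n)}, tolerated failures={tolerated_failures(n)}")
```

Under these assumptions a two-node cluster tolerates zero failures, because losing either node leaves one vote out of two. Three nodes is the smallest size that survives a single failure.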
Storage Layer
nas-prod-01 runs Unraid 7.2.4 on dedicated hardware (i5-13400, 32GB DDR4) in a 4U 12-bay chassis:
- Parity Array: 2x 12TB parity + 3x 12TB data (~36TB usable) for bulk media and downloads
- ZFS Mirror Pool: 2x 4TB mirror (~4TB usable) for precious data (photos, backups)
- 10GbE Direct Link: DAC cable to MS-A2 for storage traffic (off LAN switch)
- Plex Server: Runs natively on Unraid using i5-13400 QuickSync iGPU for transcoding
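The usable-capacity figures above follow from simple arithmetic, sketched here in raw terabytes. Filesystem overhead and the TB/TiB difference are ignored, and the helper names are illustrative.

```python
def parity_array_usable_tb(data_drives: int, drive_tb: float) -> float:
    """Parity drives add no capacity; usable space is the sum of the data drives."""
    return data_drives * drive_tb


def mirror_usable_tb(drive_tb: float) -> float:
    """A two-way mirror stores every block twice, so usable space is one drive."""
    return drive_tb


array_tb = parity_array_usable_tb(data_drives=3, drive_tb=12)  # 3 x 12TB data drives
pool_tb = mirror_usable_tb(drive_tb=4)                         # 2 x 4TB mirror
print(array_tb, pool_tb)
```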
Power & Resilience
UPS Protection: Tripp-Lite 1500VA protects all devices. NUT (Network UPS Tools) server on nas-prod-01 orchestrates graceful shutdown:
- Low battery detected → NUT signals all hosts
- Proxmox VMs and LXCs shut down gracefully
- Proxmox hosts shut down
- Unraid shuts down last
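The shutdown sequence above can be modelled as an ordered plan. This sketch only captures the ordering; it does not call NUT or Proxmox, and `ShutdownStep`, `SHUTDOWN_PLAN`, and the `stop` callback are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShutdownStep:
    order: int
    target: str


# Order taken from the list above; the NAS hosts the NUT server, so it goes last.
SHUTDOWN_PLAN = [
    ShutdownStep(3, "Unraid (nas-prod-01)"),
    ShutdownStep(1, "Proxmox VMs and LXCs"),
    ShutdownStep(2, "Proxmox hosts"),
]


def run_shutdown(plan, stop):
    """Call `stop(target)` for each step in ascending order; return the order used."""
    done = []
    for step in sorted(plan, key=lambda s: s.order):
        stop(step.target)
        done.append(step.target)
    return done


executed = run_shutdown(SHUTDOWN_PLAN, lambda target: print(f"stopping {target}"))
```

Keeping the order as data rather than as script position makes it easy to check that the host running the NUT server is always the final step.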
Key Architecture Decisions
Dedicated NAS Host
v2 ran TrueNAS as a VM, a fragile design. v3 gives the NAS dedicated bare-metal hardware with proper HBA and 10GbE connectivity.
Mirrored Proxmox Boot
Single NVMe boot was v2’s primary fragility. The MS-A2 boots from a ZFS mirror (RAID-1) across two NVMe drives, so a single drive failure doesn’t take down the hypervisor.
Service Isolation Strategy
- Authentik gets a dedicated VM (auth-prod-01) — LXC deployment is unstable per official documentation
- Immich gets a dedicated VM (immich-prod-01) — ML worker causes CPU spikes; isolation allows resource caps
- Media stack consolidates on one VM (docker-prod-01) — simpler than multi-VM Docker; will migrate selectively to k3s in Phase 6
No Cache Pool at Launch
Downloads must write directly to the parity array to maintain hardlinks with media files. Cache involvement breaks this. Container appdata lives on VM local disks. No workload justifies cache at launch.
Kubernetes Readiness
v3 hardware and network design intentionally leaves headroom for future Kubernetes deployment:
- Traefik familiarity translates directly to k3s ingress controller
- Resource allocation planned for k3s control plane + worker VMs
- Service isolation (Authentik, Immich) maps cleanly to k8s deployments
- ARR stack and qBittorrent intentionally stay in Docker (hardlinks and VPN killswitch don’t translate to k8s)
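The hardlink constraint that keeps downloads on the parity array, and the ARR stack in Docker, rests on one filesystem rule: a hardlink is a second name for the same inode, so both paths must live on the same filesystem. A minimal demonstration, using a temporary directory and hypothetical file names rather than the real shares:

```python
import os
import tempfile


def same_file(a: str, b: str) -> bool:
    """True when both paths point at the same inode on the same device."""
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)


with tempfile.TemporaryDirectory() as root:
    download = os.path.join(root, "downloads", "episode.mkv")
    library = os.path.join(root, "media", "episode.mkv")
    os.makedirs(os.path.dirname(download))
    os.makedirs(os.path.dirname(library))

    with open(download, "wb") as fh:
        fh.write(b"payload")

    # Succeeds because both directories sit on one filesystem.
    # Across filesystems os.link raises OSError with errno EXDEV.
    os.link(download, library)

    linked = same_file(download, library)
    link_count = os.stat(download).st_nlink

print(linked, link_count)
```

When the link succeeds, the library entry costs no extra disk space and the download client can keep seeding the original path.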
Phase 6 introduces k3s as an isolated sandbox cluster first. No production services migrate until the cluster is proven stable. Failure in the sandbox has zero impact on running services.
Access & Identity
Internal Access
Split-Horizon DNS: AdGuard Home rewrites *.giohosted.com to the internal Traefik IP. The same FQDNs work internally and externally with valid TLS.
Reverse Proxy: Traefik with a Cloudflare DNS-01 wildcard certificate (*.giohosted.com) terminates TLS for all internal services. Routing is defined through Docker labels.
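The split-horizon idea can be sketched as a wildcard lookup that answers internal queries with the reverse proxy address and lets everything else fall through to upstream DNS. This is a simplification, not AdGuard Home's matching logic; `TRAEFIK_IP` is a placeholder from the documentation address range and `resolve_internal` is a hypothetical helper.

```python
from fnmatch import fnmatch

TRAEFIK_IP = "192.0.2.10"  # hypothetical placeholder, not the real internal IP
REWRITES = {"*.giohosted.com": TRAEFIK_IP}


def resolve_internal(fqdn: str):
    """Return the rewrite target for `fqdn`, or None to fall through to upstream DNS."""
    name = fqdn.rstrip(".").lower()
    for pattern, target in REWRITES.items():
        if fnmatch(name, pattern):
            return target
    return None


print(resolve_internal("auth.giohosted.com"))
print(resolve_internal("example.org"))
```

Because internal clients receive the Traefik address for every matching name, the wildcard certificate is valid for the same FQDN whether the request originates inside or outside the LAN.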
External Access
- Cloudflare Tunnel: cloudflared container on docker-prod-01 exposes Authentik, Audiobookshelf, Shelfmark, Seerr
- Plex Direct Port Forward: Port 32400 forwarded to nas-prod-01 (Cloudflare ToS prohibits video streaming through tunnels)
- WireGuard VPN: UDM-SE endpoint for secure remote LAN access
Identity Provider
Authentik provides OIDC/OAuth2 SSO for all services:
- Proxmox (both nodes)
- Audiobookshelf, Calibre-Web-Automated, Immich, Homarr, Beszel
- Cloudflare Access policies
- MFA enforced for admin accounts
Migration Strategy
v3 runs in parallel with v2 infrastructure. No big-bang cutover:
- v3 built on new VLANs (Services VLAN 30)
- v2 services remain on Trusted VLAN 20 during build
- Per-service cutover via AdGuard DNS rewrite flips
- Instant rollback by reversing DNS rewrite
- v2 decommissioned only after 48-hour validation
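The per-service cutover and rollback can be pictured as flipping one entry in a rewrite table while remembering the previous target. The sketch below is an in-memory stand-in, not the AdGuard Home API; the hostname and both IP addresses are hypothetical.

```python
V2_IP = "192.0.2.20"  # hypothetical v2 address
V3_IP = "192.0.2.30"  # hypothetical v3 address


class RewriteTable:
    """In-memory stand-in for a DNS rewrite list (hostname -> IP)."""

    def __init__(self, initial):
        self.records = dict(initial)
        self.previous = {}

    def cutover(self, host, new_ip):
        """Point `host` at `new_ip`, remembering the old target for rollback."""
        self.previous[host] = self.records.get(host)
        self.records[host] = new_ip

    def rollback(self, host):
        """Restore the target that was active before the last cutover of `host`."""
        old = self.previous.pop(host)
        if old is None:
            del self.records[host]
        else:
            self.records[host] = old


table = RewriteTable({"books.giohosted.com": V2_IP})
table.cutover("books.giohosted.com", V3_IP)
after_cutover = table.records["books.giohosted.com"]
table.rollback("books.giohosted.com")
after_rollback = table.records["books.giohosted.com"]
print(after_cutover, after_rollback)
```

In practice, clients that cached the old answer keep using it until the record's TTL expires, so short TTLs on the rewritten names keep both cutover and rollback close to immediate.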