Objective
Introduce Kubernetes as an isolated learning cluster. No production services migrate until the cluster is fully understood and stable. Build skills deliberately: the sandbox phase must be complete before any production workload is considered.
Entry Criteria
Phase 5 complete — all backup tiers validated, monitoring confirmed, v3 documented and stable
Locked Approach
Sandbox First. Production Later.
Production services stay on Docker until the cluster is proven stable and the operator is comfortable with it. The sandbox cluster runs in parallel, so a failure there has zero impact on running services. Selective production migration begins only after the sandbox phase is complete.
Planned Architecture
Cluster Design
- Talos Linux VMs on both Proxmox nodes
- MS-A2: Talos control plane VM + Talos worker 1
- Optiplex: Talos worker 2
- Storage: Longhorn for PVCs inside the cluster
- Ingress: Traefik ingress controller (familiar from Docker context — same mental model)
- GitOps: Flux or ArgoCD
Tooling to Learn
Core Kubernetes
- kubectl
- Pods, deployments, services, namespaces
- ConfigMaps, Secrets
Storage
- Longhorn
- PVCs and PVs
- Storage classes
Networking
- Traefik ingress controller
- cert-manager for TLS
- Ingress resources
GitOps & Automation
- Flux or ArgoCD
- Helm charts
- Operators
Sandbox Phase (No Production Traffic)
Stand Up Cluster
- Create Talos VMs on pve-prod-01 and pve-prod-02
- Bootstrap Talos cluster
- Configure kubectl access
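The three steps above can be sketched as an ordered command plan. This is a minimal dry-run sketch, not a tested procedure: the cluster name and IP addresses are hypothetical placeholders, and the `talosctl` invocations follow the general shape of the Talos getting-started flow, so they should be checked against the installed `talosctl` version before use. The script only prints commands; it does not run them.

```python
"""Dry-run plan for bootstrapping a Talos sandbox cluster (prints, never executes)."""
import shlex

# All values below are hypothetical placeholders, not taken from the plan.
CLUSTER_NAME = "k8s-lab"
CONTROL_PLANE_IP = "192.0.2.10"
WORKER_IPS = ["192.0.2.11", "192.0.2.12"]


def bootstrap_plan(cluster, cp_ip, worker_ips):
    """Return the ordered list of commands for one control plane and N workers."""
    endpoint = f"https://{cp_ip}:6443"
    # Flags shared by commands that talk to the configured control plane node.
    talos = ["talosctl", "--talosconfig", "talosconfig",
             "--endpoints", cp_ip, "--nodes", cp_ip]
    plan = [
        # Generates controlplane.yaml, worker.yaml and talosconfig locally.
        ["talosctl", "gen", "config", cluster, endpoint],
        # Push the machine config to the control plane VM (maintenance mode).
        ["talosctl", "apply-config", "--insecure",
         "--nodes", cp_ip, "--file", "controlplane.yaml"],
    ]
    for ip in worker_ips:
        plan.append(["talosctl", "apply-config", "--insecure",
                     "--nodes", ip, "--file", "worker.yaml"])
    plan += [
        talos + ["bootstrap"],      # run once, against the control plane only
        talos + ["kubeconfig"],     # fetch kubeconfig for kubectl access
        ["kubectl", "get", "nodes"],
    ]
    return plan


if __name__ == "__main__":
    for step, cmd in enumerate(bootstrap_plan(CLUSTER_NAME, CONTROL_PLANE_IP, WORKER_IPS), 1):
        print(f"{step}. {shlex.join(cmd)}")
```

Keeping the plan as data makes it easy to review the order of operations before touching either Proxmox node.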
Learn Core Primitives
- Deploy test workloads
- Learn pods, deployments, services, namespaces
- Understand ConfigMaps and Secrets
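A first test workload might look like the following sketch, which builds a Namespace, a Deployment, and a Service as plain dictionaries and prints them as a single JSON list. The namespace `sandbox`, the app name `hello`, and the image `nginx:stable` are illustrative assumptions, not part of the plan. `kubectl` accepts JSON as well as YAML, so the output can be piped into `kubectl apply -f -`.

```python
"""Generate a small test workload (Namespace + Deployment + Service) as JSON."""
import json


def deployment_manifest(name, image, namespace="sandbox", replicas=2, port=80):
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": namespace, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {"name": name, "image": image,
                         "ports": [{"containerPort": port}]}
                    ]
                },
            },
        },
    }


def service_manifest(name, namespace="sandbox", port=80):
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        # Routes traffic to pods carrying the Deployment's label.
        "spec": {"selector": {"app": name},
                 "ports": [{"port": port, "targetPort": port}]},
    }


def workload(name="hello", image="nginx:stable", namespace="sandbox"):
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            {"apiVersion": "v1", "kind": "Namespace",
             "metadata": {"name": namespace}},
            deployment_manifest(name, image, namespace),
            service_manifest(name, namespace),
        ],
    }


if __name__ == "__main__":
    print(json.dumps(workload(), indent=2))
```

Deleting the namespace afterwards removes everything in it, which keeps experiments in the sandbox cheap to reset.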
Configure Ingress
- Install Traefik ingress controller
- Install cert-manager
- Test TLS certificate issuance
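To test certificate issuance, one option is an Ingress annotated for cert-manager, sketched below. The hostname, the issuer name `letsencrypt-staging`, and the ingress class name `traefik` are assumptions for illustration; the issuer must already exist as a ClusterIssuer, and the class name should be confirmed with `kubectl get ingressclass`. The `cert-manager.io/cluster-issuer` annotation asks cert-manager to create a Certificate for the hosts listed under `tls`.

```python
"""Generate an Ingress that requests a TLS certificate through cert-manager."""
import json


def ingress_manifest(name, host, service, issuer, namespace="sandbox", port=80):
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Tells cert-manager which ClusterIssuer should sign the certificate.
            "annotations": {"cert-manager.io/cluster-issuer": issuer},
        },
        "spec": {
            "ingressClassName": "traefik",  # assumed class name; verify in-cluster
            # cert-manager stores the issued certificate in this Secret.
            "tls": [{"hosts": [host], "secretName": f"{name}-tls"}],
            "rules": [
                {
                    "host": host,
                    "http": {
                        "paths": [
                            {
                                "path": "/",
                                "pathType": "Prefix",
                                "backend": {
                                    "service": {"name": service,
                                                "port": {"number": port}}
                                },
                            }
                        ]
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = ingress_manifest(
        name="hello",
        host="hello.lab.example.com",   # hypothetical hostname
        service="hello",                # hypothetical Service name
        issuer="letsencrypt-staging",   # hypothetical ClusterIssuer name
    )
    print(json.dumps(manifest, indent=2))
```

Starting with a staging issuer is a common way to avoid rate limits while the issuance flow is still being learned.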
Production Migration Candidates (Post-Sandbox Only)
These services benefit from HA and operator-managed upgrades, which makes them good candidates for migration once the cluster is stable.
| Service | Reason |
|---|---|
| Immich | Benefits from HA and operator-managed upgrades |
| Authentik | IdP should be highly available |
| Beszel / Uptime Kuma | Monitoring infrastructure |
| Traefik | Already the planned ingress controller for the cluster, a natural fit |
Intentionally Staying in Docker
| Service | Reason |
|---|---|
| ARR stack (Sonarr, Radarr, Prowlarr, Bazarr) | Hardlinks and atomic moves make k8s messy |
| qBittorrent + Gluetun | VPN killswitch model does not translate cleanly to k8s networking |
| Books stack (CWA, ABS, Shelfmark) | Ingest/hardlink workflows are filesystem-dependent |
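The hardlink constraint behind these decisions can be seen in a few lines. A hard link is a second directory entry for the same inode, so it only works when source and destination live on the same filesystem. In Kubernetes, separate PVCs are separate mounts, so a download directory and a library directory would need to share one volume for this workflow to keep working. The sketch below uses a temporary directory and hypothetical file names.

```python
"""Show that a hard link shares an inode with its source on the same filesystem."""
import os
import tempfile


def hardlink_demo():
    with tempfile.TemporaryDirectory() as root:
        downloads = os.path.join(root, "downloads")
        library = os.path.join(root, "library")
        os.makedirs(downloads)
        os.makedirs(library)

        src = os.path.join(downloads, "episode.mkv")  # hypothetical file name
        dst = os.path.join(library, "episode.mkv")
        with open(src, "wb") as f:
            f.write(b"x" * 1024)

        # Succeeds because both directories are on the same filesystem.
        # Across two different mounts this call raises OSError instead.
        os.link(src, dst)

        a, b = os.stat(src), os.stat(dst)
        return {
            "same_inode": a.st_ino == b.st_ino,
            "same_device": a.st_dev == b.st_dev,
            "link_count": a.st_nlink,
        }


if __name__ == "__main__":
    print(hardlink_demo())
```

Because both paths point at the same data, the "import" costs no extra disk space and no copy time, which is the behavior the ARR and books stacks rely on.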
VM Resource Allocation
| VM | Host | vCPU | RAM | Role |
|---|---|---|---|---|
| k3s-ctrl-lab-01 | pve-prod-01 | 2 | 4GB | Control plane |
| k3s-work-lab-01 | pve-prod-01 | 4 | 8GB | Worker node |
| k3s-work-lab-02 | pve-prod-02 | 4 | 8GB | Worker node |
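As a quick sanity check on the table above, the per-host totals can be summed to compare against each Proxmox node's free capacity. This sketch only adds up the figures from the table; the actual capacity of each host is not stated in this plan and is not assumed here.

```python
"""Sum the planned vCPU and RAM allocations per Proxmox host."""
from collections import defaultdict

# (VM name, host, vCPU, RAM in GB), copied from the allocation table.
VMS = [
    ("k3s-ctrl-lab-01", "pve-prod-01", 2, 4),
    ("k3s-work-lab-01", "pve-prod-01", 4, 8),
    ("k3s-work-lab-02", "pve-prod-02", 4, 8),
]


def totals_by_host(vms):
    totals = defaultdict(lambda: {"vcpu": 0, "ram_gb": 0})
    for _name, host, vcpu, ram_gb in vms:
        totals[host]["vcpu"] += vcpu
        totals[host]["ram_gb"] += ram_gb
    return dict(totals)


if __name__ == "__main__":
    for host, t in sorted(totals_by_host(VMS).items()):
        print(f"{host}: {t['vcpu']} vCPU, {t['ram_gb']} GB RAM")
```

The sandbox reserves 6 vCPU and 12 GB on pve-prod-01 and 4 vCPU and 8 GB on pve-prod-02, which is the headroom the production Docker workloads on those hosts need to tolerate.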
Key Constraints
No Production Services Until Sandbox Complete
The sandbox cluster runs in parallel with production Docker services. Zero production traffic touches the sandbox cluster until it is proven stable.
Learning Resources
- Talos Linux Documentation
- Longhorn Documentation
- Traefik Kubernetes Ingress
- Flux Documentation
- ArgoCD Documentation
Exit Criteria
Sandbox cluster stable for 30+ days
- No unexpected crashes or restarts
- All test workloads running reliably
- Storage, networking, and ingress working as expected
Operator comfortable with k8s tooling
- kubectl commands second nature
- Troubleshooting pods, logs, events routine
- GitOps workflow understood and practiced
Production migration plan documented
- Which services migrate first
- Rollback plan for each service
- Monitoring and alerting for the cluster
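The stability criterion can be made checkable rather than left to judgment. The sketch below is one possible interpretation, not a definitive rule: it treats "stable" as at least 30 days since a chosen start date with zero container restarts, and it reads restart counts from the JSON shape produced by `kubectl get pods -A -o json`. Treating every restart as unexpected is a deliberately strict assumption; planned restarts would need to be excluded by hand.

```python
"""Check the '30+ days stable, no unexpected restarts' exit criterion."""
from datetime import date

REQUIRED_STABLE_DAYS = 30  # from the exit criteria


def total_restarts(pod_list):
    """Sum restartCount over all containers in a `kubectl get pods -o json` list."""
    return sum(
        status.get("restartCount", 0)
        for pod in pod_list.get("items", [])
        for status in pod.get("status", {}).get("containerStatuses", [])
    )


def stability_met(stable_since, today, pod_list):
    """True when the window is long enough and no container has restarted."""
    days_stable = (today - stable_since).days
    return days_stable >= REQUIRED_STABLE_DAYS and total_restarts(pod_list) == 0


if __name__ == "__main__":
    # Hypothetical sample in the same shape as kubectl's JSON output.
    sample = {
        "items": [
            {"status": {"containerStatuses": [{"restartCount": 0}]}},
            {"status": {"containerStatuses": [{"restartCount": 0}]}},
        ]
    }
    print(stability_met(date(2025, 1, 1), date(2025, 2, 1), sample))
```

In practice the pod list would come from saving the `kubectl` output to a file and loading it with `json.load`.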
This Phase Is Its Own Project
Phase 6 is a dedicated learning environment. It has its own timeline, its own success criteria, and its own documentation. It exists in parallel with production infrastructure, not as a replacement.