Talos Linux supports zero-downtime upgrades for both the Talos operating system and Kubernetes. Upgrades are performed in-place using the installer image, with automatic health checks and rollback capabilities.
Upgrading Talos Linux
Talos upgrades are performed by specifying a new installer image. The upgrade process downloads the new image, installs it, and reboots the node.
Checking Current Version
First, check the current Talos version:
talosctl --nodes 10.0.0.2 version
Example output:
Client:
Tag: v1.6.0
Go version: go1.21.5
Server:
NODE 10.0.0.2
Tag: v1.5.5
Go version: go1.21.4
Upgrading a Single Node
Upgrade a single node to the latest version:
talosctl upgrade --nodes 10.0.0.2 \
--image ghcr.io/siderolabs/installer:v1.6.0
The upgrade command supports several important flags:
--preserve: Preserve data during upgrade (default behavior)
--stage: Stage the upgrade to be applied on next reboot
--force: Skip health checks (use with caution)
--reboot-mode: Control reboot behavior (default, powercycle)
Upgrade with Wait
Wait for the upgrade to complete and verify the node comes back healthy:
talosctl upgrade --nodes 10.0.0.2 \
--image ghcr.io/siderolabs/installer:v1.6.0 \
--wait
Example output:
NODE ACK STARTED
10.0.0.2 true 2024-03-04T10:15:30Z
waiting for node reboot...
node rebooted
waiting for node to be ready...
node is ready
Staged Upgrades
Stage an upgrade to be applied on the next reboot:
talosctl upgrade --nodes 10.0.0.2 \
--image ghcr.io/siderolabs/installer:v1.6.0 \
--stage
This allows you to control when the node reboots:
talosctl reboot --nodes 10.0.0.2
Upgrading the Entire Cluster
Always upgrade control plane nodes one at a time to maintain etcd quorum and cluster availability.
Upgrade control plane nodes sequentially
Upgrade each control plane node one at a time:
talosctl upgrade --nodes 10.0.0.2 \
--image ghcr.io/siderolabs/installer:v1.6.0 \
--wait
talosctl upgrade --nodes 10.0.0.3 \
--image ghcr.io/siderolabs/installer:v1.6.0 \
--wait
talosctl upgrade --nodes 10.0.0.4 \
--image ghcr.io/siderolabs/installer:v1.6.0 \
--wait
Wait for each node to complete before proceeding to the next.
Verify control plane health
After upgrading all control plane nodes, verify cluster health:
talosctl health \
--control-plane-nodes 10.0.0.2,10.0.0.3,10.0.0.4
kubectl get nodes
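The sequential control-plane upgrade and health check above can be scripted. This is a minimal bash sketch using the example node IPs and installer image from this page; it echoes each command instead of executing it, so remove the `echo` to run it for real:

```shell
#!/usr/bin/env bash
# Sketch: upgrade control plane nodes one at a time, then verify health.
# Node IPs and the installer image are the examples used on this page.
# Commands are echoed rather than executed; remove "echo" to run them.
set -euo pipefail

IMAGE="ghcr.io/siderolabs/installer:v1.6.0"
CONTROL_PLANE_NODES=(10.0.0.2 10.0.0.3 10.0.0.4)

for node in "${CONTROL_PLANE_NODES[@]}"; do
  # --wait blocks until the node reboots and reports healthy,
  # so each upgrade finishes before the next one starts.
  echo talosctl upgrade --nodes "$node" --image "$IMAGE" --wait
done

# Verify the control plane as a whole once every node is done.
echo talosctl health --control-plane-nodes "$(IFS=,; echo "${CONTROL_PLANE_NODES[*]}")"
```

Because `--wait` makes each upgrade synchronous, the loop naturally preserves the one-node-at-a-time ordering that etcd quorum requires.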
Upgrade worker nodes
Worker nodes can be upgraded in parallel or sequentially depending on your workload:
# Upgrade workers one at a time
talosctl upgrade --nodes 10.0.0.5 \
--image ghcr.io/siderolabs/installer:v1.6.0 \
--wait
talosctl upgrade --nodes 10.0.0.6 \
--image ghcr.io/siderolabs/installer:v1.6.0 \
--wait
During a worker upgrade, Talos cordons and drains the node before rebooting it, and Kubernetes reschedules the evicted pods onto other nodes.
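If your workloads can tolerate several workers rebooting at once, the parallel variant can be sketched with background shell jobs. As above, the node IPs and image are this page's examples, and the commands are echoed rather than executed:

```shell
#!/usr/bin/env bash
# Sketch: upgrade worker nodes in parallel using background jobs.
# Node IPs and the image are this page's examples; commands are
# echoed rather than executed, so remove "echo" to run them.
set -euo pipefail

IMAGE="ghcr.io/siderolabs/installer:v1.6.0"
WORKER_NODES=(10.0.0.5 10.0.0.6)

for node in "${WORKER_NODES[@]}"; do
  # Each upgrade runs as a background job (&) so nodes proceed in parallel.
  echo talosctl upgrade --nodes "$node" --image "$IMAGE" --wait &
done

wait  # block until every background upgrade job has finished
```

Only parallelize as many workers as your cluster can lose at once; for capacity-sensitive workloads, the sequential loop is the safer default.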
Reboot Modes
Talos supports different reboot modes during upgrades:
Default Mode (kexec): Fast reboot using kexec
talosctl upgrade --nodes 10.0.0.2 \
--image ghcr.io/siderolabs/installer:v1.6.0 \
--reboot-mode default
Powercycle Mode: Full power cycle (slower but more thorough)
talosctl upgrade --nodes 10.0.0.2 \
--image ghcr.io/siderolabs/installer:v1.6.0 \
--reboot-mode powercycle
Rolling Back an Upgrade
If an upgrade fails or causes issues, you can roll back:
talosctl rollback --nodes 10.0.0.2
This reverts to the previous Talos version installed on the node.
Upgrading Kubernetes
Kubernetes upgrades are managed separately from Talos upgrades using the upgrade-k8s command.
Checking Kubernetes Version
Check current Kubernetes version:
kubectl version
talosctl --nodes 10.0.0.2 get kubernetesversion
Upgrading Kubernetes
Upgrade Kubernetes to a new version:
talosctl upgrade-k8s --to 1.29.0
The upgrade process:
- Detects the current Kubernetes version
- Validates the upgrade path
- Pre-pulls container images
- Updates control plane components
- Updates kubelet on all nodes
- Applies necessary Kubernetes manifests
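The "validates the upgrade path" step reflects Kubernetes version-skew rules: upgrades should move one minor version at a time. A simplified sketch of that kind of check, using this page's example versions:

```shell
#!/usr/bin/env bash
# Simplified sketch of an upgrade-path check: allow at most one
# minor-version step, mirroring the validation described above.
# Version strings are this page's examples.
set -euo pipefail

FROM="1.28.0"
TO="1.29.0"

# Extract the minor version (the middle component of x.y.z).
from_minor="$(echo "$FROM" | cut -d. -f2)"
to_minor="$(echo "$TO" | cut -d. -f2)"

if [ $((to_minor - from_minor)) -le 1 ]; then
  RESULT="ok"            # e.g. 1.28 -> 1.29
else
  RESULT="skips-minor"   # e.g. 1.27 -> 1.29: upgrade stepwise instead
fi
echo "$RESULT"
```

This is only an illustration of the rule; the real `upgrade-k8s` validation is performed by talosctl itself.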
Upgrade from Specific Version
Explicitly specify the source version:
talosctl upgrade-k8s \
--from 1.28.0 \
--to 1.29.0
Dry Run Mode
Preview the upgrade without making changes:
talosctl upgrade-k8s --to 1.29.0 --dry-run
Example output:
Automatically detected the lowest Kubernetes version 1.28.0
> Upgrading Kubernetes from v1.28.0 to v1.29.0
> Will upgrade 3 control plane nodes
> Will upgrade 2 worker nodes
> Will pull images:
- registry.k8s.io/kube-apiserver:v1.29.0
- registry.k8s.io/kube-controller-manager:v1.29.0
- registry.k8s.io/kube-scheduler:v1.29.0
- registry.k8s.io/kube-proxy:v1.29.0
Advanced Upgrade Options
Skip kubelet upgrade (control plane only):
talosctl upgrade-k8s --to 1.29.0 --upgrade-kubelet=false
Skip image pre-pulling (faster but riskier):
talosctl upgrade-k8s --to 1.29.0 --pre-pull-images=false
Specify custom images:
talosctl upgrade-k8s --to 1.29.0 \
--apiserver-image registry.k8s.io/kube-apiserver:v1.29.0 \
--controller-manager-image registry.k8s.io/kube-controller-manager:v1.29.0
Kubernetes Upgrade Best Practices
- Always upgrade one minor version at a time: Don’t skip versions (e.g., 1.27 → 1.28 → 1.29)
- Test in non-production first: Validate upgrades in staging environments
- Check compatibility: Ensure workloads are compatible with the target version
- Monitor during upgrade: Watch pod status and cluster metrics
- Backup etcd before upgrading: Create an etcd snapshot as a precaution
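The etcd backup mentioned above can be taken with talosctl's snapshot support. A sketch follows; the dated filename is an example, and the command is echoed rather than executed so the snippet runs anywhere:

```shell
#!/usr/bin/env bash
# Sketch: take an etcd snapshot from one control plane node before
# upgrading. The filename is an example; the command is echoed
# rather than executed, so remove "echo" to run it.
set -euo pipefail

SNAPSHOT="etcd-backup-$(date +%Y-%m-%d).snapshot"
echo talosctl --nodes 10.0.0.2 etcd snapshot "$SNAPSHOT"
```

Store the snapshot off the cluster so it remains available if the upgrade has to be rolled back or etcd must be restored.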
Upgrade Maintenance Windows
For production clusters, plan maintenance windows for upgrades:
Pre-maintenance
- Create etcd backup
- Document current versions
- Review release notes
- Test in staging environment
During maintenance
- Upgrade control plane nodes sequentially
- Verify control plane health between nodes
- Upgrade worker nodes
- Monitor workload health
Post-maintenance
- Verify all nodes are at target version
- Run health checks
- Validate workload functionality
- Document upgrade results
Upgrade Troubleshooting
Node Stuck During Upgrade
If a node doesn’t complete the upgrade:
1. Check node status:
talosctl --nodes 10.0.0.2 dmesg
talosctl --nodes 10.0.0.2 logs kubelet
2. Force reboot if necessary:
talosctl --nodes 10.0.0.2 reboot
3. Roll back if issues persist:
talosctl --nodes 10.0.0.2 rollback
etcd Quorum Lost
If etcd loses quorum during upgrade:
1. Check etcd members:
talosctl --nodes 10.0.0.2 etcd members
2. Wait for the remaining nodes to rejoin, or restore from backup (see disaster recovery).
Kubernetes Components Not Starting
Check component logs:
talosctl --nodes 10.0.0.2 logs kubelet
kubectl logs -n kube-system kube-apiserver-controlplane-1