Cluster management involves the ongoing operational tasks required to maintain a healthy Talos Linux cluster. This includes adding new nodes, removing nodes, applying configuration changes, and managing cluster state.
Adding Nodes to a Cluster
To add new nodes to an existing cluster, you need to generate machine configurations and apply them to the new nodes.
Generate machine configuration
Generate the machine configuration for the new node. Use the same cluster configuration to maintain consistency:

talosctl gen config my-cluster https://controlplane.example.com:6443 \
--output-types controlplane,worker
This generates configuration files for both control plane and worker nodes.
Apply configuration to new node
Apply the appropriate configuration to the new node using maintenance mode:

talosctl apply-config --insecure \
--nodes 10.0.0.5 \
--file controlplane.yaml
Use the --insecure flag when applying configuration to a node for the first time, since the node does not yet have client certificates.
Wait for node to join
Monitor the node as it joins the cluster:

talosctl --nodes 10.0.0.5 health --wait-timeout 10m
The node will automatically join the etcd cluster (for control plane nodes) or register with Kubernetes (for worker nodes).
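Once the health check passes, the join can also be confirmed from the Kubernetes side:

```shell
# List cluster nodes and confirm the new node reports Ready
kubectl get nodes -o wide
```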
Removing Nodes from a Cluster
Properly removing nodes ensures data integrity and prevents cluster disruption.
Cordon and drain the node (Worker nodes)
For worker nodes, first cordon and drain workloads:

kubectl cordon worker-1
kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data
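Before proceeding, it is worth verifying that only DaemonSet-managed pods remain on the node:

```shell
# List the pods still scheduled on worker-1 after the drain;
# only DaemonSet pods should remain
kubectl get pods --all-namespaces -o wide \
  --field-selector spec.nodeName=worker-1
```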
Remove node from etcd (Control plane nodes)
For control plane nodes, gracefully leave the etcd cluster:

talosctl --nodes 10.0.0.3 etcd leave

Verify the node has left:

talosctl --nodes 10.0.0.2 etcd members
Reset the node
Reset the node to wipe its configuration and data:

talosctl --nodes 10.0.0.3 reset --graceful --reboot
The reset command wipes all data on the node. Ensure you’ve backed up any important data.
Remove from Kubernetes
Delete the node from Kubernetes:

kubectl delete node worker-1
Updating Node Configuration
Talos supports multiple modes for applying configuration changes.
Auto Mode (Default)
Applies the configuration with automatic reboot if required:
talosctl apply-config --nodes 10.0.0.2 --file updated-config.yaml
No Reboot Mode
Applies changes without rebooting; the command fails if the change would require a reboot:
talosctl apply-config --nodes 10.0.0.2 \
--file updated-config.yaml \
--mode no-reboot
Try Mode
Tests configuration changes with automatic rollback:
talosctl apply-config --nodes 10.0.0.2 \
--file updated-config.yaml \
--mode try \
--timeout 5m
The configuration is applied immediately but automatically rolled back when the timeout expires unless it is re-applied in a persistent mode. This protects against changes that would make the node unreachable.
Staged Mode
Stages the configuration to be applied on next reboot:
talosctl apply-config --nodes 10.0.0.2 \
--file updated-config.yaml \
--mode staged
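A staged configuration takes effect only after the node restarts, so follow up with a reboot when the maintenance window allows:

```shell
# Reboot the node to activate the staged configuration
talosctl --nodes 10.0.0.2 reboot
```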
Patching Configuration
For small changes, use configuration patches instead of applying full configs:
talosctl patch machineconfig --nodes 10.0.0.2 \
--patch '[{"op": "add", "path": "/machine/time/servers", "value": ["time.cloudflare.com"]}]'
You can also use patch files:
talosctl patch machineconfig --nodes 10.0.0.2 \
--patch @patch.yaml
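As a sketch, a patch file equivalent to the JSON patch above could be written as a strategic merge patch, which Talos accepts alongside RFC 6902 JSON patches:

```yaml
# patch.yaml -- strategic merge patch adding an NTP server
machine:
  time:
    servers:
      - time.cloudflare.com
```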
Scaling Control Plane
When scaling the control plane, maintain an odd number of nodes (3, 5, 7) for etcd quorum.
Adding a Control Plane Node
Add the new control plane node
Apply the control plane configuration:

talosctl apply-config --insecure \
--nodes 10.0.0.4 \
--file controlplane.yaml
Verify etcd membership
Check that the node has joined etcd:

talosctl --nodes 10.0.0.2 etcd members

Example output:

NODE       ID                HOSTNAME         PEER URLS              CLIENT URLS            LEARNER
10.0.0.2   6457a4e8ecba5c61  controlplane-1   https://10.0.0.2:2380  https://10.0.0.2:2379  false
10.0.0.3   7d3c4c7e8f9a1b2c  controlplane-2   https://10.0.0.3:2380  https://10.0.0.3:2379  false
10.0.0.4   8e4d5d8f9a0b2c3d  controlplane-3   https://10.0.0.4:2380  https://10.0.0.4:2379  false
Removing a Control Plane Node
Never reduce the control plane below 3 nodes in production. Always maintain etcd quorum.
Follow the same process as removing regular nodes, ensuring the node gracefully leaves etcd first.
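Put together, a graceful control plane removal might look like the following sequence (IPs and node name follow the examples on this page):

```shell
# 1. Gracefully remove the node from the etcd cluster
talosctl --nodes 10.0.0.4 etcd leave

# 2. Confirm membership from a remaining control plane node
talosctl --nodes 10.0.0.2 etcd members

# 3. Wipe the node's configuration and data
talosctl --nodes 10.0.0.4 reset --graceful --reboot

# 4. Remove the Node object from Kubernetes
kubectl delete node controlplane-3
```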
Checking Cluster Health
Regularly verify cluster health:
talosctl health \
--control-plane-nodes 10.0.0.2,10.0.0.3,10.0.0.4 \
--worker-nodes 10.0.0.5,10.0.0.6
This performs comprehensive health checks including:
- Node readiness
- etcd cluster health
- Kubernetes API server availability
- Control plane component status
- Pod readiness
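For unattended monitoring, the command's exit status can drive a simple periodic check, since talosctl exits non-zero when a check fails (the logger alert is a placeholder for your own notification mechanism):

```shell
#!/usr/bin/env sh
# Periodic cluster health probe suitable for cron;
# records an alert when any check fails.
if ! talosctl health \
    --control-plane-nodes 10.0.0.2,10.0.0.3,10.0.0.4 \
    --worker-nodes 10.0.0.5,10.0.0.6 \
    --wait-timeout 5m; then
  logger -t talos-health "Talos cluster health check failed"
fi
```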
Managing Services
View running services on a node:
talosctl --nodes 10.0.0.2 services
Restart a specific service:
talosctl --nodes 10.0.0.2 service kubelet restart
Check service status:
talosctl --nodes 10.0.0.2 service kubelet status
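When a service is unhealthy, its logs are usually the next place to look; talosctl can stream them directly:

```shell
# View the kubelet service logs on a node
talosctl --nodes 10.0.0.2 logs kubelet

# Follow logs as they are produced
talosctl --nodes 10.0.0.2 logs --follow kubelet
```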