A multi-master Kubernetes cluster provides high availability by running multiple control plane nodes. This guide uses HAProxy for load balancing across master nodes.
Architecture Overview
This setup includes:
2 master nodes (control plane)
2 worker nodes
1 load balancer (HAProxy)
All nodes running Ubuntu 16.04+
Infrastructure Requirements
This example uses AWS EC2 instances:
Master nodes: 2 CPU, 2 GB RAM, 10 GB storage (x2)
Worker nodes: 1 CPU, 2 GB RAM, 10 GB storage (x2)
Load balancer: 1 CPU, 2 GB RAM, 10 GB storage (x1)
Initial Preparation
Note IP Addresses
Record the internal IP addresses of all machines:
# On each machine
hostname -I
Set Hostnames
On each machine, set an appropriate hostname:
# On load balancer
hostnamectl set-hostname loadbalancer
# On masters
hostnamectl set-hostname manager1 # First master
hostnamectl set-hostname manager2 # Second master
# On workers
hostnamectl set-hostname worker1 # First worker
hostnamectl set-hostname worker2 # Second worker
Update /etc/hosts
On all machines, add hostname mappings:
127.0.0.1 localhost <hostname>
# Replace with actual IPs
<loadbalancer-ip> loadbalancer
<manager1-ip> manager1
<manager2-ip> manager2
<worker1-ip> worker1
<worker2-ip> worker2
Reboot and Verify
Reboot each machine. After it comes back up, verify the hostname:
hostname
Tip: run top after SSH-ing in to keep the session from timing out during long setup steps.
Update System
sudo apt-get update && sudo apt-get upgrade -y
Install HAProxy
sudo apt-get install haproxy -y
Configure HAProxy
Edit the HAProxy configuration:
vim /etc/haproxy/haproxy.cfg
Add the following at the end of the file:
frontend fe-apiserver
bind 0.0.0.0:6443
mode tcp
option tcplog
default_backend be-apiserver
backend be-apiserver
mode tcp
option tcplog
option tcp-check
balance roundrobin
default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
server manager1 <manager1-ip>:6443 check
server manager2 <manager2-ip>:6443 check
Replace <manager1-ip> and <manager2-ip> with actual IP addresses.
Restart and Enable HAProxy
systemctl restart haproxy
systemctl enable haproxy
systemctl status haproxy
Verify HAProxy
Test that HAProxy is listening on port 6443:
sudo ss -tlnp | grep 6443
Install Kubernetes Components
Run this script on all nodes (both masters and workers), except the load balancer.
Create Installation Script
Save the following script as for-all-machines.sh:
#!/bin/bash
echo "Disabling swap"
swapoff -a
sed -e '/swap/s/^/#/g' -i /etc/fstab
echo "Installing Kubernetes version 1.24.1-00"
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
apt-get update
apt-get install -y kubelet=1.24.1-00 kubeadm=1.24.1-00 kubectl=1.24.1-00 docker.io
apt-mark hold kubelet kubeadm kubectl
cat << EOF | sudo tee /etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
systemctl daemon-reload
systemctl restart docker
systemctl restart kubelet
echo "ALL DONE - OK"
Run Script
Make the script executable and run it on each master and worker:
chmod +x for-all-machines.sh
sudo ./for-all-machines.sh
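Before proceeding, it is worth sanity-checking each node. A quick verification sketch (not part of the original script):

```shell
# Confirm the pinned Kubernetes tools are installed
kubeadm version -o short
kubelet --version
kubectl version --client

# Confirm Docker is running and swap is disabled
systemctl is-active docker
swapon --show   # should print nothing
```

If `swapon --show` produces output, re-check that the swap entry in /etc/fstab was commented out and run swapoff -a again.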
Initialize First Master
Initialize Cluster
Replace <loadbalancer-internal-IP> with the actual IP:
kubeadm init \
--control-plane-endpoint "<loadbalancer-internal-IP>:6443" \
--upload-certs \
--pod-network-cidr=192.168.0.0/16
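On success, kubeadm prints instructions for configuring kubectl on manager1 itself; they look like this:

```shell
# On manager1, after kubeadm init completes
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```

This lets you run kubectl directly on manager1, independent of the load balancer setup described later.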
Save Join Commands
The output will include two join commands:
For joining additional master nodes (with --control-plane flag)
For joining worker nodes
Save both commands in a file for later use.
Keep the join tokens secure. They provide access to join nodes to your cluster.
Join Second Master
Run Master Join Command
Use the control-plane join command from Step 3:
kubeadm join loadbalancer:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane --certificate-key <certificate-key>
Join Worker Nodes
On each worker node:
SSH to Worker
ssh worker1 # Repeat for worker2
sudo -i
Run Worker Join Command
Use the worker join command from Step 3:
kubeadm join loadbalancer:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>
Copy Admin Config
Copy the contents of /etc/kubernetes/admin.conf from manager1 to $HOME/.kube/config on the load balancer.
# On manager1
cat /etc/kubernetes/admin.conf
# On loadbalancer
mkdir -p $HOME/.kube
vim $HOME/.kube/config
# Paste the contents
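If key-based SSH between the machines is set up, scp is less error-prone than copy-pasting. A sketch, assuming an ubuntu user with SSH access (admin.conf is readable only by root on manager1, so you may first need to copy it somewhere readable or use root SSH):

```shell
# On the load balancer
mkdir -p $HOME/.kube
scp ubuntu@manager1:/etc/kubernetes/admin.conf $HOME/.kube/config
```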
Install kubectl
sudo snap install kubectl --classic
Check Cluster Status
kubectl get nodes
Nodes will show NotReady until we install the CNI.
Install Calico Network Plugin
Install Tigera Operator
kubectl create -f https://projectcalico.docs.tigera.io/manifests/tigera-operator.yaml
Download Custom Resources
curl https://projectcalico.docs.tigera.io/manifests/custom-resources.yaml -O
Apply Custom Resources
kubectl create -f custom-resources.yaml
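You can watch the Calico pods come up before checking node status (calico-system is the namespace the Tigera operator creates):

```shell
# Watch until all pods reach Running state
watch kubectl get pods -n calico-system
```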
Verify Nodes
Wait a few minutes, then check node status:
kubectl get nodes
All nodes should now be in Ready state.
Verify Cluster
Check Nodes
kubectl get nodes -o wide
Check System Pods
kubectl get pods -n kube-system
Check Components
kubectl get componentstatuses
Test Deployment
kubectl create deployment nginx-test --image=nginx
kubectl get pods
High Availability Testing
Deploy Test Application
kubectl create deployment ha-test --image=nginx --replicas=3
kubectl expose deployment ha-test --port=80 --type=NodePort
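To confirm the test service is reachable, look up the NodePort assigned to it (chosen dynamically from the 30000-32767 range) and curl any node:

```shell
# Find the assigned NodePort
kubectl get svc ha-test

# Hit the service through any node (replace the placeholders)
curl http://<worker1-ip>:<node-port>
```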
Shutdown Manager1
# On manager1
sudo shutdown -h now
Verify Cluster Still Works
From the load balancer:
kubectl get nodes
kubectl get pods
The cluster should still be functional with manager2 handling requests.
Restart Manager1
Power on manager1 and verify it rejoins the cluster:
kubectl get nodes
Architecture Benefits
No single point of failure in control plane
Cluster remains operational if one master fails
HAProxy distributes API server load
Can add more master nodes for higher API throughput
Worker nodes can be added/removed without affecting control plane
Load balancer prevents overloading individual masters
Can upgrade one master at a time
Perform maintenance without cluster downtime
Rolling updates of control plane components
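For example, the upgrade-one-master-at-a-time workflow typically starts by cordoning and draining the node; a sketch (the actual upgrade commands depend on your kubeadm version):

```shell
# Take manager1 out of scheduling and evict its pods
kubectl drain manager1 --ignore-daemonsets

# ... perform the upgrade or maintenance on manager1 ...

# Return it to service
kubectl uncordon manager1
```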
Troubleshooting
HAProxy Not Working
# Check HAProxy status
systemctl status haproxy
# Check HAProxy logs
journalctl -u haproxy -f
# Test connectivity to masters
telnet <manager1-ip> 6443
telnet <manager2-ip> 6443
Nodes Not Joining
# Check kubelet logs
journalctl -u kubelet -f
# Verify token hasn't expired
kubeadm token list
# Generate new token if needed
kubeadm token create --print-join-command
Network Issues
# Check Calico pods
kubectl get pods -n calico-system
# Describe Calico installation
kubectl describe installation default
# Check node connectivity
kubectl run test --image=busybox --rm -it -- ping <node-ip>
Best Practices
Use at least 3 master nodes for production (odd numbers for etcd quorum)
Place masters in different availability zones
Monitor HAProxy and set up health checks
Backup etcd regularly from all masters
Use persistent storage for etcd data
Implement proper RBAC and security policies
Keep all nodes on the same Kubernetes version
Use automation tools like Ansible for cluster setup
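The etcd backup recommendation above can be done with etcdctl on a master node. A sketch, assuming the certificate paths of a default kubeadm install:

```shell
# On a master node; paths assume a default kubeadm install
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-$(date +%F).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
```

Store snapshots off the master nodes so a full control-plane loss does not also destroy the backups.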
For production environments, consider using managed Kubernetes services (EKS, GKE, AKS) or tools like kops, kubeadm, or Rancher for more robust multi-master setups.