A multi-master Kubernetes cluster provides high availability by running multiple control plane nodes. This guide uses HAProxy for load balancing across master nodes.
Architecture Overview
This setup includes:
2 master nodes (control plane)
2 worker nodes
1 load balancer (HAProxy)
All nodes running Ubuntu 16.04+
Infrastructure Requirements
This example uses AWS EC2 instances:
Master nodes: 2 CPU, 2 GB RAM, 10 GB storage (x2)
Worker nodes: 1 CPU, 2 GB RAM, 10 GB storage (x2)
Load balancer: 1 CPU, 2 GB RAM, 10 GB storage (x1)
Initial Preparation
Note IP Addresses
Record the internal IP addresses of all machines:
# On each machine
hostname -I
Set Hostnames
On each machine, set an appropriate hostname:
# On load balancer
hostnamectl set-hostname loadbalancer
# On masters
hostnamectl set-hostname manager1 # First master
hostnamectl set-hostname manager2 # Second master
# On workers
hostnamectl set-hostname worker1 # First worker
hostnamectl set-hostname worker2 # Second worker
Update /etc/hosts
On all machines, add hostname mappings:
127.0.0.1 localhost <hostname>
# Replace with actual IPs
<loadbalancer-ip> loadbalancer
<manager1-ip> manager1
<manager2-ip> manager2
<worker1-ip> worker1
<worker2-ip> worker2
Reboot and Verify
Reboot each machine. After it comes back up, verify the hostname:
hostname
Tip: run top after SSH-ing in to keep the session from timing out during long setup steps.
Update System
sudo apt-get update && sudo apt-get upgrade -y
Install HAProxy
sudo apt-get install haproxy -y
Configure HAProxy
Edit the HAProxy configuration:
vim /etc/haproxy/haproxy.cfg
Add the following at the end of the file:
frontend fe-apiserver
bind 0.0.0.0:6443
mode tcp
option tcplog
default_backend be-apiserver
backend be-apiserver
mode tcp
option tcplog
option tcp-check
balance roundrobin
default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
server manager1 <manager1-ip>:6443 check
server manager2 <manager2-ip>:6443 check
Replace <manager1-ip> and <manager2-ip> with actual IP addresses.
Restart and Enable HAProxy
systemctl restart haproxy
systemctl enable haproxy
systemctl status haproxy
Verify HAProxy
Test that HAProxy is listening on port 6443:
sudo ss -tlnp | grep 6443
Install Kubernetes Components
Run this script on all nodes (both masters and workers), except the load balancer.
Create Installation Script
Save the following script as for-all-machines.sh:
#!/bin/bash
echo "Disabling swap"
swapoff -a
sed -e '/swap/s/^/#/g' -i /etc/fstab
echo "Installing Kubernetes version 1.24.1-00"
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
apt-get update
apt-get install -y kubelet=1.24.1-00 kubeadm=1.24.1-00 kubectl=1.24.1-00 docker.io
apt-mark hold kubelet kubeadm kubectl
cat << EOF | sudo tee /etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
systemctl daemon-reload
systemctl restart docker
systemctl restart kubelet
echo "ALL DONE - OK"
Run Script
Make the script executable and run it on each master and worker:
chmod +x for-all-machines.sh
sudo ./for-all-machines.sh
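Before proceeding, it is worth sanity-checking each node. A quick verification sketch (not part of the original script):

```shell
# Confirm the pinned Kubernetes tools are installed
kubeadm version -o short
kubelet --version
kubectl version --client

# Confirm Docker is running and swap is disabled
systemctl is-active docker
swapon --show   # should print nothing
```

If `swapon --show` produces output, re-check that the swap entry in /etc/fstab was commented out and run swapoff -a again.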
Initialize First Master
Initialize Cluster
Replace <loadbalancer-internal-IP> with the actual IP:
kubeadm init \
--control-plane-endpoint "<loadbalancer-internal-IP>:6443" \
--upload-certs \
--pod-network-cidr=192.168.0.0/16
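On success, kubeadm prints instructions for configuring kubectl on manager1 itself; they look like this:

```shell
# On manager1, after kubeadm init completes
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```

This lets you run kubectl directly on manager1, independent of the load balancer setup described later.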
Save Join Commands
The output will include two join commands:
For joining additional master nodes (with --control-plane flag)
For joining worker nodes
Save both commands in a file for later use.
Keep the join tokens secure. They provide access to join nodes to your cluster.
Join Second Master
Run Master Join Command
Use the control-plane join command from Step 3:
kubeadm join loadbalancer:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane --certificate-key <certificate-key>
Join Worker Nodes
On each worker node:
SSH to Worker
ssh worker1 # Repeat for worker2
sudo -i
Run Worker Join Command
Use the worker join command from Step 3:
kubeadm join loadbalancer:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>
Copy Admin Config
Copy the contents of /etc/kubernetes/admin.conf from manager1 to $HOME/.kube/config on the load balancer.
# On manager1
cat /etc/kubernetes/admin.conf
# On loadbalancer
mkdir -p $HOME/.kube
vim $HOME/.kube/config
# Paste the contents
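If key-based SSH between the machines is set up, scp is less error-prone than copy-pasting. A sketch, assuming an ubuntu user with SSH access (admin.conf is readable only by root on manager1, so you may first need to copy it somewhere readable or use root SSH):

```shell
# On the load balancer
mkdir -p $HOME/.kube
scp ubuntu@manager1:/etc/kubernetes/admin.conf $HOME/.kube/config
```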
Install kubectl
sudo snap install kubectl --classic
Check Cluster Status
kubectl get nodes
Nodes will show NotReady until we install the CNI.
Install Calico Network Plugin
Install Tigera Operator
kubectl create -f https://projectcalico.docs.tigera.io/manifests/tigera-operator.yaml
Download Custom Resources
curl https://projectcalico.docs.tigera.io/manifests/custom-resources.yaml -O
Apply Custom Resources
kubectl create -f custom-resources.yaml
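You can watch the Calico pods come up before checking node status (calico-system is the namespace the Tigera operator creates):

```shell
# Watch until all pods reach Running state
watch kubectl get pods -n calico-system
```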
Verify Nodes
Wait a few minutes, then check node status:
kubectl get nodes
All nodes should now be in Ready state.
Verify Cluster
Check Nodes
kubectl get nodes -o wide
Check System Pods
kubectl get pods -n kube-system
Check Components
kubectl get componentstatuses
Test Deployment
kubectl create deployment nginx-test --image=nginx
kubectl get pods
High Availability Testing
Deploy Test Application
kubectl create deployment ha-test --image=nginx --replicas=3
kubectl expose deployment ha-test --port=80 --type=NodePort
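To confirm the test service is reachable, look up the NodePort assigned to it (chosen dynamically from the 30000-32767 range) and curl any node:

```shell
# Find the assigned NodePort
kubectl get svc ha-test

# Hit the service through any node (replace the placeholders)
curl http://<worker1-ip>:<node-port>
```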
Shutdown Manager1
# On manager1
sudo shutdown -h now
Verify Cluster Still Works
From the load balancer:
kubectl get nodes
kubectl get pods
The cluster should still be functional with manager2 handling requests.
Restart Manager1
Power on manager1 and verify it rejoins the cluster:
kubectl get nodes
Architecture Benefits
No single point of failure in control plane
Cluster remains operational if one master fails
HAProxy distributes API server load
Can add more master nodes for higher API throughput
Worker nodes can be added/removed without affecting control plane
Load balancer prevents overloading individual masters
Can upgrade one master at a time
Perform maintenance without cluster downtime
Rolling updates of control plane components
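For example, the upgrade-one-master-at-a-time workflow typically starts by cordoning and draining the node; a sketch (the actual upgrade commands depend on your kubeadm version):

```shell
# Take manager1 out of scheduling and evict its pods
kubectl drain manager1 --ignore-daemonsets

# ... perform the upgrade or maintenance on manager1 ...

# Return it to service
kubectl uncordon manager1
```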
Troubleshooting
HAProxy Not Working
# Check HAProxy status
systemctl status haproxy
# Check HAProxy logs
journalctl -u haproxy -f
# Test connectivity to masters
telnet <manager1-ip> 6443
telnet <manager2-ip> 6443
Nodes Not Joining
# Check kubelet logs
journalctl -u kubelet -f
# Verify token hasn't expired
kubeadm token list
# Generate new token if needed
kubeadm token create --print-join-command
Network Issues
# Check Calico pods
kubectl get pods -n calico-system
# Describe Calico installation
kubectl describe installation default
# Check node connectivity
kubectl run test --image=busybox --rm -it -- ping <node-ip>
Best Practices
Use at least 3 master nodes for production (odd numbers for etcd quorum)
Place masters in different availability zones
Monitor HAProxy and set up health checks
Backup etcd regularly from all masters
Use persistent storage for etcd data
Implement proper RBAC and security policies
Keep all nodes on the same Kubernetes version
Use automation tools like Ansible for cluster setup
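The etcd backup recommendation above can be done with etcdctl on a master node. A sketch, assuming the certificate paths of a default kubeadm install:

```shell
# On a master node; paths assume a default kubeadm install
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-$(date +%F).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
```

Store snapshots off the master nodes so a full control-plane loss does not also destroy the backups.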
For production environments, consider using managed Kubernetes services (EKS, GKE, AKS) or tools like kops, kubeadm, or Rancher for more robust multi-master setups.