This guide explains how to configure vCluster for high availability (HA) to ensure your virtual clusters remain operational even when individual components fail.
High Availability Overview
A highly available vCluster deployment consists of:
- Multiple control plane replicas (3 or more)
- Distributed etcd cluster for data redundancy
- Pod anti-affinity to spread replicas across nodes/zones
- Resource guarantees to prevent eviction
- Persistent storage with proper retention policies
Control Plane High Availability
Basic HA Configuration
Configure multiple control plane replicas with leader election:
```yaml
controlPlane:
  statefulSet:
    highAvailability:
      # Number of replicas (must be odd for quorum)
      replicas: 3
      # Leader election settings
      leaseDuration: 60   # seconds
      renewDeadline: 40   # seconds
      retryPeriod: 15     # seconds
    # Ensure replicas spread across nodes
    scheduling:
      podManagementPolicy: Parallel
      # Anti-affinity to avoid a single point of failure
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - vcluster
                  - key: release
                    operator: In
                    values:
                      - my-vcluster
              topologyKey: kubernetes.io/hostname
      # Spread across availability zones
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: vcluster
```
Understanding Leader Election
When running multiple control plane replicas, vCluster uses leader election to ensure only one instance actively manages resources:
- leaseDuration: How long a leader holds the lease (60s default)
- renewDeadline: Time by which the leader must renew (40s default)
- retryPeriod: How often non-leaders try to acquire lease (15s default)
The active leader performs all write operations while standby replicas remain ready to take over if the leader fails.
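The three timing parameters must be consistent with each other. The following sanity check of the values above is an illustrative helper, not part of vCluster; the ordering requirement comes from how Kubernetes-style leader election works in general:

```shell
# Leader-election timing sanity check (illustrative, not a vCluster command).
# Requirement: retryPeriod < renewDeadline < leaseDuration, so a standby can
# attempt acquisition often enough to take over once the leader misses its
# renew deadline, before the lease itself expires.
LEASE_DURATION=60
RENEW_DEADLINE=40
RETRY_PERIOD=15

if [ "$RETRY_PERIOD" -lt "$RENEW_DEADLINE" ] && [ "$RENEW_DEADLINE" -lt "$LEASE_DURATION" ]; then
  echo "timings ok"
else
  echo "invalid: need retryPeriod < renewDeadline < leaseDuration" >&2
fi
```

Shortening these values speeds up failover at the cost of more lease-renewal traffic; the defaults above favor stability.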
etcd High Availability
For production deployments, use a highly available etcd cluster as the backing store.
Deployed etcd HA Configuration
Enable Deployed etcd
Configure a 3-node etcd cluster:

```yaml
controlPlane:
  backingStore:
    etcd:
      deploy:
        enabled: true
        statefulSet:
          # High availability configuration
          highAvailability:
            replicas: 3   # Must be an odd number (3, 5, 7)
          image:
            registry: "registry.k8s.io"
            repository: "etcd"
            tag: "3.6.4-0"
          # Resource allocation
          resources:
            requests:
              cpu: 200m
              memory: 512Mi
            limits:
              cpu: 1000m
              memory: 2Gi
          # Persistent storage
          persistence:
            volumeClaim:
              enabled: true
              size: 10Gi
              storageClass: "fast-ssd"   # Use fast storage
              retentionPolicy: Retain
              accessModes: ["ReadWriteOnce"]
```
Configure etcd Anti-Affinity
Ensure etcd pods run on different nodes:

```yaml
controlPlane:
  backingStore:
    etcd:
      deploy:
        statefulSet:
          scheduling:
            podManagementPolicy: Parallel
            # Prevent co-location of etcd pods
            affinity:
              podAntiAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  - labelSelector:
                      matchExpressions:
                        - key: app
                          operator: In
                          values:
                            - vcluster-etcd
                        - key: release
                          operator: In
                          values:
                            - my-vcluster
                    topologyKey: kubernetes.io/hostname
            # Spread across zones
            topologySpreadConstraints:
              - maxSkew: 1
                topologyKey: topology.kubernetes.io/zone
                whenUnsatisfiable: DoNotSchedule
                labelSelector:
                  matchLabels:
                    app: vcluster-etcd
```
Verify etcd Cluster Health
Check that all etcd members are running:

```shell
# Check etcd pods
kubectl get pods -n vcluster-my-vcluster -l app=vcluster-etcd

# Expected output:
# NAME                 READY   STATUS    RESTARTS   AGE
# my-vcluster-etcd-0   1/1     Running   0          5m
# my-vcluster-etcd-1   1/1     Running   0          5m
# my-vcluster-etcd-2   1/1     Running   0          5m
```
etcd Cluster Quorum
Understanding etcd quorum is critical for HA:
| Cluster Size | Quorum Size | Fault Tolerance |
|---|---|---|
| 1 | 1 | 0 nodes |
| 3 | 2 | 1 node |
| 5 | 3 | 2 nodes |
| 7 | 4 | 3 nodes |
Always use an odd number of etcd replicas. Even numbers don’t improve fault tolerance but increase the quorum requirement.
Example: with 3 etcd nodes:
- The cluster remains fully operational if 1 node fails
- If 2 nodes fail, quorum is lost: the cluster rejects writes (and linearizable reads) until quorum is restored
- A minimum of 2 nodes is needed for writes (quorum)
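The quorum arithmetic behind the table is simple. This standalone snippet (illustrative, runs in any POSIX shell) reproduces it:

```shell
# Quorum for an N-member etcd cluster is floor(N/2) + 1; the cluster
# tolerates N - quorum simultaneous member failures.
for n in 1 3 5 7; do
  quorum=$(( n / 2 + 1 ))
  tolerance=$(( n - quorum ))
  echo "members=$n quorum=$quorum tolerates=$tolerance"
done
# Output:
# members=1 quorum=1 tolerates=0
# members=3 quorum=2 tolerates=1
# members=5 quorum=3 tolerates=2
# members=7 quorum=4 tolerates=3
```

Note that adding a 4th member raises the quorum from 2 to 3 without raising the tolerated failures above 1, which is why even sizes are discouraged.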
External etcd for Multi-Cluster HA
For even higher availability, use an external managed etcd cluster:
```yaml
controlPlane:
  backingStore:
    etcd:
      external:
        enabled: true
        endpoint: "etcd-cluster.example.com:2379"
        # TLS configuration
        tls:
          caFile: "/etc/etcd/ca.crt"
          certFile: "/etc/etcd/client.crt"
          keyFile: "/etc/etcd/client.key"
```
This approach:
- Separates etcd lifecycle from vCluster
- Allows sharing etcd across multiple vClusters
- Enables independent scaling and maintenance
- Supports external backup/restore strategies
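The certificate paths above must exist inside the control plane pod. One common pattern (a sketch, not a vCluster-specific feature) is to keep the client credentials in a Kubernetes Secret and mount it at `/etc/etcd`; the Secret name below is illustrative:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: etcd-client-certs        # illustrative name
  namespace: vcluster-my-vcluster
type: Opaque
stringData:
  ca.crt: |
    # PEM-encoded CA certificate goes here
  client.crt: |
    # PEM-encoded client certificate goes here
  client.key: |
    # PEM-encoded client private key goes here
```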
Storage Configuration for HA
Persistent Volume Claims
Ensure persistent storage survives pod restarts:
```yaml
controlPlane:
  statefulSet:
    persistence:
      volumeClaim:
        enabled: true
        # Size based on expected cluster size
        size: 10Gi                    # Increase for larger clusters
        # Use a performant storage class
        storageClass: "premium-rwo"   # SSD-backed recommended
        # Retention policy
        retentionPolicy: Retain       # Keep PVCs after deletion
        # Access mode
        accessModes: ["ReadWriteOnce"]
```
Storage Class Selection
Choose a storage class appropriate for HA:
```yaml
# AWS example - EBS gp3 with high IOPS
storageClass: "ebs-gp3"

# GCP example - SSD persistent disks
storageClass: "pd-ssd"

# Azure example - Premium SSD
storageClass: "managed-premium"

# On-premises example - fast local SSD
storageClass: "local-ssd"
```
For etcd, always use SSD-backed storage. HDD storage can cause performance issues and cluster instability.
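As a concrete example, an SSD-backed class for etcd on AWS using the EBS CSI driver might look like the following (a sketch; the IOPS and throughput values are illustrative, not a vCluster requirement):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "6000"        # illustrative; size for your etcd write load
  throughput: "250"   # MiB/s, illustrative
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
```

`WaitForFirstConsumer` binding keeps the volume in the same zone as the pod that the scheduler picks, which matters when replicas are spread across zones.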
Resource Guarantees
Prevent pod eviction by setting appropriate resource requests:
```yaml
controlPlane:
  statefulSet:
    # Higher priority prevents preemption
    scheduling:
      priorityClassName: "system-cluster-critical"
    # Resource guarantees
    resources:
      requests:
        cpu: 500m                # Guaranteed CPU
        memory: 1Gi              # Guaranteed memory
        ephemeral-storage: 2Gi
      limits:
        cpu: 2000m               # Max CPU burst
        memory: 4Gi              # Max memory
        ephemeral-storage: 10Gi
  backingStore:
    etcd:
      deploy:
        statefulSet:
          resources:
            requests:
              cpu: 200m
              memory: 512Mi
            limits:
              cpu: 1000m
              memory: 2Gi
```
Pod Disruption Budgets
Prevent too many pods from being disrupted simultaneously:
```yaml
controlPlane:
  advanced:
    podDisruptionBudget:
      enabled: true
      # Custom PDB spec
      spec:
        minAvailable: 2      # At least 2 replicas must be available
        # OR
        # maxUnavailable: 1  # At most 1 replica can be unavailable
```
Pod Disruption Budgets protect against:
- Node drains during maintenance
- Cluster autoscaler scale-downs
- Voluntary disruptions
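For reference, the resulting object is roughly a standard `policy/v1` PodDisruptionBudget like the one below (a sketch; the exact name and labels that vCluster renders may differ):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-vcluster              # illustrative
  namespace: vcluster-my-vcluster
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: vcluster
      release: my-vcluster
```

With 3 replicas and `minAvailable: 2`, a node drain can evict at most one replica at a time, so leader election always has a quorum of candidates available.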
Complete HA Configuration Example
Here’s a complete production-ready HA configuration:
```yaml
# High availability vCluster configuration
controlPlane:
  # Control plane HA
  statefulSet:
    highAvailability:
      replicas: 3
      leaseDuration: 60
      renewDeadline: 40
      retryPeriod: 15
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
        ephemeral-storage: 2Gi
      limits:
        cpu: 2000m
        memory: 4Gi
        ephemeral-storage: 10Gi
    persistence:
      volumeClaim:
        enabled: true
        size: 10Gi
        storageClass: "fast-ssd"
        retentionPolicy: Retain
    scheduling:
      priorityClassName: "system-cluster-critical"
      podManagementPolicy: Parallel
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: vcluster
              topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: vcluster

  # etcd HA
  backingStore:
    etcd:
      deploy:
        enabled: true
        statefulSet:
          highAvailability:
            replicas: 3
          image:
            registry: "registry.k8s.io"
            repository: "etcd"
            tag: "3.6.4-0"
          resources:
            requests:
              cpu: 200m
              memory: 512Mi
            limits:
              cpu: 1000m
              memory: 2Gi
          persistence:
            volumeClaim:
              enabled: true
              size: 10Gi
              storageClass: "fast-ssd"
              retentionPolicy: Retain
          scheduling:
            podManagementPolicy: Parallel
            priorityClassName: "system-cluster-critical"
            affinity:
              podAntiAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  - labelSelector:
                      matchLabels:
                        app: vcluster-etcd
                    topologyKey: kubernetes.io/hostname
            topologySpreadConstraints:
              - maxSkew: 1
                topologyKey: topology.kubernetes.io/zone
                whenUnsatisfiable: DoNotSchedule

  # Pod Disruption Budget
  advanced:
    podDisruptionBudget:
      enabled: true

  # CoreDNS HA
  coredns:
    deployment:
      replicas: 2
      resources:
        requests:
          cpu: 20m
          memory: 64Mi
        limits:
          cpu: 1000m
          memory: 170Mi

# Resource policies
policies:
  resourceQuota:
    enabled: true
    quota:
      requests.cpu: 20
      requests.memory: 40Gi
      requests.storage: "200Gi"
```
Monitoring HA Health
Check Control Plane Status
```shell
# Check control plane pods
kubectl get pods -n vcluster-my-vcluster -l app=vcluster

# Check which pod is the leader
kubectl logs -n vcluster-my-vcluster my-vcluster-0 | grep "leader"

# Check lease information
kubectl get lease -n vcluster-my-vcluster
```
Check etcd Cluster Health
```shell
# Exec into an etcd pod and check endpoint health
kubectl exec -n vcluster-my-vcluster my-vcluster-etcd-0 -- sh -c \
  "ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/etcd/pki/ca.crt \
    --cert=/etc/etcd/pki/server.crt \
    --key=/etc/etcd/pki/server.key \
    endpoint health"

# Check the etcd member list
kubectl exec -n vcluster-my-vcluster my-vcluster-etcd-0 -- sh -c \
  "ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/etcd/pki/ca.crt \
    --cert=/etc/etcd/pki/server.crt \
    --key=/etc/etcd/pki/server.key \
    member list"
```
Disaster Recovery
Backup Strategies
etcd snapshots: take regular backups of the etcd data:

```shell
# Create an etcd snapshot
kubectl exec -n vcluster-my-vcluster my-vcluster-etcd-0 -- sh -c \
  "ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/etcd/pki/ca.crt \
    --cert=/etc/etcd/pki/server.crt \
    --key=/etc/etcd/pki/server.key \
    snapshot save /tmp/snapshot.db"

# Copy the snapshot locally
kubectl cp vcluster-my-vcluster/my-vcluster-etcd-0:/tmp/snapshot.db ./snapshot.db
```
PVC snapshots: use volume snapshots if your CSI driver supports them:

```yaml
sync:
  toHost:
    volumeSnapshots:
      enabled: true
    volumeSnapshotContents:
      enabled: true
```
Testing HA Failover
Validate your HA setup by simulating failures:
```shell
# Delete one control plane pod
kubectl delete pod -n vcluster-my-vcluster my-vcluster-0

# Watch for automatic recovery
kubectl get pods -n vcluster-my-vcluster -w

# Verify the cluster still works
vcluster connect my-vcluster -n vcluster-my-vcluster
kubectl get nodes
```
Common HA Pitfalls
Insufficient Node Resources
Problem: All replicas scheduled on same node due to resource constraints.
Solution: Ensure cluster has adequate resources across multiple nodes/zones.
Wrong Storage Class
Problem: Using HDD or network storage for etcd causes performance issues.
Solution: Always use SSD-backed storage for etcd.
Even Number of Replicas
Problem: Using 2 or 4 etcd replicas doesn’t improve fault tolerance.
Solution: Use odd numbers (3, 5, 7) for proper quorum.
Missing Anti-Affinity
Problem: Multiple replicas scheduled on the same node create a single point of failure.
Solution: Configure pod anti-affinity as shown in examples above.
Next Steps