Multi-Cluster Allocation

Multi-cluster allocation enables you to allocate GameServers across multiple Kubernetes clusters, providing:

Geographic distribution - Place players closer to game servers
Capacity expansion - Allocate from multiple clusters when one is at capacity
High availability - Failover to other clusters if one becomes unavailable
Cloud provider diversity - Spread workload across multiple providers

Architecture Overview

Multi-cluster allocation uses:

GameServerAllocationPolicy - Defines target clusters and priorities
Allocation endpoint - gRPC service for remote allocation
Client certificates - Mutual TLS authentication between clusters

Allocation always starts in the local cluster. If no GameServers are available, the allocator tries remote clusters based on configured policies.

Setup Prerequisites

Multiple Kubernetes clusters

At least 2 Kubernetes clusters with Agones installed:

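# Add the Agones Helm repository (once per workstation)
helm repo add agones https://agones.dev/chart/stable
helm repo update

# Cluster A (primary)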
kubectl config use-context cluster-a
helm install agones agones/agones -n agones-system --create-namespace

# Cluster B (secondary)
kubectl config use-context cluster-b
helm install agones agones/agones -n agones-system --create-namespace

Network connectivity

Ensure clusters can communicate:

Allocator service must be externally accessible (LoadBalancer or Ingress)
Firewall rules allow port 443 between clusters
DNS resolution between cluster endpoints

TLS certificates

Generate certificates for mutual TLS authentication (see below).

Certificate Setup

Multi-cluster allocation requires mutual TLS authentication.

Generate Certificates

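Two options are shown: generating the certificates manually with OpenSSL (script first), or issuing them with cert-manager (manifests after the script).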

#!/bin/bash
# Generate CA
openssl req -x509 -newkey rsa:4096 -nodes \
  -keyout ca.key -out ca.crt \
  -days 3650 -subj "/CN=agones-allocator-ca"

# Generate client certificate for cluster A
openssl req -newkey rsa:4096 -nodes \
  -keyout cluster-a-client.key -out cluster-a-client.csr \
  -subj "/CN=cluster-a-client"

openssl x509 -req -in cluster-a-client.csr \
  -CA ca.crt -CAkey ca.key -CAcreateserial \
  -out cluster-a-client.crt -days 3650

# Generate client certificate for cluster B
openssl req -newkey rsa:4096 -nodes \
  -keyout cluster-b-client.key -out cluster-b-client.csr \
  -subj "/CN=cluster-b-client"

openssl x509 -req -in cluster-b-client.csr \
  -CA ca.crt -CAkey ca.key -CAcreateserial \
  -out cluster-b-client.crt -days 3650

# Install cert-manager first
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml

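# Create CA issuer (requires a Secret named agones-ca-secret, holding the CA's tls.crt and tls.key, in the cert-manager namespace)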
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: agones-ca-issuer
spec:
  ca:
    secretName: agones-ca-secret
---
# Create client certificate
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: cluster-a-client
  namespace: default
spec:
  secretName: cluster-a-client-secret
  commonName: cluster-a-client
  issuerRef:
    name: agones-ca-issuer
    kind: ClusterIssuer
  usages:
    - client auth
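If you would rather script certificate generation than shell out to OpenSSL, the same CA and client-certificate flow can be sketched with Go's standard library. This is an illustration, not a drop-in replacement: it assumes ECDSA P-256 keys instead of RSA 4096, a one-year validity instead of 3650 days, and hypothetical helper names (newCA, newClientCert, verifyClient). It restricts the client certificate to TLS client authentication, mirroring the client auth usage in the cert-manager Certificate, and keeps everything in memory instead of writing ca.crt or cluster-a-client.crt.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newCA creates a self-signed CA certificate (the equivalent of ca.crt/ca.key).
func newCA(cn string) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().AddDate(1, 0, 0),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageCRLSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// newClientCert issues a certificate, signed by ca, usable only for TLS client auth.
func newClientCert(cn string, ca *x509.Certificate, caKey *ecdsa.PrivateKey) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	serial, err := rand.Int(rand.Reader, new(big.Int).Lsh(big.NewInt(1), 62))
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: serial,
		Subject:      pkix.Name{CommonName: cn},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().AddDate(1, 0, 0),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, ca, &key.PublicKey, caKey)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// verifyClient checks that cert chains to ca and is allowed for client auth.
func verifyClient(cert, ca *x509.Certificate) error {
	roots := x509.NewCertPool()
	roots.AddCert(ca)
	_, err := cert.Verify(x509.VerifyOptions{
		Roots:     roots,
		KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	})
	return err
}

func main() {
	ca, caKey, err := newCA("agones-allocator-ca")
	if err != nil {
		panic(err)
	}
	client, _, err := newClientCert("cluster-a-client", ca, caKey)
	if err != nil {
		panic(err)
	}
	fmt.Println(client.Subject.CommonName, "valid:", verifyClient(client, ca) == nil)
}
```

Writing the results to disk would additionally need pem.EncodeToMemory and x509.MarshalECPrivateKey (or MarshalPKCS8PrivateKey).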

Create Kubernetes Secrets

# In Cluster A: store Cluster A's client certificate, presented when calling Cluster B.
# Create it in the same namespace as the GameServerAllocationPolicy that references it.
kubectl create secret tls cluster-b-client-secret \
  --cert=cluster-a-client.crt \
  --key=cluster-a-client.key \
  --dry-run=client -o yaml | kubectl apply -f -

# In Cluster B: store Cluster B's client certificate, presented when calling Cluster A
kubectl create secret tls cluster-a-client-secret \
  --cert=cluster-b-client.crt \
  --key=cluster-b-client.key \
  --dry-run=client -o yaml | kubectl apply -f -
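Each secret holds the certificate and key that a cluster presents when it calls the other cluster's allocator. The sketch below shows, under assumptions, how a Go client could combine those two files with the remote allocator's CA into a mutual TLS configuration using crypto/tls. The function clientTLSConfig and the demo helper selfSignedPEM are hypothetical names; the demo generates a throwaway self-signed certificate so it runs without any files, and the TLS 1.2 minimum is an assumed policy.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// clientTLSConfig builds a mutual TLS client config from a client secret's
// contents (tls.crt, tls.key) and the remote allocator's CA certificate.
func clientTLSConfig(certPEM, keyPEM, serverCAPEM []byte, serverName string) (*tls.Config, error) {
	pair, err := tls.X509KeyPair(certPEM, keyPEM)
	if err != nil {
		return nil, fmt.Errorf("client certificate/key: %w", err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(serverCAPEM) {
		return nil, errors.New("server CA: no PEM certificates found")
	}
	return &tls.Config{
		Certificates: []tls.Certificate{pair}, // presented to the remote allocator
		RootCAs:      pool,                    // used to verify the remote allocator
		ServerName:   serverName,              // checked against the allocator certificate
		MinVersion:   tls.VersionTLS12,
	}, nil
}

// selfSignedPEM is a demo helper returning a throwaway certificate and key as PEM.
func selfSignedPEM(cn string) (certPEM, keyPEM []byte, err error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	keyDER, err := x509.MarshalECPrivateKey(key)
	if err != nil {
		return nil, nil, err
	}
	certPEM = pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
	keyPEM = pem.EncodeToMemory(&pem.Block{Type: "EC PRIVATE KEY", Bytes: keyDER})
	return certPEM, keyPEM, nil
}

func main() {
	certPEM, keyPEM, err := selfSignedPEM("cluster-a-client")
	if err != nil {
		panic(err)
	}
	// In this demo the same self-signed certificate doubles as the "server CA".
	cfg, err := clientTLSConfig(certPEM, keyPEM, certPEM, "allocator.cluster-b.example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println("certificates:", len(cfg.Certificates), "server name:", cfg.ServerName)
}
```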

Expose Allocator Service

Make the allocator service accessible from other clusters:

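Two options are shown: patching the Service to type LoadBalancer (commands first), or routing traffic through an Ingress with TLS passthrough (manifest after the commands).

# Inspect the allocator service (repeat in each cluster)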
kubectl get service agones-allocator -n agones-system

# Patch to LoadBalancer type
kubectl patch service agones-allocator -n agones-system \
  -p '{"spec":{"type":"LoadBalancer"}}'

# Get external IP
kubectl get service agones-allocator -n agones-system \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}'

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: agones-allocator-ingress
  namespace: agones-system
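  annotations:
    # ssl-passthrough requires the NGINX Ingress controller to run with
    # --enable-ssl-passthrough; the allocator then terminates TLS itself,
    # so mutual TLS is preserved end to end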
    nginx.ingress.kubernetes.io/ssl-passthrough: "true"
    nginx.ingress.kubernetes.io/backend-protocol: "GRPCS"
spec:
  ingressClassName: nginx
  rules:
  - host: allocator.cluster-b.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: agones-allocator
            port:
              number: 443
  tls:
  - hosts:
    - allocator.cluster-b.example.com

Configure Allocation Policies

Create GameServerAllocationPolicy resources to define remote clusters.

Basic Policy Example

cluster-b-policy.yaml

apiVersion: multicluster.agones.dev/v1
kind: GameServerAllocationPolicy
metadata:
  name: cluster-b-policy
  namespace: default
spec:
  priority: 1
  weight: 100
  connectionInfo:
    clusterName: cluster-b
    allocationEndpoints:
      - allocator.cluster-b.example.com:443
    secretName: cluster-b-client-secret
    namespace: default
    serverCa: LS0tLS1CRUdJTi...  # Base64 encoded CA certificate

Understanding policy fields

priority - Lower values are tried first (0 = highest priority). Policies with the same priority use weighted random selection.
weight - Relative weight for random selection within the same priority
clusterName - Logical name for the cluster
allocationEndpoints - List of allocator endpoints (each is tried in turn until one succeeds)
secretName - Kubernetes secret containing the client certificate and key
namespace - Namespace to allocate GameServers from in the remote cluster
serverCa - Base64-encoded CA certificate used to verify the remote allocator's server certificate

Get Server CA Certificate

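# From Cluster B, extract the allocator's TLS certificate
kubectl get secret allocator-tls -n agones-system \
  -o jsonpath='{.data.tls\.crt}' | base64 -d > allocator-tls.crt

# Identify the CA that issued it; if the file holds a chain, the CA is
# usually the last certificate
openssl x509 -in allocator-tls.crt -noout -subject -issuer

# Base64 encode the issuing CA's PEM file for the policy (ca.crt here stands
# for that CA; with Helm-generated certificates it is usually stored in the
# allocator-tls-ca secret under the tls-ca.crt key)
cat ca.crt | base64 -w 0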
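The serverCa value is a PEM certificate that has been base64-encoded once more. Because it is easy to pick the wrong certificate out of a chain, the following sketch selects the self-signed CA from a PEM bundle and returns the encoded string, a Go counterpart of running base64 -w 0 on a file that contains only that CA. serverCAField and demoChain are hypothetical names, and the demo builds a throwaway chain in memory rather than reading allocator-tls.crt.

```go
package main

import (
	"bytes"
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/base64"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// serverCAField returns the base64 value for a policy's serverCa field,
// taken from the first self-signed CA certificate in a PEM bundle.
func serverCAField(bundle []byte) (string, error) {
	rest := bundle
	for {
		var block *pem.Block
		block, rest = pem.Decode(rest)
		if block == nil {
			return "", errors.New("no self-signed CA certificate found")
		}
		if block.Type != "CERTIFICATE" {
			continue
		}
		cert, err := x509.ParseCertificate(block.Bytes)
		if err != nil {
			return "", err
		}
		// A root CA names itself as issuer and is signed by its own key;
		// CheckSignatureFrom also requires it to be marked as a CA.
		if bytes.Equal(cert.RawIssuer, cert.RawSubject) && cert.CheckSignatureFrom(cert) == nil {
			return base64.StdEncoding.EncodeToString(pem.EncodeToMemory(block)), nil
		}
	}
}

// demoChain is a demo helper returning a root CA and a leaf signed by it, as PEM.
func demoChain() (leafPEM, rootPEM []byte, err error) {
	caKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	now := time.Now()
	caTmpl := &x509.Certificate{SerialNumber: big.NewInt(1), Subject: pkix.Name{CommonName: "demo-ca"},
		NotBefore: now, NotAfter: now.Add(time.Hour), IsCA: true, BasicConstraintsValid: true,
		KeyUsage: x509.KeyUsageCertSign}
	caDER, err := x509.CreateCertificate(rand.Reader, caTmpl, caTmpl, &caKey.PublicKey, caKey)
	if err != nil {
		return nil, nil, err
	}
	ca, err := x509.ParseCertificate(caDER)
	if err != nil {
		return nil, nil, err
	}
	leafKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	leafTmpl := &x509.Certificate{SerialNumber: big.NewInt(2), Subject: pkix.Name{CommonName: "agones-allocator"},
		NotBefore: now, NotAfter: now.Add(time.Hour)}
	leafDER, err := x509.CreateCertificate(rand.Reader, leafTmpl, ca, &leafKey.PublicKey, caKey)
	if err != nil {
		return nil, nil, err
	}
	leafPEM = pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: leafDER})
	rootPEM = pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: caDER})
	return leafPEM, rootPEM, nil
}

func main() {
	leafPEM, rootPEM, err := demoChain()
	if err != nil {
		panic(err)
	}
	// Bundle with the leaf first and the CA last, like a certificate chain file.
	value, err := serverCAField(append(leafPEM, rootPEM...))
	if err != nil {
		panic(err)
	}
	fmt.Println("serverCa:", value[:24]+"...")
}
```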

Apply Policy

# Apply in Cluster A to enable allocation from Cluster B
kubectl apply -f cluster-b-policy.yaml

# Verify policy
kubectl get gameserverallocationpolicy
kubectl describe gameserverallocationpolicy cluster-b-policy

Allocation Priority and Fallback

Priority-Based Allocation

Clusters are tried in priority order:

# Cluster B - priority 1 (tried first after local)
apiVersion: multicluster.agones.dev/v1
kind: GameServerAllocationPolicy
metadata:
  name: cluster-b-primary
spec:
  priority: 1
  weight: 100
  connectionInfo:
    clusterName: cluster-b
    allocationEndpoints:
      - allocator.cluster-b.example.com:443
    secretName: cluster-b-secret
    namespace: default
---
# Cluster C - priority 2 (tried second)
apiVersion: multicluster.agones.dev/v1
kind: GameServerAllocationPolicy
metadata:
  name: cluster-c-backup
spec:
  priority: 2
  weight: 100
  connectionInfo:
    clusterName: cluster-c
    allocationEndpoints:
      - allocator.cluster-c.example.com:443
    secretName: cluster-c-secret
    namespace: default

Weighted Distribution

Use weights to distribute load across clusters that share the same priority:

# US-East cluster - 70% of traffic
apiVersion: multicluster.agones.dev/v1
kind: GameServerAllocationPolicy
metadata:
  name: us-east
spec:
  priority: 1
  weight: 70  # 70% of allocations
  connectionInfo:
    clusterName: us-east-cluster
    allocationEndpoints:
      - allocator.us-east.example.com:443
    secretName: us-east-secret
    namespace: default
---
# US-West cluster - 30% of traffic
apiVersion: multicluster.agones.dev/v1
kind: GameServerAllocationPolicy
metadata:
  name: us-west
spec:
  priority: 1
  weight: 30  # 30% of allocations
  connectionInfo:
    clusterName: us-west-cluster
    allocationEndpoints:
      - allocator.us-west.example.com:443
    secretName: us-west-secret
    namespace: default

Weights are relative. In this example, US-East gets 70/(70+30) = 70% of allocation attempts to priority 1 clusters.
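The selection rule described in this section (ascending priority, then weighted random choice within a priority level) can be modelled in a few lines of Go. This is an illustrative model of the documented behavior, not Agones' implementation: Policy and orderPolicies are hypothetical, weights are assumed to be non-negative, and clusters are drawn without replacement so that the full fallback order is produced rather than only the first pick.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// Policy is a simplified, hypothetical view of a GameServerAllocationPolicy.
type Policy struct {
	Cluster  string
	Priority int32
	Weight   int
}

// orderPolicies returns the order in which clusters would be tried: ascending
// priority, then a weighted random order within each priority level.
// intn must return a value in [0, n), for example (*rand.Rand).Intn.
func orderPolicies(policies []Policy, intn func(n int) int) []Policy {
	sorted := append([]Policy(nil), policies...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Priority < sorted[j].Priority })

	out := make([]Policy, 0, len(sorted))
	for start := 0; start < len(sorted); {
		end := start
		for end < len(sorted) && sorted[end].Priority == sorted[start].Priority {
			end++
		}
		out = append(out, weightedOrder(sorted[start:end], intn)...)
		start = end
	}
	return out
}

// weightedOrder repeatedly draws one policy with probability weight/total,
// without replacement. Zero-weight policies end up last, in their input order.
func weightedOrder(group []Policy, intn func(n int) int) []Policy {
	remaining := append([]Policy(nil), group...)
	out := make([]Policy, 0, len(group))
	for len(remaining) > 0 {
		total := 0
		for _, p := range remaining {
			total += p.Weight
		}
		idx := 0 // only zero weights left: keep the existing order
		if total > 0 {
			r := intn(total)
			for i, p := range remaining {
				if r < p.Weight {
					idx = i
					break
				}
				r -= p.Weight
			}
		}
		out = append(out, remaining[idx])
		remaining = append(remaining[:idx], remaining[idx+1:]...)
	}
	return out
}

func main() {
	policies := []Policy{
		{Cluster: "cluster-c", Priority: 2, Weight: 100},
		{Cluster: "us-east-cluster", Priority: 1, Weight: 70},
		{Cluster: "us-west-cluster", Priority: 1, Weight: 30},
	}
	rng := rand.New(rand.NewSource(1))
	const trials = 10000
	eastFirst := 0
	for i := 0; i < trials; i++ {
		if orderPolicies(policies, rng.Intn)[0].Cluster == "us-east-cluster" {
			eastFirst++
		}
	}
	fmt.Printf("us-east-cluster tried first in %.1f%% of %d trials\n", 100*float64(eastFirst)/trials, trials)
}
```

Running it should report us-east-cluster first in roughly 70% of trials, matching 70/(70+30), with cluster-c always last because of its larger priority value.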

Making Allocation Requests

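Allocation requests use the same GameServerAllocation resource; set multiClusterSetting.enabled to true so that the allocator consults your GameServerAllocationPolicy resources:

apiVersion: allocation.agones.dev/v1
kind: GameServerAllocation
metadata:
  generateName: multi-cluster-allocation-
spec:
  # Enable multi-cluster allocation: local cluster first, then policies by priority
  multiClusterSetting:
    enabled: true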
  required:
    matchLabels:
      game: my-game
      mode: battle-royale
  # Preferred criteria (optional)
  preferred:
    - matchLabels:
        agones.dev/fleet: preferred-fleet
  # Metadata to set on allocated GameServer
  metadata:
    labels:
      allocated-by: matchmaker
    annotations:
      player-id: "12345"

The allocation process:

Try local cluster first
If no GameServers available locally, try remote clusters by priority
Within the same priority, select a cluster by weighted random choice
Return first successfully allocated GameServer
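The four steps above amount to a fallback loop. The sketch below models it in Go with a hypothetical Allocator interface standing in for the local allocator and for each remote endpoint. It assumes the remote clusters are already ordered by priority and weight, tries every endpoint listed for a cluster in turn, and uses errors.Join (Go 1.20+) to report every failure. It illustrates the documented order, not the actual Agones code path.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNoGameServer signals that a target had no matching GameServer to allocate.
var ErrNoGameServer = errors.New("no matching GameServer available")

// Allocator is a hypothetical stand-in for one allocation target: the local
// allocator or a single remote allocation endpoint.
type Allocator interface {
	Allocate(selector map[string]string) (gameServer string, err error)
}

// Cluster is a remote cluster with the endpoints from its policy's allocationEndpoints.
type Cluster struct {
	Name      string
	Endpoints []Allocator
}

// allocate tries the local cluster first, then each remote cluster in the
// given order, and each endpoint of a cluster in turn. It returns the cluster
// name and GameServer of the first successful allocation.
func allocate(local Allocator, remotes []Cluster, selector map[string]string) (string, string, error) {
	if gs, err := local.Allocate(selector); err == nil {
		return "local", gs, nil
	}
	errs := []error{ErrNoGameServer}
	for _, c := range remotes {
		for _, ep := range c.Endpoints {
			gs, err := ep.Allocate(selector)
			if err == nil {
				return c.Name, gs, nil
			}
			errs = append(errs, fmt.Errorf("%s: %w", c.Name, err))
		}
	}
	return "", "", errors.Join(errs...)
}

// fakeAllocator is a demo implementation returning a fixed result.
type fakeAllocator struct {
	gs  string
	err error
}

func (f fakeAllocator) Allocate(map[string]string) (string, error) { return f.gs, f.err }

func main() {
	local := fakeAllocator{err: ErrNoGameServer}
	remotes := []Cluster{
		{Name: "cluster-b", Endpoints: []Allocator{
			fakeAllocator{err: errors.New("connection refused")}, // first endpoint down
			fakeAllocator{gs: "gs-b-1"},
		}},
		{Name: "cluster-c", Endpoints: []Allocator{fakeAllocator{gs: "gs-c-1"}}},
	}
	cluster, gs, err := allocate(local, remotes, map[string]string{"game": "my-game"})
	fmt.Println(cluster, gs, err) // prints: cluster-b gs-b-1 <nil>
}
```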

Geographic-Based Allocation

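You can also implement latency-based allocation in your matchmaker by choosing the nearest cluster for each player and sending the allocation request straight to that cluster:

// Sketch of geographic allocation. Location, GetNearestClusters, and the
// per-cluster clientsets map are hypothetical; the clientsets are assumed to
// come from agones.dev/agones/pkg/client/clientset/versioned.
func AllocateNearestCluster(ctx context.Context, playerLocation Location,
    clients map[string]versioned.Interface) (*allocationv1.GameServerAllocation, error) {
    // Determine nearest clusters, closest first
    nearestClusters := GetNearestClusters(playerLocation)
    if len(nearestClusters) == 0 {
        return nil, fmt.Errorf("no clusters found near %v", playerLocation)
    }
    nearest := nearestClusters[0]
    client, ok := clients[nearest.Name]
    if !ok {
        return nil, fmt.Errorf("no client configured for cluster %q", nearest.Name)
    }

    // Create allocation request
    allocation := &allocationv1.GameServerAllocation{
        ObjectMeta: metav1.ObjectMeta{GenerateName: "geo-allocation-"},
        Spec: allocationv1.GameServerAllocationSpec{
            Required: allocationv1.GameServerSelector{
                LabelSelector: metav1.LabelSelector{
                    MatchLabels: map[string]string{
                        "game":   "my-game",
                        "region": nearest.Region,
                    },
                },
            },
        },
    }

    // Submit to the nearest cluster through its Kubernetes API
    return client.AllocationV1().GameServerAllocations("default").
        Create(ctx, allocation, metav1.CreateOptions{})
}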
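GetNearestClusters is left undefined in the example above. One simple way to fill it in, assuming you know each cluster's approximate coordinates, is to sort clusters by great-circle (haversine) distance from the player. This version takes the candidate clusters as an explicit argument; the Location and Cluster types and the coordinates are illustrative assumptions, and measured latency may be a better signal than distance in practice.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Location is a latitude/longitude pair in degrees.
type Location struct{ Lat, Lon float64 }

// Cluster is a hypothetical record describing where a cluster runs.
type Cluster struct {
	Name   string
	Region string
	Loc    Location
}

const earthRadiusKm = 6371.0

// distanceKm returns the haversine great-circle distance between a and b.
func distanceKm(a, b Location) float64 {
	toRad := func(d float64) float64 { return d * math.Pi / 180 }
	dLat := toRad(b.Lat - a.Lat)
	dLon := toRad(b.Lon - a.Lon)
	h := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(toRad(a.Lat))*math.Cos(toRad(b.Lat))*math.Sin(dLon/2)*math.Sin(dLon/2)
	return 2 * earthRadiusKm * math.Asin(math.Sqrt(h))
}

// GetNearestClusters returns clusters ordered from nearest to farthest.
func GetNearestClusters(player Location, clusters []Cluster) []Cluster {
	out := append([]Cluster(nil), clusters...)
	sort.SliceStable(out, func(i, j int) bool {
		return distanceKm(player, out[i].Loc) < distanceKm(player, out[j].Loc)
	})
	return out
}

func main() {
	clusters := []Cluster{
		{Name: "us-west-cluster", Region: "us-west", Loc: Location{45.6, -121.2}},
		{Name: "us-east-cluster", Region: "us-east", Loc: Location{39.0, -77.5}},
		{Name: "eu-cluster", Region: "eu", Loc: Location{50.1, 8.7}},
	}
	player := Location{40.7, -74.0} // New York
	for _, c := range GetNearestClusters(player, clusters) {
		fmt.Printf("%s (%s): %.0f km\n", c.Name, c.Region, distanceKm(player, c.Loc))
	}
}
```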

Monitoring Multi-Cluster Allocation

Allocation Metrics by Cluster

# Allocation rate per fleet (fleet_name label)
sum(rate(agones_gameserver_allocations_duration_seconds_count[5m])) by (fleet_name)

# Success rate per fleet
sum(rate(agones_gameserver_allocations_duration_seconds_count{status="Allocated"}[5m])) by (fleet_name) /
sum(rate(agones_gameserver_allocations_duration_seconds_count[5m])) by (fleet_name)

# p99 allocation latency per fleet
histogram_quantile(0.99,
  sum(rate(agones_gameserver_allocations_duration_seconds_bucket[5m])) by (le, fleet_name)
)

Health Checks

# Test local allocator health
kubectl port-forward -n agones-system svc/agones-allocator 8443:443
curl -k https://localhost:8443/healthz

# Test remote connectivity from Cluster A to Cluster B
kubectl run -it --rm test-client --image=curlimages/curl --restart=Never \
  -- curl -k https://allocator.cluster-b.example.com/healthz

Troubleshooting

Certificate Issues

TLS handshake failures

# Check certificate validity
openssl x509 -in cluster-a-client.crt -text -noout | grep -A 2 Validity

# Verify certificate matches key
openssl x509 -noout -modulus -in cluster-a-client.crt | openssl md5
openssl rsa -noout -modulus -in cluster-a-client.key | openssl md5

# Test TLS connection
openssl s_client -connect allocator.cluster-b.example.com:443 \
  -cert cluster-a-client.crt -key cluster-a-client.key \
  -CAfile ca.crt
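The modulus comparison above only works for RSA keys. As an alternative sketch in Go: crypto/tls's X509KeyPair refuses a certificate whose public key does not match the private key, for RSA, ECDSA, and Ed25519 alike. certMatchesKey and the demo helper demoPair are hypothetical names; in practice you would read cluster-a-client.crt and cluster-a-client.key with os.ReadFile and pass their contents in.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"time"
)

// certMatchesKey returns nil if keyPEM is the private key belonging to certPEM.
func certMatchesKey(certPEM, keyPEM []byte) error {
	_, err := tls.X509KeyPair(certPEM, keyPEM)
	return err
}

// demoPair is a demo helper returning a throwaway certificate and key as PEM.
func demoPair(cn string) (certPEM, keyPEM []byte, err error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	keyDER, err := x509.MarshalPKCS8PrivateKey(key)
	if err != nil {
		return nil, nil, err
	}
	certPEM = pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
	keyPEM = pem.EncodeToMemory(&pem.Block{Type: "PRIVATE KEY", Bytes: keyDER})
	return certPEM, keyPEM, nil
}

func main() {
	certA, keyA, err := demoPair("cluster-a-client")
	if err != nil {
		panic(err)
	}
	_, keyB, err := demoPair("cluster-b-client")
	if err != nil {
		panic(err)
	}
	fmt.Println("matching pair accepted:", certMatchesKey(certA, keyA) == nil)
	fmt.Println("mismatched pair accepted:", certMatchesKey(certA, keyB) == nil)
}
```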

Certificate not trusted

Verify serverCa in policy matches actual server certificate:

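# Save the first certificate the endpoint presents (the server's own certificate)
echo | openssl s_client -connect allocator.cluster-b.example.com:443 2>/dev/null \
  | openssl x509 -outform PEM > server-ca.pem

# Compare subject and issuer: if they differ, the CA that issued this
# certificate is the one to check against serverCa
openssl x509 -in server-ca.pem -noout -subject -issuer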

# Base64 encode
cat server-ca.pem | base64 -w 0

# Update policy with correct serverCa
kubectl edit gameserverallocationpolicy cluster-b-policy

Allocation Failures

# Check policy status
kubectl get gameserverallocationpolicy -o yaml

# View allocator logs in the remote cluster (kubectl picks one allocator pod)
kubectl logs -n agones-system deployment/agones-allocator --tail=100

# Check for allocation errors
kubectl get events --all-namespaces | grep -i allocation

# Test an allocation against a specific fleet
kubectl apply -f - <<EOF
apiVersion: allocation.agones.dev/v1
kind: GameServerAllocation
metadata:
  generateName: test-allocation-
spec:
  required:
    matchLabels:
      agones.dev/fleet: test-fleet
EOF

Network Connectivity

# Test DNS resolution
nslookup allocator.cluster-b.example.com

# Test network connectivity
telnet allocator.cluster-b.example.com 443

# Check firewall rules (GKE example)
gcloud compute firewall-rules list | grep 443

# Verify LoadBalancer is provisioned
kubectl get service agones-allocator -n agones-system

Best Practices

Use Priority for Fallback

Give the primary cluster priority 0 and fallback clusters larger priority values (lower values are tried first), so fallback clusters are only used when the primary cannot satisfy an allocation.

Secure Certificates

Use short-lived certificates (for example, 90 days rather than the 10-year validity used in the OpenSSL example)
Rotate certificates before expiration
Store private keys in secret management system
Use cert-manager for automated rotation

Monitor Remote Allocation

Track allocation latency and success rate per cluster to identify issues early.

Test Failover

Regularly test cluster failover by:

Scaling down Fleets in primary cluster
Simulating network partitions
Verifying allocation falls back correctly

Certificate rotation strategy

# Automate certificate rotation with cert-manager
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: cluster-a-client
spec:
  secretName: cluster-a-client-secret
  renewBefore: 720h  # Renew 30 days before expiration
  issuerRef:
    name: agones-ca-issuer
    kind: ClusterIssuer

Multi-region deployment pattern

Deploy one cluster per major region (US-East, US-West, EU, Asia)
Use equal weights for clusters in the same region
Give cross-region fallback clusters larger priority values (lower priority)
Monitor cross-region allocation latency

Next Steps

Monitoring

Set up monitoring for multi-cluster metrics

Best Practices

Production deployment recommendations
