Replica mode configures a RedisCluster to act as a full-cluster replica of an external Redis primary. This enables disaster recovery (DR) topologies where a secondary cluster replicates all data from a primary cluster.
Overview
In replica mode:
All data pods replicate from an external Redis instance
The cluster has a designated leader (local primary candidate)
Replication can be promoted to make the cluster standalone
Use case: Multi-region DR
Region A (Production) Region B (DR)
┌─────────────────┐ ┌─────────────────┐
│ Primary Cluster │ │ Replica Cluster │
│ prod-redis │ │ dr-redis │
│ │ │ │
│ ┌─────┐ │ │ ┌─────┐ │
│ │ P │◄────────┼───────────┼─┤ L │ Leader │
│ └─────┘ │ Replicate │ └─────┘ │
│ ┌─────┐ │ │ ┌─────┐ │
│ │ R │◄────────┼───────────┼─┤ R │ │
│ └─────┘ │ │ └─────┘ │
└─────────────────┘ └─────────────────┘
On failover: promote=true → L becomes standalone primary
Configuration
enabled: Toggles external replication mode for all data pods. When true, all data pods issue REPLICAOF <source.host> <source.port>.
source: Identifies the external Redis primary to replicate from. ReplicaSourceSpec fields:
source.host: External Redis endpoint (hostname or IP). Required.
source.port: External Redis port. Validation: 1-65535. Defaults to 6379.
source.clusterName: Human-readable identifier for the source cluster (used in monitoring/events).
source.authSecretName: Name of a Secret containing the key password for upstream authentication. Must exist in the same namespace as the RedisCluster.
promote: Requests promotion of the local designated leader to standalone primary. When set to true:
Leader issues REPLICAOF NO ONE
Other pods are reconfigured to replicate from the leader
The cluster becomes standalone (replica mode disabled)
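The Secret referenced by authSecretName must carry the source primary's password under the key password. A minimal manifest sketch (the name and namespace match the examples in this doc; the plaintext value is a placeholder):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: prod-redis-auth
  namespace: production
type: Opaque
stringData:
  password: <source-cluster-password>  # the source primary's requirepass value
```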
Basic Example
Primary Cluster (Region A)
apiVersion: redis.io/v1
kind: RedisCluster
metadata:
  name: prod-redis
  namespace: production
spec:
  instances: 3
  mode: sentinel
  storage:
    size: 100Gi
  authSecret:
    name: prod-redis-auth
DR Cluster (Region B)
apiVersion: redis.io/v1
kind: RedisCluster
metadata:
  name: dr-redis
  namespace: production
spec:
  instances: 3
  storage:
    size: 100Gi
  authSecret:
    name: dr-redis-auth
  # External replication from Region A
  replicaMode:
    enabled: true
    source:
      host: prod-redis-leader.production.svc.cluster.local  # Or external IP
      port: 6379
      clusterName: prod-redis-us-east
      authSecretName: prod-redis-auth  # Must exist in this namespace
Notes:
authSecretName references the source cluster's password
If the DR RedisCluster runs in a different namespace (the sed below assumes dr-cluster), copy the secret there first:
kubectl get secret prod-redis-auth -n production -o yaml | \
sed 's/namespace: production/namespace: dr-cluster/' | \
kubectl apply -f -
Designated Leader
In replica mode, the operator selects a designated leader — the pod that will become primary on promotion.
Selection logic:
Pod with ordinal 0 (e.g., dr-redis-0)
Labeled with redis.io/role=primary (even though it's a replica)
The <cluster-name>-leader Service points to this pod
Why?
Stable endpoint for client preparation
Predictable promotion target
Consistent with standalone/sentinel mode
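For reference, a Service with the following shape would route to the designated leader. This is a sketch based on the labels described above; the operator creates the real Service, so these fields are assumptions, not definitive:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: dr-redis-leader  # <cluster-name>-leader
spec:
  selector:
    redis.io/cluster: dr-redis  # label assumed from the cross-region example
    redis.io/role: primary
  ports:
  - port: 6379
    targetPort: 6379
```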
Promotion
To promote, set replicaMode.promote: true:
spec:
  replicaMode:
    enabled: true
    source:
      host: prod-redis-leader.production.svc.cluster.local
      port: 6379
    promote: true  # Add this field
Apply:
kubectl apply -f dr-cluster.yaml
Operator Actions
Break replication: the leader issues REPLICAOF NO ONE
Reconfigure replicas: other pods issue REPLICAOF <leader-ip> 6379
Disable replica mode: status.conditions updated to reflect the standalone state
Update status: currentPrimary set to the leader pod name
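The end state can be confirmed from the role field of INFO replication, which should flip to master once promotion completes. A self-contained sketch of that check, run here against canned sample output rather than a live pod:

```shell
# Sample of what `redis-cli INFO replication` returns after promotion
info_output="role:master
connected_slaves:2
master_repl_offset:123456789"

# Extract the role field; real INFO output uses CRLF, so strip \r first.
# "master" means the leader has been promoted.
role=$(printf '%s\n' "$info_output" | tr -d '\r' | awk -F: '/^role:/{print $2}')
echo "$role"
```

Against a live cluster, pipe `kubectl exec <leader-pod> -- redis-cli INFO replication` into the same filter.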
Status Condition
status:
  conditions:
  - type: ReplicaMode
    status: "True"
    reason: Enabled
    message: "Cluster is replicating from prod-redis-us-east (prod-redis-leader.production.svc.cluster.local:6379)"
After promotion:
status:
  conditions:
  - type: ReplicaMode
    status: "False"
    reason: Promoted
    message: "Cluster promoted to standalone (former source: prod-redis-us-east)"
Implementation Details
From api/v1/rediscluster_types.go:224-258:
type ReplicaModeSpec struct {
    // Enabled toggles external replication mode for all data pods.
    Enabled bool `json:"enabled,omitempty"`

    // Source identifies the external Redis primary to replicate from.
    Source *ReplicaSourceSpec `json:"source,omitempty"`

    // Promote requests promotion of the local designated leader to standalone primary.
    Promote bool `json:"promote,omitempty"`
}

type ReplicaSourceSpec struct {
    // ClusterName is a human-readable source cluster identifier.
    ClusterName string `json:"clusterName,omitempty"`

    // Host is the external Redis endpoint.
    Host string `json:"host"`

    // Port is the external Redis port.
    // +kubebuilder:validation:Minimum=1
    // +kubebuilder:validation:Maximum=65535
    // +kubebuilder:default=6379
    Port int32 `json:"port,omitempty"`

    // AuthSecretName references a Secret with key "password" for upstream auth.
    AuthSecretName string `json:"authSecretName,omitempty"`
}
Cross-Region Example
Setup
Region: us-east-1 (primary)
apiVersion: redis.io/v1
kind: RedisCluster
metadata:
  name: redis-east
  namespace: default
spec:
  instances: 5
  mode: sentinel
  storage:
    size: 200Gi
  nodeSelector:
    topology.kubernetes.io/region: us-east-1
  authSecret:
    name: redis-password
Expose via LoadBalancer:
apiVersion: v1
kind: Service
metadata:
  name: redis-east-external
spec:
  type: LoadBalancer
  selector:
    redis.io/cluster: redis-east
    redis.io/role: primary
  ports:
  - port: 6379
    targetPort: 6379
Get external IP:
kubectl get svc redis-east-external -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
# Output: 35.123.45.67
Region: us-west-2 (DR)
apiVersion: redis.io/v1
kind: RedisCluster
metadata:
  name: redis-west
  namespace: default
spec:
  instances: 5
  storage:
    size: 200Gi
  nodeSelector:
    topology.kubernetes.io/region: us-west-2
  authSecret:
    name: redis-password  # Same password as us-east
  replicaMode:
    enabled: true
    source:
      host: 35.123.45.67  # External IP from us-east
      port: 6379
      clusterName: redis-east
      authSecretName: redis-password
Verify Replication
On the DR cluster:
kubectl exec redis-west-0 -- redis-cli -a "$( kubectl get secret redis-password -o jsonpath='{.data.password}' | base64 -d )" INFO replication
# Output:
# role:slave
# master_host:35.123.45.67
# master_port:6379
# master_link_status:up
# master_sync_in_progress:0
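A replica can be considered healthy when the link is up and the initial sync has finished. A minimal self-contained sketch encoding that check, using the sample output above as canned input:

```shell
# Sample INFO replication output from a healthy replica (as shown above)
info="role:slave
master_host:35.123.45.67
master_port:6379
master_link_status:up
master_sync_in_progress:0"

# Healthy means: link is up AND the initial bulk sync has completed
link=$(printf '%s\n' "$info" | tr -d '\r' | awk -F: '/^master_link_status:/{print $2}')
sync=$(printf '%s\n' "$info" | tr -d '\r' | awk -F: '/^master_sync_in_progress:/{print $2}')
if [ "$link" = "up" ] && [ "$sync" = "0" ]; then
  echo healthy
else
  echo unhealthy
fi
```

The same filter works on live output from `kubectl exec redis-west-0 -- redis-cli ... INFO replication`.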
Failover to DR
Scenario: us-east-1 region is down.
Promote the DR cluster:
spec:
  replicaMode:
    enabled: true
    source:
      host: 35.123.45.67
      port: 6379
    promote: true  # Trigger promotion
Apply:
kubectl apply -f redis-west.yaml
Verify promotion:
kubectl exec redis-west-0 -- redis-cli -a "$PASSWORD" INFO replication
# Output:
# role:master
# connected_slaves:4
Update application config to point to DR cluster:
env:
- name: REDIS_HOST
  value: redis-west-leader.default.svc.cluster.local  # Changed from redis-east
Recover Primary Region
When us-east-1 comes back online, reverse the replication:
apiVersion: redis.io/v1
kind: RedisCluster
metadata:
  name: redis-east
  namespace: default
spec:
  instances: 5
  storage:
    size: 200Gi
  authSecret:
    name: redis-password
  # Now replicate FROM us-west (DR)
  replicaMode:
    enabled: true
    source:
      host: <redis-west-external-ip>
      port: 6379
      clusterName: redis-west
      authSecretName: redis-password
Monitoring
Replication Lag
Check lag on DR cluster:
kubectl exec redis-west-0 -- redis-cli -a "$PASSWORD" INFO replication | grep master_repl_offset
# master_repl_offset:123456789

# On the primary:
kubectl exec redis-east-0 -- redis-cli -a "$PASSWORD" INFO replication | grep master_repl_offset
# master_repl_offset:123456800

# Lag: 123456800 - 123456789 = 11 bytes
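The lag arithmetic is just the difference between the two offsets; as a tiny shell sketch using the sample values above:

```shell
# Offsets captured from the two INFO replication outputs above
primary_offset=123456800
replica_offset=123456789

# Replication lag in bytes = primary offset minus replica offset
lag=$((primary_offset - replica_offset))
echo "lag: ${lag} bytes"
```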
Prometheus Metrics
Instance manager exports:
redis_replication_lag_bytes{cluster="redis-west"} - Replication lag in bytes
redis_master_link_up{cluster="redis-west"} - Master link status (1=up, 0=down)
Alert:
groups:
- name: redis.replication
  rules:
  - alert: RedisReplicationLagHigh
    expr: redis_replication_lag_bytes > 10485760  # 10 MB
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Redis replication lag is {{ $value | humanize }}B"
  - alert: RedisReplicationDown
    expr: redis_master_link_up == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Redis replication link is down for {{ $labels.pod }}"
Best Practices
Use stable endpoints for source.host
Don’t use pod IPs — they change on pod restart. Use:
Service DNS (for same cluster): prod-redis-leader.production.svc.cluster.local
LoadBalancer IP (for cross-cluster): 35.123.45.67
Ingress hostname (for cross-cluster with TLS): redis.us-east.example.com
Copy source auth secret to DR namespace
kubectl get secret prod-redis-auth -n production -o yaml | \
sed 's/namespace: production/namespace: dr/' | \
kubectl apply -f -
Or use ExternalSecret for automated sync.
Set minSyncReplicas on primary cluster
spec:
  instances: 5
  minSyncReplicas: 1  # Ensure 1 local replica ACKs writes
This prevents data loss if primary region fails immediately after a write.
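For reference, this corresponds to Redis's own write-safety settings; in raw redis.conf terms they would be the following (how the operator actually renders minSyncReplicas is an assumption here; 10 seconds is the Redis default for the lag bound):

```
min-replicas-to-write 1
min-replicas-max-lag 10
```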
Monitor replication lag
Set up alerts for lag > 10 MB or master link down.
Test failover regularly
Schedule DR drills:
Promote DR cluster
Run smoke tests
Demote back to replica mode
# Promote
kubectl patch rediscluster redis-west --type=merge -p '{"spec":{"replicaMode":{"promote":true}}}'
# Run tests
curl https://api.example.com/health
# Demote (re-enable replication). Note: after a real promotion with writes,
# you cannot simply flip this flag; see Limitations for the full revert procedure.
kubectl patch rediscluster redis-west --type=merge -p '{"spec":{"replicaMode":{"promote":false}}}'
Use TLS for cross-region replication
Protect data in transit:
spec:
  tlsSecret:
    name: redis-tls
  replicaMode:
    enabled: true
    source:
      host: redis.us-east.example.com  # TLS-enabled endpoint
      port: 6379
Limitations
Promotion is manual — you must set promote: true. The operator does not auto-detect primary failure.
Workaround: use external health checks and automation. The following is a sketch; the serviceAccountName and image are assumptions, since the stock redis image does not ship kubectl and the Job needs RBAC permission to patch RedisClusters:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: dr-health-check
spec:
  schedule: "*/1 * * * *"  # Every minute
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: dr-promoter  # hypothetical; needs patch on redisclusters
          containers:
          - name: checker
            image: redis-with-kubectl:7.2  # hypothetical image bundling redis-cli and kubectl
            command:
            - /bin/bash
            - -c
            - |
              if ! redis-cli -h prod-redis-leader.production.svc.cluster.local PING; then
                kubectl patch rediscluster dr-redis --type=merge -p '{"spec":{"replicaMode":{"promote":true}}}'
              fi
          restartPolicy: OnFailure
No bidirectional replication
Replica mode is unidirectional: A → B. For bidirectional (multi-primary) replication, use external tools like Redis Enterprise Active-Active.
Promotion is one-way
Once promoted, you cannot simply set promote: false to revert. You must:
Reconfigure source cluster to replicate from DR
Re-enable replica mode on DR
Troubleshooting
Replication link down
Symptom:
kubectl exec redis-west-0 -- redis-cli INFO replication | grep master_link_status
# master_link_status:down
Causes:
Network unreachable: check connectivity (redis-cli is available in the pod, unlike ping)
kubectl exec redis-west-0 -- redis-cli -h 35.123.45.67 PING
Wrong password: verify the authSecretName secret exists and matches the source
kubectl get secret prod-redis-auth -o jsonpath='{.data.password}' | base64 -d
Firewall blocking: check security groups/firewall rules
Fix: update the source configuration or network rules.
High replication lag
Symptom: lag > 100 MB
Causes:
Slow network: cross-region bandwidth limits
High write rate: the primary writes faster than replication can keep up
Disk bottleneck: DR cluster storage is slower than the primary's
Debug:
# Compare master and slave replication offsets
kubectl exec redis-west-0 -- redis-cli INFO replication | grep -E "(master_repl_offset|slave_repl_offset)"

# Measure offset growth over time
watch -n1 'kubectl exec redis-west-0 -- redis-cli INFO replication | grep master_repl_offset'
Fix:
Increase network bandwidth (cross-region VPN/peering)
Scale up DR cluster storage IOPS
Reduce the write rate on the primary
Promotion not taking effect
Symptom: promote: true is set but pods still replicate from the source.
Debug:
kubectl describe rediscluster redis-west
# Check events for errors
kubectl logs -l app.kubernetes.io/name=redis-operator
# Look for promotion errors
Cause: the operator may be unable to connect to the leader pod.
Fix: verify the leader pod is running:
kubectl get pods -l redis.io/cluster=redis-west,redis.io/role=primary