Overview
Redis Operator implements a fencing-first failover strategy inspired by CloudNativePG. The core principle: always fence the old primary before promoting a new one.
Fencing-First Failover Sequence
When the controller detects the primary is unreachable, it executes the following steps in strict order.

Step 1: Detect Primary Failure
The controller polls each instance manager via `GET http://<pod-ip>:9121/v1/status` at regular intervals (default: every 5 seconds).
Failure conditions:
- HTTP request timeout (default: 2 seconds)
- HTTP 5xx error
- Connection refused (pod not running)
- Pod marked for deletion (`metadata.deletionTimestamp` set)
If an instance's `Connected` field (tracked in `internal/controller/cluster/status.go:12-14`) remains false for two consecutive reconcile cycles, the controller initiates failover.
Step 2: Fence the Former Primary
Before promoting any replica, the controller sets a fencing annotation on the `RedisCluster` resource:
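A sketch of what the fenced resource might look like (illustrative; the doc names the `redis.io/fencedInstances` annotation but does not show its exact value encoding, so a JSON list of pod names is assumed here, following CloudNativePG's convention):

```yaml
apiVersion: redis.io/v1        # assumed API group
kind: RedisCluster
metadata:
  name: example
  annotations:
    # Assumed encoding: JSON array of fenced pod names.
    redis.io/fencedInstances: '["example-0"]'
```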
The instance manager (`internal/instance-manager/reconciler/reconciler.go`) watches for this annotation. When a pod's name appears in `redis.io/fencedInstances`:
- The instance manager immediately stops `redis-server` (sends `SIGTERM`, waits for graceful shutdown)
- The instance manager refuses to restart Redis until the annotation is cleared
- Kubernetes liveness probe fails → pod is marked unhealthy
- Kubernetes readiness probe fails → pod is removed from Service endpoints
See `internal/controller/cluster/fencing.go:30`.
Step 3: Select a Replica to Promote
The controller selects the replica with the smallest replication lag to minimize data loss. Selection criteria (`internal/controller/cluster/fencing.go`):
- Reachable: `status.instancesStatus[podName].Connected == true`
- Is a replica: `status.instancesStatus[podName].Role == "slave"`
- Lowest lag: smallest `status.instancesStatus[podName].ReplicaLagBytes`
- Stable ordinal: if multiple replicas have the same lag, prefer the lowest ordinal (e.g., `example-1` over `example-2`)
In this example, the controller selects `example-1` for promotion.
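The selection criteria above can be sketched in Go (a simplified illustration; `InstanceStatus` and `selectPromotionTarget` are hypothetical names, not the operator's actual types):

```go
package main

import (
	"fmt"
	"sort"
)

// InstanceStatus holds the status fields the controller consults
// (hypothetical struct; field names follow the doc).
type InstanceStatus struct {
	PodName         string
	Connected       bool
	Role            string // "master" or "slave"
	ReplicaLagBytes int64
}

// selectPromotionTarget returns the best promotion candidate:
// reachable replicas only, smallest lag first, lowest ordinal as the
// tie-breaker (lexicographic pod name, fine for single-digit ordinals).
func selectPromotionTarget(instances []InstanceStatus) (string, bool) {
	var candidates []InstanceStatus
	for _, in := range instances {
		if in.Connected && in.Role == "slave" {
			candidates = append(candidates, in)
		}
	}
	if len(candidates) == 0 {
		return "", false
	}
	sort.Slice(candidates, func(i, j int) bool {
		if candidates[i].ReplicaLagBytes != candidates[j].ReplicaLagBytes {
			return candidates[i].ReplicaLagBytes < candidates[j].ReplicaLagBytes
		}
		return candidates[i].PodName < candidates[j].PodName
	})
	return candidates[0].PodName, true
}

func main() {
	pick, _ := selectPromotionTarget([]InstanceStatus{
		{PodName: "example-1", Connected: true, Role: "slave", ReplicaLagBytes: 128},
		{PodName: "example-2", Connected: true, Role: "slave", ReplicaLagBytes: 4096},
	})
	fmt.Println(pick) // example-1
}
```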
Step 4: Promote the Selected Replica
The controller calls the instance manager HTTP API on the selected replica's pod IP (not through a Service). The promotion handler (`internal/instance-manager/webserver/server.go`):
- Executes `REPLICAOF NO ONE` via the Redis connection
- Waits for Redis to confirm promotion (`INFO replication` shows `role:master`)
- Updates `redis.conf` to remove the `replicaof` directive
- Returns HTTP 200 on success
See `internal/controller/cluster/fencing.go:52`.
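The confirmation step, waiting for `role:master`, can be sketched as a small parser over `INFO replication` output (`promotionConfirmed` is a hypothetical helper, not the operator's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// promotionConfirmed scans `INFO replication` output for role:master,
// the condition checked before the handler returns HTTP 200.
// Redis INFO output uses CRLF line endings; TrimSpace handles the \r.
func promotionConfirmed(info string) bool {
	for _, line := range strings.Split(info, "\n") {
		if strings.TrimSpace(line) == "role:master" {
			return true
		}
	}
	return false
}

func main() {
	info := "# Replication\r\nrole:master\r\nconnected_slaves:2\r\n"
	fmt.Println(promotionConfirmed(info)) // true
}
```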
Step 5: Update Services and Status
The controller updates Kubernetes resources to reflect the new topology, starting with the Service selector update (`internal/controller/cluster/services.go`):
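A sketch of the resulting leader Service after failover (illustrative; the `statefulset.kubernetes.io/pod-name` selector key is an assumption, since the doc does not show the actual selector labels):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-leader
spec:
  selector:
    # Assumed label: Kubernetes sets this on every StatefulSet pod.
    statefulset.kubernetes.io/pod-name: example-1   # the new primary
  ports:
    - port: 6379
      targetPort: 6379
```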
Step 6: Remove Fencing
The controller clears the fencing annotation. Once the annotation is removed (`internal/instance-manager/run/run.go:63-66`):
- The fenced pod (e.g., `example-0`) is no longer prevented from starting
- The instance manager reads `status.currentPrimary` from the `RedisCluster` CR
- Sees `status.currentPrimary == "example-1"` (not `example-0`)
- Boot-time guard activates: starts Redis with `REPLICAOF <example-1-ip> 6379`
- Redis performs partial resync (`PSYNC`) or full sync (`SYNC`) as needed
- Any writes the former primary accepted after failover are discarded
See `internal/controller/cluster/fencing.go:57-58`.
Step 7: Reconfigure Other Replicas
The controller updates all remaining replicas to follow the new primary. For each replica (`internal/controller/cluster/pods.go`):
- Send `REPLICAOF <new-primary-ip> 6379` via the instance manager HTTP API
- Wait for `INFO replication` to show `master_link_status:up`
- Update `status.instancesStatus[podName].MasterLinkStatus = "up"`
Split-Brain Prevention
Redis Operator uses three layers of defense against split-brain scenarios.

Layer 1: Fencing-First Failover

Prevents: A recovering former primary from continuing to accept writes during failover.

How it works:
- Controller sets fence annotation before promoting a replica
- Instance manager stops Redis on the fenced pod
- Pod is removed from Service endpoints → clients can’t reach it
- New primary is promoted
- Fence is cleared; former primary restarts as replica
Example race:
- Scenario: Former primary recovers network connectivity after Step 2 (fencing) but before Step 4 (promotion)
- Outcome: The former primary is already fenced → Redis is stopped → no writes are accepted
- Result: No split-brain; new primary is promoted safely
Layer 2: Boot-Time Role Guard
Prevents: A pod from self-electing as primary on startup, regardless of local data state.

How it works (`internal/instance-manager/run/run.go:63-66`):
On every cold start, before redis-server is launched:
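A simplified sketch of that cold-start decision (illustrative; `decideBootArgs` is a hypothetical name, the real guard lives in `internal/instance-manager/run/run.go:63-66`):

```go
package main

import "fmt"

// decideBootArgs applies the boot-time role guard: only the pod named
// in status.currentPrimary may start as primary; every other pod is
// forced to start as a replica of it, regardless of local data.
func decideBootArgs(podName, currentPrimary, primaryIP string) []string {
	args := []string{"redis-server", "/etc/redis/redis.conf"}
	if podName != currentPrimary {
		// Hard invariant: issue REPLICAOF before serving any traffic.
		args = append(args, "--replicaof", primaryIP, "6379")
	}
	return args
}

func main() {
	fmt.Println(decideBootArgs("example-0", "example-1", "10.244.1.5"))
}
```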
Hard invariant: The split-brain guard in `internal/instance-manager/run/run.go` must fire before redis-server starts. If `POD_NAME != status.currentPrimary`, always issue `REPLICAOF` first, regardless of local data.

Data-loss trade-off:
- Lost: Any writes the former primary accepted after the network partition (before fencing)
- Preserved: All writes accepted by the new primary after promotion
- Philosophy: Matches CloudNativePG's `pg_rewind` behavior: prefer consistency over preserving isolated writes
Layer 3: Runtime Primary Isolation Detection
Prevents: An isolated primary (one that can't reach the API server or its peers) from continuing to accept writes.

How it works (`internal/instance-manager/webserver/server.go`):
The liveness probe (`GET /healthz`) on primary pods includes additional checks:
- Kubernetes API reachability: Can the instance manager reach the Kubernetes API server?
- Peer reachability: Can the instance manager reach other instance manager pods?
If both checks fail:
- The liveness probe returns HTTP 503 (Service Unavailable)
- Kubernetes marks the pod as unhealthy
- After `livenessProbe.failureThreshold` consecutive failures, Kubernetes restarts the pod
- On restart, the boot-time guard (Layer 2) ensures the pod starts as a replica
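The probe wiring might look like this (illustrative pod spec fragment; only the `/healthz` path, port 9121, and the failure threshold of 3 come from this doc, while the timing values are assumptions):

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 9121
  periodSeconds: 10     # assumed
  timeoutSeconds: 2     # assumed
  failureThreshold: 3   # default per this doc
```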
This protection is configurable via `spec.primaryIsolation.enabled` (default: true). Disable it only in non-production environments. See `api/v1/rediscluster_types.go:260-275`.
Failover Timeline Example
Real-world failover scenario: with default settings, two consecutive failed 5-second polls trigger failover (roughly 10 seconds to detection), after which fencing, promotion, and Service updates proceed in the order described above.

Configuration Options
Status Poll Interval
Controls how frequently the controller polls instance managers for status. Set via a controller flag:
- Shorter interval (e.g., 2s): Faster failure detection, higher API server load
- Longer interval (e.g., 10s): Slower failure detection, lower API server load
HTTP Timeout
Controls how long the controller waits for instance manager HTTP responses. Set via a controller flag:
- Shorter timeout (e.g., 1s): Faster failure detection, more false positives during pod startup
- Longer timeout (e.g., 5s): Slower failure detection, fewer false positives
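Putting both knobs together, a controller invocation might look like this (the flag names are hypothetical, since the doc does not show the actual flags; check the controller binary's `--help` for the real ones):

```shell
# Flag names are illustrative, not the operator's documented flags.
redis-operator-controller \
  --status-poll-interval=5s \
  --status-http-timeout=2s
```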
Synchronous Replication
Require a minimum number of replicas to acknowledge writes before the primary confirms success. Configured in the spec; see `api/v1/rediscluster_types.go:122-130`.
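A spec fragment might look like this (illustrative; the field names below are assumptions, and the authoritative definitions are in `api/v1/rediscluster_types.go:122-130`):

```yaml
spec:
  # Hypothetical field names; see api/v1/rediscluster_types.go:122-130.
  synchronousReplication:
    minReplicas: 1   # primary confirms writes only after 1 replica acks
```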
Primary Isolation Detection
Enable runtime isolation checks in the primary's liveness probe. Configured in the spec:
- If enabled and the primary can't reach the API server and can't reach any peer instance managers, the liveness probe fails
- After `livenessProbe.failureThreshold` failures (default: 3), Kubernetes restarts the pod
- On restart, the boot-time guard ensures the pod starts as a replica
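A spec fragment for this option (only `spec.primaryIsolation.enabled` is named in this doc; any surrounding fields are illustrative):

```yaml
spec:
  primaryIsolation:
    enabled: true   # default: true
```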
This is a defense-in-depth measure. Most split-brain scenarios are prevented by fencing-first failover (Layer 1) and the boot-time guard (Layer 2).
Manual Failover
You can trigger a manual failover by scaling the primary pod to zero or deleting it:
- The controller promotes a replica before deleting the old primary
- The old primary is deleted only after the new primary is confirmed healthy
- Clients experience zero write downtime (reads are always served by replicas)
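For a cluster whose current primary is `example-0`, a manual switchover can be as simple as (standard kubectl; the pod name is illustrative):

```shell
# Delete the current primary pod; the controller performs a controlled
# switchover (promote a replica first, then delete) rather than a blind restart.
kubectl delete pod example-0
```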
Sentinel Mode Failover
In Sentinel mode (`spec.mode: sentinel`), Sentinel handles failover, not the operator.
Sentinel failover flow:
- Sentinels monitor the primary via `PING` (default: every 1 second)
- When a quorum of Sentinels agree the primary is down (default: 2 of 3), they select a new primary
- Sentinels run `REPLICAOF NO ONE` on the selected replica
- Sentinels reconfigure other replicas to follow the new primary
- The operator detects the change via Sentinel status polling
- The operator updates `status.currentPrimary` and the `-leader` Service selector
Sentinel failover does not use fencing. Sentinel relies on quorum-based leader election to prevent split-brain.
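For reference, the monitoring parameters above map onto standard Sentinel configuration directives like these (generated and managed by the operator in Sentinel mode; the master name, IP, and timeout value are illustrative, while the quorum of 2 matches the default above):

```
sentinel monitor mymaster 10.244.1.5 6379 2
sentinel down-after-milliseconds mymaster 5000
```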
Comparison with Other Operators
See Comparison with Other Redis Operators for how Redis Operator's failover approach compares to alternatives like OpsTree Redis Operator. Key differentiators:
- Fencing-first: Redis Operator fences the old primary before promoting a new one
- Boot-time guard: Ensures a restarting pod can never self-elect as primary
- Pod IP targeting: Failover commands target specific pods, not load-balanced Services
- Controlled switchover: Rolling updates promote a replica first, then delete the old primary (zero downtime)
Next Steps
- Architecture — Understand the split control/data plane
- Cluster Modes — Standalone vs Sentinel failover behavior
- Upgrades — Zero-downtime primary upgrades via rolling updates