Inspiration: CloudNativePG
Redis Operator is heavily inspired by CloudNativePG, the Cloud Native PostgreSQL operator. The core design philosophy borrows from CNPG’s approach to stateful workload management:
- Safety first: Prevent split-brain and data loss through fencing and boot-time guards
- Direct lifecycle control: Manage Pods and PVCs directly instead of relying on StatefulSets
- Declarative reconciliation: Converge toward desired state, not imperative commands
- Operational observability: Rich per-instance status for debugging and automation
- Minimal RBAC: Instance managers run with read-only access to their own cluster CR
Just as CloudNativePG uses `pg_rewind` to ensure a former primary unconditionally follows the new primary on recovery, Redis Operator uses boot-time `REPLICAOF` enforcement to prevent self-election.

Core Principles
1. Fencing-First Failover
Problem: During failover, if the old primary recovers while a new primary is being promoted, both may accept writes (split-brain).

Solution: Always fence the old primary before promoting a new one.

Implementation:
1. Operator detects primary is unreachable (HTTP poll timeout/error)
2. Fence the former primary: set the fence annotation on `RedisCluster` for that pod
3. Select the replica with the smallest replication lag
4. Issue `POST /v1/promote` to that replica’s pod IP
5. Instance manager runs `REPLICAOF NO ONE`
6. Operator updates `-leader` Service selector to the new primary
7. Operator updates `cluster.status.currentPrimary`
8. Operator removes the fence annotation from the former primary
9. Former primary pod restarts; instance manager detects it is no longer `currentPrimary` and starts as a replica
See `internal/controller/cluster/fencing.go:49-58`.
2. Boot-Time Split-Brain Guard
The fencing-first sequence is the primary defense. The instance manager provides a second line of defense at startup with a boot-time role check (`internal/instance-manager/run/run.go:63-66`).
On every cold start, before `redis-server` is launched, the instance manager compares `POD_NAME` against `cluster.status.currentPrimary`:
- Match → Start as primary (no `replicaof` directive in `redis.conf`)
- No match → Always start with `replicaof <currentPrimary-ip> 6379`, regardless of any local data state

The replica then resynchronizes from the current primary, using a partial resync (`PSYNC`) or a full `SYNC` as needed. Any data the former primary wrote after the failover is discarded, matching CNPG’s `pg_rewind` behavior.
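The role decision described above reduces to a single comparison. The sketch below assumes a hypothetical helper, `replicaofDirective`, that returns the line to write into `redis.conf`; the function name and the way the primary's IP is passed in are assumptions, not the instance manager's actual API.

```go
package main

import "fmt"

// replicaofDirective returns the redis.conf line for this pod at boot.
// An empty string means "start as primary". Any pod that is not the
// recorded primary gets a replicaof line, whatever data it holds locally.
func replicaofDirective(podName, currentPrimary, primaryIP string) string {
	if podName == currentPrimary {
		return ""
	}
	return fmt.Sprintf("replicaof %s 6379", primaryIP)
}

func main() {
	// redis-0 was the primary before the failover; status now names redis-2.
	fmt.Printf("%q\n", replicaofDirective("redis-0", "redis-2", "10.0.0.7"))
	fmt.Printf("%q\n", replicaofDirective("redis-2", "redis-2", "10.0.0.7"))
}
```

The decision deliberately ignores local data: there is no branch in which a pod promotes itself based on what is on its own disk.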
The boot-time guard ensures a recovering former primary can never self-elect: it unconditionally follows `status.currentPrimary` on boot.

3. Direct Pod/PVC Management
Why not StatefulSets? StatefulSets provide ordering guarantees and stable network identities, but impose constraints that conflict with Redis-specific operational requirements:

| StatefulSet Constraint | Redis Operator Need |
|---|---|
| Updates pods in a fixed ordinal order (highest ordinal first), regardless of Redis role | Replicas must update before the primary, whichever pod currently holds that role |
| Immutable `volumeClaimTemplates` | PVC resizing and replacement without cluster recreation |
| Generic lifecycle hooks | Redis-specific fencing, switchover, and promotion logic |
| No pod-specific configuration | Each pod needs distinct `redis.conf` (primary vs replica) |

Managing Pods and PVCs directly enables:
- Replica-first rolling updates: Update replicas in reverse ordinal order (highest first), then promote a replica to primary and delete the old primary last
- Supervised primary updates: Pause before touching the primary, wait for explicit user approval via annotation
- Immediate PVC updates: Resize or replace PVCs without StatefulSet recreation
- Fencing: Stop specific pods on-demand by setting an annotation

See `internal/controller/cluster/pods.go` and `internal/controller/cluster/pvcs.go`.
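The replica-first ordering can be illustrated with a short Go sketch. It assumes pod names end in an ordinal such as `redis-2`; the helper names `ordinal` and `updateOrder` are hypothetical and only show the sorting rule, not the switchover itself.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// ordinal extracts the trailing number from a pod name such as "redis-2".
// Names without a numeric suffix sort last among replicas.
func ordinal(pod string) int {
	i := strings.LastIndex(pod, "-")
	n, err := strconv.Atoi(pod[i+1:])
	if err != nil {
		return -1
	}
	return n
}

// updateOrder returns replicas from highest to lowest ordinal, followed by
// the primary, so the primary is always touched last.
func updateOrder(pods []string, primary string) []string {
	order := make([]string, 0, len(pods))
	hasPrimary := false
	for _, p := range pods {
		if p == primary {
			hasPrimary = true
			continue
		}
		order = append(order, p)
	}
	sort.Slice(order, func(i, j int) bool {
		return ordinal(order[i]) > ordinal(order[j])
	})
	if hasPrimary {
		order = append(order, primary)
	}
	return order
}

func main() {
	// The primary is redis-1 here, so a purely ordinal order would be wrong.
	fmt.Println(updateOrder([]string{"redis-0", "redis-1", "redis-2"}, "redis-1"))
}
```

The example puts the primary in the middle of the ordinal range on purpose: an order based on ordinals alone would restart it before `redis-0`.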
4. Pod-Precise Control Plane
Problem: Services load-balance traffic. Calling a Service endpoint to promote a replica might hit the wrong pod.

Solution: The controller always calls instance manager HTTP endpoints via pod IP, never through a Service (see `internal/controller/cluster/fencing.go`).

Benefits:
- Deterministic operations: Promotion, backup, and status polling target the exact pod the controller intends
- No race conditions: Load balancers can’t route critical commands to the wrong instance
- Simpler debugging: Logs clearly show which pod received which command
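A minimal sketch of pod-IP targeting: build the promote URL from the pod's IP and refuse anything that is not a literal IP address, so a Service or DNS name can never be used by accident. The helper name `promoteURL` and port `8080` are assumptions; only the `POST /v1/promote` path comes from the text above.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
)

// promoteURL builds the instance manager promote endpoint for one pod.
// It rejects host names so that a Service name cannot slip through.
func promoteURL(podIP string, port int) (string, error) {
	if net.ParseIP(podIP) == nil {
		return "", fmt.Errorf("%q is not an IP address; refusing to target a DNS or Service name", podIP)
	}
	u := url.URL{
		Scheme: "http",
		Host:   net.JoinHostPort(podIP, strconv.Itoa(port)), // brackets IPv6 when needed
		Path:   "/v1/promote",
	}
	return u.String(), nil
}

func main() {
	fmt.Println(promoteURL("10.244.1.17", 8080))
	fmt.Println(promoteURL("redis-leader", 8080))
}
```

In a controller, the IP would typically come from the Pod's `status.podIP` field at the moment of the call, since pod IPs change across restarts.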
Services (`-leader`, `-replica`, `-any`) are still created for client application traffic, but the operator bypasses them for control-plane operations.

5. Secrets as Projected Volumes
Why not environment variables?
- Security: Environment variables are visible in pod specs, logs, and crash dumps
- Rotation: Kubernetes automatically updates projected volume content; env vars require pod restarts
- Multi-key secrets: TLS secrets contain both `tls.crt` and `tls.key`; projected volumes support multiple files from one secret

How rotation works:
- Secrets are mounted as projected volumes at `/projected` and `/tls`
- Kubernetes syncs secret updates to the pod filesystem (within ~60 seconds)
- The instance manager reconciler watches for file changes
- Changes are applied live via `CONFIG SET` or `ACL LOAD` (no pod restart)

See `internal/controller/cluster/secrets.go:33-41` and `internal/instance-manager/reconciler/reconciler.go`.
6. Status as Source of Truth
Principle: The `status` subresource is the only source of truth for runtime state. The spec declares desired state; the status reflects observed reality.
Per-pod status is tracked in a map keyed by pod name rather than in a slice (`api/v1/rediscluster_types.go:319-322`):
- Stable keys: Pod names are immutable; slice indexes shift during scaling
- Strategic merge patch safety: Kubernetes merges maps by key; slices can experience ordering bugs
- Direct access: `status.instancesStatus["redis-0"]` is more explicit than `status.instancesStatus[0]`

Each entry records:
- Redis role (`master` or `slave`)
- Connectivity status
- Replication offset and lag
- Connected replicas (primary only)
- Master link status (replicas only)
- Last seen timestamp

See `internal/controller/cluster/status.go`.
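To make the map-versus-slice argument concrete, here is a hedged sketch of what such a status type could look like. Only `currentPrimary` and `instancesStatus` are named in this document; every other field name, and the `observe` and `forget` helpers, are invented for illustration and will differ from the real `api/v1` types.

```go
package main

import (
	"fmt"
	"time"
)

// InstanceStatus holds the observed state of one pod (field names assumed).
type InstanceStatus struct {
	Role              string // "master" or "slave", as reported by Redis
	Connected         bool
	ReplicationOffset int64
	ReplicationLag    int64
	ConnectedReplicas int    // meaningful on the primary only
	MasterLinkStatus  string // meaningful on replicas only
	LastSeen          time.Time
}

// ClusterStatus keys per-pod state by pod name instead of slice position.
type ClusterStatus struct {
	CurrentPrimary  string
	InstancesStatus map[string]InstanceStatus
}

// observe records the latest poll result for a pod.
func (s *ClusterStatus) observe(pod string, st InstanceStatus) {
	if s.InstancesStatus == nil {
		s.InstancesStatus = make(map[string]InstanceStatus)
	}
	s.InstancesStatus[pod] = st
}

// forget drops a pod that no longer exists, e.g. after scaling down.
func (s *ClusterStatus) forget(pod string) {
	delete(s.InstancesStatus, pod)
}

func main() {
	var status ClusterStatus
	status.observe("redis-0", InstanceStatus{Role: "master", Connected: true, LastSeen: time.Now()})
	status.observe("redis-1", InstanceStatus{Role: "slave", Connected: true, MasterLinkStatus: "up", LastSeen: time.Now()})

	// Removing redis-0 does not change how redis-1 is addressed, whereas
	// in a slice its index would have shifted from 1 to 0.
	status.forget("redis-0")
	fmt.Println(status.InstancesStatus["redis-1"].Role)
}
```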
7. Reconciliation Order Discipline
Hard invariant: Sub-steps in `reconcile()` execute in a fixed order. Do not reorder.
Why it matters:
- Secret resolution before pod creation: Pods must mount the latest secret versions
- Services before status polling: The `-leader` Service must exist before clients connect
- Status polling before pod reconciliation: Scaling/failover decisions depend on live instance state
- PVC reconciliation before pod reconciliation: Pods require PVCs to be ready

Execution order (`internal/controller/cluster/reconciler.go:7-17`):
1. Global resources (ServiceAccount, RBAC, ConfigMap, PDB)
2. Secret resolution
3. Services
4. HTTP status poll
5. Status update
6. Reachability check
7. PVC reconciliation
8. Pod reconciliation
8. Errors vs. Requeues
Principle: Return `ctrl.Result{RequeueAfter: ...}` for expected, transient states; return an error only for unexpected failures.
Examples:
| Scenario | Return |
|---|---|
| Pod is still pending | ctrl.Result{RequeueAfter: 5*time.Second} |
| Secret not found (user will create it) | ctrl.Result{RequeueAfter: 10*time.Second} |
| HTTP status poll timeout (pod is starting) | ctrl.Result{RequeueAfter: 2*time.Second} |
| Failed to create Pod (API error) | error |
| Failed to update status subresource | error |
- Errors increment failure counters and trigger exponential backoff; use them for bugs or API failures
- Requeues are normal operational delays; use them for waiting on asynchronous state changes

See `internal/controller/cluster/reconciler.go`.
Comparison with StatefulSet-Based Operators
See Comparison with Other Redis Operators for a detailed comparison with OpsTree Redis Operator and other alternatives.

Hard Invariants
These rules are enforced by code review and must never be broken:
- Context-first: `context.Context` is always the first argument on any function that does I/O, network calls, or Kubernetes API calls
- No panics: Errors are returned, not panicked; use `errors.Is`/`errors.As` for error matching
- Pod IP targeting: Operator-to-pod communication always uses the pod IP directly, never a Service
- Boot-time guard: The split-brain guard in `internal/instance-manager/run/run.go` must fire before `redis-server` starts
- Fence-first: Fencing annotation goes on before promoting a replica
- Status-only updates: Status is updated via `status` subresource only (separate from spec)
- Map-based status: Per-pod state lives in a map keyed by pod name, never a slice
- Replica-first updates: Rolling updates always process replicas before the primary (highest ordinal first)
- Projected volumes only: Secrets are injected as projected volumes, never env vars
- No cluster mode (yet): `spec.mode: cluster` is reserved and rejected by the webhook

See `AGENTS.md:16-25` for the complete list.