Overview
Redis Operator uses a split control/data plane architecture inspired by CloudNativePG. The operator consists of two roles that share a single OCI image:

- Controller Manager: runs in the control plane and reconciles `RedisCluster` resources
- Instance Manager: runs as PID 1 inside each Redis pod and manages the local Redis process
Core Components
Controller Manager
The controller manager (`redis-operator controller`) is a standard Kubernetes operator that:

- Reconciles `RedisCluster`, `RedisBackup`, and `RedisScheduledBackup` custom resources
- Creates and manages Pods, PVCs, Services, ConfigMaps, and RBAC resources
- Polls instance manager HTTP endpoints for live Redis status
- Orchestrates failover, rolling updates, and scaling operations
- Updates cluster status and Kubernetes conditions

Each reconciliation pass works through the following phases:

- Global resources: ServiceAccount, RBAC, ConfigMap (`redis.conf` template), PodDisruptionBudget
- Secret resolution: resolve all secrets referenced in `ClusterSpec` and refresh `status.secretsResourceVersion`
- Services: ensure the `-leader`, `-replica`, and `-any` Services exist with correct selectors
- Status collection: HTTP-poll each instance manager pod for live Redis status
- Status update: write the collected data into `cluster.status`
- Reachability check: requeue if any expected instance is unreachable
- PVC reconciliation: create missing PVCs; track dangling, resizing, and unusable PVCs
- Pod reconciliation: create the primary, join replicas, scale up/down, and perform rolling updates

See `internal/controller/cluster/reconciler.go` for the complete implementation.
Instance Manager
The instance manager (`redis-operator instance`) runs inside each Redis pod and:

- Initializes the Redis data directory (fresh start or restore from backup)
- Generates `redis.conf` from the cluster ConfigMap and pod-specific parameters
- Starts and supervises `redis-server` as a child process
- Exposes an HTTP API for Kubernetes probes and operator commands
- Runs an in-pod reconcile loop that watches the `RedisCluster` CR for configuration changes
- Enforces split-brain prevention at boot time
- Handles live configuration updates (ACL, password rotation) without restarts

The HTTP API exposes the following endpoints:

- `GET /v1/status`: returns live Redis metrics (role, replication offset, connected replicas)
- `POST /v1/promote`: promotes the instance to primary (`REPLICAOF NO ONE`)
- `POST /v1/backup`: triggers an RDB/AOF backup to object storage
- `GET /healthz`: Kubernetes liveness probe with primary isolation checks
- `GET /readyz`: Kubernetes readiness probe

See `internal/instance-manager/run/run.go` and `internal/instance-manager/webserver/server.go`.
The controller always targets specific pods by pod IP, not through Services. This ensures precise control for failover, promotion, and backup operations.
Communication Channels
Kubernetes API (Primary)
- Controller → Cluster: watches and reconciles `RedisCluster` resources
- Instance Manager → Cluster: watches `RedisCluster` for configuration changes and patches `cluster.status.instancesStatus[<podName>]` with live metrics
- Shared state: `status.currentPrimary` is the source of truth for which pod should run as primary
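
The next sketch shows why keying `status.instancesStatus` by pod name is convenient. Under a JSON merge patch (RFC 7386), objects merge key by key, so an instance manager can send a patch that carries only its own entry and leaves its siblings untouched. A list, by contrast, would be replaced wholesale. The use of merge patch, the `instanceStatus` fields, and the helper names are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// instanceStatus holds hypothetical per-instance fields; only the map layout
// (status.instancesStatus keyed by pod name) comes from the docs.
type instanceStatus struct {
	Role              string `json:"role"`
	ReplicationOffset int64  `json:"replicationOffset"`
}

// statusPatch builds a JSON merge patch body that touches only this pod's
// entry in status.instancesStatus; other pods' entries are left as they are.
func statusPatch(podName string, s instanceStatus) ([]byte, error) {
	patch := map[string]any{
		"status": map[string]any{
			"instancesStatus": map[string]instanceStatus{podName: s},
		},
	}
	return json.Marshal(patch)
}

// shouldBePrimary compares a pod name with status.currentPrimary, the
// source of truth for which pod should run as primary.
func shouldBePrimary(podName, currentPrimary string) bool {
	return podName == currentPrimary
}

func main() {
	body, err := statusPatch("cache-2", instanceStatus{Role: "replica", ReplicationOffset: 4096})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
	fmt.Println(shouldBePrimary("cache-2", "cache-1"))
}
```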
HTTP API (Secondary)
The controller calls instance manager HTTP endpoints directly via pod IP for pod-specific operations:

- Status polling: `GET http://<pod-ip>:9121/v1/status`
- Promotion: `POST http://<pod-ip>:9121/v1/promote`
- Backup: `POST http://<pod-ip>:9121/v1/backup`
Pod Lifecycle Management
Each pod's PVC is created separately and reused across pod restarts. When a pod spec needs updating:

- The pod is deleted.
- A new pod is created with the same PVC.
- The instance manager detects the configuration change and regenerates `redis.conf`.
- Redis starts with the updated configuration.

Managing Pods and PVCs directly, instead of through a StatefulSet, provides:

- Full lifecycle control: the operator can enforce Redis-specific ordering (replicas before primary).
- Immediate updates: no StatefulSet immutability constraints (e.g., `volumeClaimTemplates`).
- Precise failover: the operator controls exactly when and how each pod is replaced.

See `internal/controller/cluster/pods.go` and `internal/controller/cluster/pvcs.go`.
Service Topology
The operator creates three Kubernetes Services for each `RedisCluster`:

| Service | Selector | Purpose |
|---|---|---|
| `<cluster>-leader` | `redis.io/role: primary` | Routes to the current primary pod |
| `<cluster>-replica` | `redis.io/role: replica` | Load-balances read traffic across replicas |
| `<cluster>-any` | `redis.io/cluster: <name>` | Routes to any pod (primary or replica) |

Clients should use:

- `<cluster>-leader` for write operations
- `<cluster>-replica` for read-only operations
- `<cluster>-any` for workloads that can tolerate reading from any instance

See `internal/controller/cluster/services.go`.
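
The sketch below shows how the selectors route traffic. It evaluates equality-based label selectors against pod labels, which is the same matching rule Kubernetes applies to a Service's `spec.selector`. Scoping the `-leader` and `-replica` selectors with `redis.io/cluster` as well is an assumption, since the table lists only the distinguishing label. `matches` and `serviceSelectors` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"sort"
)

// matches reports whether every key/value pair in selector is present in
// labels: the equality-based rule a Service's spec.selector uses.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// serviceSelectors returns the selector for each of the three Services.
// Adding redis.io/cluster to the leader and replica selectors is an assumption.
func serviceSelectors(cluster string) map[string]map[string]string {
	return map[string]map[string]string{
		cluster + "-leader":  {"redis.io/cluster": cluster, "redis.io/role": "primary"},
		cluster + "-replica": {"redis.io/cluster": cluster, "redis.io/role": "replica"},
		cluster + "-any":     {"redis.io/cluster": cluster},
	}
}

func main() {
	pods := map[string]map[string]string{
		"cache-1": {"redis.io/cluster": "cache", "redis.io/role": "primary"},
		"cache-2": {"redis.io/cluster": "cache", "redis.io/role": "replica"},
		"cache-3": {"redis.io/cluster": "cache", "redis.io/role": "replica"},
	}
	selectors := serviceSelectors("cache")

	// Sort names so the output order is deterministic.
	var svcNames, podNames []string
	for s := range selectors {
		svcNames = append(svcNames, s)
	}
	for p := range pods {
		podNames = append(podNames, p)
	}
	sort.Strings(svcNames)
	sort.Strings(podNames)

	for _, s := range svcNames {
		var endpoints []string
		for _, p := range podNames {
			if matches(selectors[s], pods[p]) {
				endpoints = append(endpoints, p)
			}
		}
		fmt.Println(s, endpoints)
	}
}
```

Because routing is label-driven, changing a pod's `redis.io/role` label moves it between the `-leader` and `-replica` endpoints without any edit to the Services themselves.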
Secret Injection
All secrets (`authSecret`, `aclConfigSecret`, `tlsSecret`, `caSecret`, `backupCredentialsSecret`) are injected as projected volumes, never as environment variables.

Projection paths inside pods:

- `/projected/password`: Redis password (from `authSecret`)
- `/projected/acl`: ACL rules (from `aclConfigSecret`)
- `/tls/tls.crt`, `/tls/tls.key`: TLS certificate and key (from `tlsSecret`)
- `/tls/ca.crt`: CA certificate (from `caSecret`)
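
Secret rotation works as follows:

- The user updates the referenced `Secret` in Kubernetes.
- Kubernetes automatically updates the projected volume content inside running pods. The update is eventually consistent and happens on the kubelet's periodic sync, not instantly.
- The instance manager reconciler detects the new file content.
- The instance manager applies the changes live via `CONFIG SET` or `ACL LOAD` (no pod restart).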

The controller tracks each secret's `ResourceVersion` in `status.secretsResourceVersion`. When a secret changes, the controller enqueues the cluster for reconciliation.

See `internal/controller/cluster/secrets.go` and `internal/instance-manager/reconciler/reconciler.go`.
PodDisruptionBudget
The operator creates a `PodDisruptionBudget` for every `RedisCluster` where `spec.enablePodDisruptionBudget` is `true` (the default).

PDB configuration:

- `minAvailable = max(1, spec.instances - 1)`
- Allows at most one pod to be voluntarily disrupted at a time (none when `spec.instances` is 1, because `minAvailable` is then 1)
- Updated automatically when `spec.instances` changes
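
The sketch below works through the PDB arithmetic for a few cluster sizes. `minAvailable` implements the formula above. `allowedDisruptions` follows how Kubernetes derives a PDB's allowed disruptions for an integer `minAvailable`: healthy pods minus `minAvailable`, floored at zero. Both helper names are hypothetical.

```go
package main

import "fmt"

// minAvailable implements minAvailable = max(1, spec.instances - 1).
func minAvailable(instances int) int {
	if instances-1 > 1 {
		return instances - 1
	}
	return 1
}

// allowedDisruptions is how many pods may be evicted voluntarily right now:
// currently healthy pods minus minAvailable, never below zero.
func allowedDisruptions(healthy, minAvail int) int {
	if d := healthy - minAvail; d > 0 {
		return d
	}
	return 0
}

func main() {
	for _, n := range []int{1, 2, 3, 5} {
		m := minAvailable(n)
		fmt.Printf("instances=%d minAvailable=%d allowed=%d\n", n, m, allowedDisruptions(n, m))
	}
}
```

A single-instance cluster therefore blocks every voluntary eviction, such as those issued by `kubectl drain`. In a cluster of any size, one unhealthy pod drops the allowed count to zero until it recovers.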
During rolling updates, the operator respects the PDB by updating replicas first (highest ordinal) and the primary last via switchover.
See `internal/controller/cluster/pdb.go`.
Key Design Decisions
| Decision | Rationale |
|---|---|
| No StatefulSets | Full lifecycle control, no immutability constraints, Redis-specific ordering |
| Split control/data plane | Pod-precise control, safer failover, live configuration updates |
| Pod IP targeting | Avoid load-balancer unpredictability for critical operations |
| Projected volume secrets | Enable secret rotation without pod restarts, better security posture |
| Direct Pod/PVC management | Enforce replica-first rolling updates, primary switchover, fencing |
| Map-based status | Avoid strategic-merge-patch ordering issues, stable per-pod tracking |