Overview
This document compares Redis Operator with other popular Kubernetes Redis operators, focusing on architectural differences, safety guarantees, and operational trade-offs.

Primary comparison: OpsTree Redis Operator (`OT-CONTAINER-KIT/redis-operator`)
TL;DR: If you want Redis operations that are stricter, safety-first, and more predictable during failures and upgrades, Redis Operator is the better fit.
Redis Operator vs. OpsTree Redis Operator
Comparison based on:
- Redis Operator (HoWL): `howl-cloud/redis-operator` at `6a2367d`
- OpsTree: `OT-CONTAINER-KIT/redis-operator` at `7ae2c18`

See `source/docs/comparison.md` for the original analysis.
1. Safer Failover Flow
Redis Operator:
- Fencing-first: Fence the old primary before promoting a replica
- Explicit sequence: Controller sets the fence annotation → stops Redis on the old primary → promotes a replica → updates Services
- Pod IP targeting: The promotion command targets the exact replica pod via HTTP (`POST http://<pod-ip>:9121/v1/promote`)

OpsTree:
- Sentinel-driven: Relies on Redis Sentinel for automatic failover orchestration
- No explicit fencing: Sentinel uses quorum-based leader election; there is no equivalent controller-level fencing contract
- StatefulSet-centric: Failover is triggered by Sentinel; the controller syncs Kubernetes resources afterward
Why this matters: Redis Operator puts a “stop writing” sign on the old leader before choosing a new one, reducing split-brain risk.
See `internal/controller/cluster/fencing.go:49-58` and Failover and Fencing.
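The failover flow above can be sketched as an ordered sequence. This is a minimal sketch with hypothetical stand-ins for the controller's real Kubernetes and instance-manager calls; the ordering is the contract: the old primary is fenced and stopped before any replica is promoted, so there is never a window with two writable primaries.

```python
# Sketch of a fencing-first failover. The action names are illustrative,
# not the operator's API; what matters is the order they run in.

def failover(old_primary: str, candidate: str, actions: list) -> str:
    actions.append(("fence", old_primary))         # 1. set the fence annotation
    actions.append(("stop-redis", old_primary))    # 2. stop Redis on the old primary
    actions.append(("promote", candidate))         # 3. promote the exact replica pod
    actions.append(("update-service", candidate))  # 4. repoint the read-write Service
    return candidate

actions: list = []
new_primary = failover("redis-0", "redis-1", actions)
```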
2. Split-Brain Defense at Startup and Runtime
Redis Operator:
- Boot-time guard (`internal/instance-manager/run/run.go:63-66`): If a restarting pod is not `status.currentPrimary`, it is forced to start as a replica (`REPLICAOF`), regardless of local data
- Runtime guard (`internal/instance-manager/webserver/server.go`): A primary can fail liveness probes if it is isolated from both the API server and its peers, so Kubernetes replaces it

OpsTree:
- Health probes: Primarily `redis-cli ping`-style checks in the StatefulSet configuration
- No boot-time role guard: Relies on Sentinel to reconfigure roles after a pod restart
Why this matters: Redis Operator is more aggressive about avoiding “isolated primary keeps accepting writes” scenarios.
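The boot-time guard can be sketched as a single decision, assuming illustrative names (`pod_name`, `current_primary`) rather than the operator's exact fields: a restarting pod that is not `status.currentPrimary` always comes up as a replica, no matter what its local data looks like.

```python
# Minimal sketch of the boot-time role guard described above. Hostnames and
# port are illustrative assumptions.

def startup_command(pod_name: str, current_primary: str) -> str:
    if pod_name == current_primary:
        return "REPLICAOF NO ONE"  # this pod really is the current primary
    # Any other pod replicates from the primary, even if its local dataset
    # suggests it was a primary before the restart.
    return f"REPLICAOF {current_primary} 6379"

replica_cmd = startup_command("redis-1", "redis-0")
primary_cmd = startup_command("redis-0", "redis-0")
```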
3. More Predictable Primary Upgrades
Redis Operator:
- Replica-first rolling updates: Update replicas first (highest ordinal first), then promote a replica to primary and delete the old primary last
- Supervised mode: `spec.primaryUpdateStrategy: supervised` pauses before touching the primary and waits for explicit user approval via an annotation
- Zero-downtime switchover: Promote a replica → wait for confirmation → delete the old primary

OpsTree:
- StatefulSet update strategy: Relies on `RollingUpdate` with `partition`, or the `OnDelete` strategy
- No operator-level primary gate: Updates follow standard StatefulSet ordering (descending by ordinal)
Why this matters: Primary change is the riskiest step, and Redis Operator lets you gate it on purpose.
See `api/v1/rediscluster_types.go:20-27` and Upgrades.
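The replica-first ordering and the supervised gate can be sketched together. The annotation key below is hypothetical; the behavior it illustrates is the contract described above: replicas update first (highest ordinal first), and in supervised mode the primary is not touched until an explicit approval annotation appears.

```python
# Sketch of replica-first update ordering with a supervised approval gate.
APPROVAL = "redis.example.io/approve-primary-update"  # hypothetical annotation key

def update_order(pods, primary, strategy, annotations):
    ordinal = lambda p: int(p.rsplit("-", 1)[1])
    # Replicas go first, highest ordinal first.
    replicas = sorted((p for p in pods if p != primary), key=ordinal, reverse=True)
    if strategy == "supervised" and annotations.get(APPROVAL) != "true":
        return replicas  # pause: the primary waits for explicit user approval
    return replicas + [primary]  # primary is always last

pods = ["redis-0", "redis-1", "redis-2"]
paused = update_order(pods, "redis-0", "supervised", {})
approved = update_order(pods, "redis-0", "supervised", {APPROVAL: "true"})
```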
4. Pod-Precise Control Plane
Redis Operator:
- Direct pod IP targeting: The controller calls instance-manager HTTP endpoints (`/v1/status`, `/v1/promote`, `/v1/backup`) via the pod IP
- Instance manager HTTP API: Exposes endpoints for promotion, backup, and status polling
- No load balancers: Critical operations bypass Services to target the exact pod

OpsTree:
- Controller + StatefulSet + Redis/Sentinel command model: No equivalent in-pod instance-manager API endpoints
- Service-based routing: Operations typically go through Services or direct `redis-cli` commands
Why this matters: Redis Operator can reliably act on the exact pod you intended, avoiding load-balancer unpredictability.
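As a small illustration, the promotion URL is built from the pod's IP (port 9121 and the `/v1/promote` path come from the flow described above; the HTTP call itself is omitted), so the request cannot be routed to a different backend the way a Service-based call could.

```python
# Sketch of pod-precise targeting: the URL pins the exact pod by IP.
def promote_url(pod_ip: str) -> str:
    return f"http://{pod_ip}:9121/v1/promote"

url = promote_url("10.244.1.17")  # a Service VIP could hit any backend; this cannot
```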
5. Stronger Secret Hygiene by Default
Redis Operator:
- Projected volumes only: Auth, ACL, and TLS secrets are mounted at `/projected` and `/tls` as projected volumes
- No env vars: Redis credentials are never injected via environment variables
- Automatic rotation: Kubernetes syncs secret updates to pods; the instance manager applies changes live via `CONFIG SET` or `ACL LOAD` (no restart)

OpsTree:
- Env var injection: `internal/k8sutils/statefulset.go` includes `REDIS_PASSWORD` via a `SecretKeyRef` environment variable
- Manual rotation: Password changes may require pod restarts
See `internal/controller/cluster/pods.go` and `internal/controller/cluster/secrets.go:33-41`.
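The restart-free rotation above can be sketched as a change-detection loop, assuming a stubbed Redis client: the instance manager hashes the projected secret file and, when the content changes, applies the new password live (represented here as a `CONFIG SET` command list; a real implementation would speak the Redis protocol).

```python
# Sketch of live secret rotation: apply CONFIG SET only when the projected
# file's content actually changed. `apply` is a stand-in for a Redis client.
import hashlib
import pathlib
import tempfile

def rotate_if_changed(secret_path: pathlib.Path, last_hash: str, apply) -> str:
    data = secret_path.read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    if digest != last_hash:
        apply(["CONFIG", "SET", "requirepass", data.decode().strip()])
    return digest

applied: list = []
with tempfile.TemporaryDirectory() as d:
    secret = pathlib.Path(d) / "password"
    secret.write_text("old-secret\n")
    h1 = rotate_if_changed(secret, "", applied.append)   # first sync: applied
    h2 = rotate_if_changed(secret, h1, applied.append)   # unchanged: no-op
    secret.write_text("new-secret\n")                    # Kubernetes re-projects the Secret
    rotate_if_changed(secret, h2, applied.append)        # rotation: applied, no restart
```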
6. Richer Per-Pod Status for Operations
Redis Operator:
- Map-based status: `status.instancesStatus` is a map keyed by pod name
- Per-pod metrics: Role, connectivity, replication offset, lag, connected replicas, master link status, last-seen timestamp
- Structured conditions: Kubernetes conditions for `Ready`, `PrimaryAvailable`, `ReplicationHealthy`, etc.

OpsTree:
- Coarse status: Key APIs report `masterNode`, cluster state/reason, and ready leader/follower counts
- Less granular: No equivalent per-pod replication offset or lag tracking in status
Why this matters: Better incident debugging and more precise automation logic (e.g., selecting the replica with the smallest lag for promotion).
See `api/v1/rediscluster_types.go:319-322` and `internal/controller/cluster/status.go`.
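The promotion example above can be sketched against a per-pod status map. The status shape below mirrors the idea of a map keyed by pod name, not the operator's exact schema: with offset and connectivity available per pod, the controller can pick the replica with the smallest lag.

```python
# Sketch of lag-aware replica selection over a map-based per-pod status.
def best_replica(instances_status: dict) -> str:
    candidates = [
        (name, s) for name, s in instances_status.items()
        if s["role"] == "replica" and s["connected"]
    ]
    # Highest replication offset == smallest lag behind the primary.
    return max(candidates, key=lambda item: item[1]["offset"])[0]

status = {
    "redis-0": {"role": "primary", "connected": True,  "offset": 5000},
    "redis-1": {"role": "replica", "connected": True,  "offset": 4990},
    "redis-2": {"role": "replica", "connected": True,  "offset": 4800},
    "redis-3": {"role": "replica", "connected": False, "offset": 5000},  # unreachable
}
chosen = best_replica(status)
```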
7. Avoids StatefulSet Immutability Constraints
Redis Operator:
- Direct Pod/PVC management: Pods and PVCs are created and managed directly by the controller
- No StatefulSet for data pods: Data pods (unlike Sentinel pods) are not managed by StatefulSets
- Flexible lifecycle: The operator can enforce Redis-specific ordering (replicas before primary) and apply PVC updates immediately

OpsTree:
- StatefulSet-heavy lifecycle: `internal/k8sutils/statefulset.go`, with explicit handling of immutable `volumeClaimTemplates`
- Recreate strategy: May require StatefulSet recreation for certain PVC changes
Why this matters: Redis Operator can enforce Redis-specific ordering and behavior directly, instead of fitting everything into generic StatefulSet mechanics.
See `internal/controller/cluster/pods.go`, `internal/controller/cluster/pvcs.go`, and Design Principles.
Where OpsTree Is Strong
This comparison is not “OpsTree is bad.” OpsTree has clear strengths:

1. Broad Mode Support

OpsTree supports multiple Redis topologies in a single operator:
- Standalone (single instance)
- Replication (primary + replicas)
- Sentinel (high availability)
- Cluster (Redis Cluster sharding)

Redis Operator currently supports:
- Standalone (primary + replicas)
- Sentinel (high availability)
- Cluster (reserved, not yet implemented)
If you need Redis Cluster sharding today, OpsTree is the better choice.
2. Mature StatefulSet-Native Workflows
OpsTree follows standard Kubernetes patterns:
- StatefulSets for pod management
- Familiar operational model: `kubectl scale`, `kubectl rollout status`, etc.
- Well-documented: Extensive Helm charts, examples, and runbooks
3. Strong Built-In Metrics Patterns
OpsTree includes:
- redis-exporter integration out of the box
- Prometheus/Grafana documentation and dashboards
- ServiceMonitors for automatic Prometheus scraping
Other Redis Operators
Spotahome Redis Operator
Repository: `spotahome/redis-operator`

Strengths:
- Mature and widely deployed
- Sentinel-based high availability
- Strong community support

Limitations:
- Primarily Sentinel-focused (limited standalone mode)
- No direct Pod/PVC management (uses StatefulSets)
- Fewer safety guarantees during failover (no explicit fencing)
Redis Enterprise Operator
Repository: `RedisLabs/redis-enterprise-k8s-docs`

Strengths:
- Commercial support from Redis Ltd.
- Advanced features (Active-Active geo-replication, Redis on Flash)
- High performance and scalability

Limitations:
- Proprietary: Requires a Redis Enterprise license
- Complexity: Heavier operational overhead
- Cost: Not suitable for small teams or open-source projects
KubeDB Redis
Repository: `kubedb/redis`

Strengths:
- Part of the KubeDB ecosystem (unified operator for multiple databases)
- Supports Redis Cluster and Sentinel modes
- Built-in backup/restore integration

Limitations:
- Commercial: Requires a KubeDB Enterprise license for advanced features
- Opinionated: Tightly coupled to KubeDB workflows
- Less Kubernetes-native: Custom CRDs and workflows differ from standard Kubernetes patterns
Decision Guide
Choose Redis Operator if:
- Safety is paramount: You need fencing-first failover and split-brain prevention
- Predictable upgrades: You want controlled primary updates with approval gates
- Direct control: You prefer pod-precise operations over generic StatefulSet workflows
- CloudNativePG inspiration: You value the CNPG design philosophy for stateful workloads
- Open source: You want a fully open-source solution with no license fees
Choose OpsTree if:
- Broad topology support: You need Redis Cluster sharding today
- StatefulSet-native: Your team is comfortable with standard Kubernetes patterns
- Mature ecosystem: You want extensive Helm charts, examples, and community support
- Metrics integration: You need built-in Prometheus/Grafana integration
Choose Spotahome if:
- Sentinel-focused: You primarily use Redis Sentinel for high availability
- Mature and stable: You prioritize a widely deployed, battle-tested operator
- Community support: You value a large user base and active community
Choose Redis Enterprise if:
- Commercial support: You need enterprise SLAs and support contracts
- Advanced features: You require Active-Active geo-replication or Redis on Flash
- Budget: You have budget for a commercial Redis solution
Choose KubeDB if:
- Multi-database: You run multiple databases (Postgres, MySQL, MongoDB) and want a unified operator
- Integrated backup: You need built-in backup/restore workflows
- Commercial support: You have budget for KubeDB Enterprise
Feature Comparison Matrix
| Feature | Redis Operator | OpsTree | Spotahome | Redis Enterprise | KubeDB |
|---|---|---|---|---|---|
| Open Source | ✅ | ✅ | ✅ | ❌ | ⚠️ (Enterprise features) |
| Standalone Mode | ✅ | ✅ | ⚠️ (limited) | ✅ | ✅ |
| Sentinel Mode | ✅ | ✅ | ✅ | ✅ | ✅ |
| Redis Cluster | ⏳ (reserved) | ✅ | ❌ | ✅ | ✅ |
| Fencing-First Failover | ✅ | ❌ | ❌ | ✅ | ❌ |
| Boot-Time Split-Brain Guard | ✅ | ❌ | ❌ | ✅ | ❌ |
| Supervised Primary Update | ✅ | ❌ | ❌ | ✅ | ❌ |
| Direct Pod/PVC Management | ✅ | ❌ | ❌ | ⚠️ (proprietary) | ❌ |
| Pod IP Targeting | ✅ | ❌ | ❌ | ⚠️ (proprietary) | ❌ |
| Projected Volume Secrets | ✅ | ⚠️ (env vars) | ⚠️ (env vars) | ✅ | ⚠️ (env vars) |
| Per-Pod Status Tracking | ✅ | ⚠️ (coarse) | ⚠️ (coarse) | ✅ | ⚠️ (coarse) |
| Built-In Metrics | ⚠️ (user-managed) | ✅ | ✅ | ✅ | ✅ |
| Backup/Restore | ✅ | ✅ | ⚠️ (limited) | ✅ | ✅ |
| Commercial Support | ❌ | ❌ | ❌ | ✅ | ✅ |
| Active-Active Geo-Replication | ⚠️ (replica mode) | ❌ | ❌ | ✅ | ❌ |
- ✅ Fully supported
- ⚠️ Partial support or requires manual configuration
- ❌ Not supported
- ⏳ Planned but not yet implemented
Philosophy Differences
| Aspect | Redis Operator | OpsTree |
|---|---|---|
| Inspiration | CloudNativePG (safety-first stateful workloads) | Standard Kubernetes patterns (StatefulSets) |
| Failover | Operator-managed fencing-first (standalone) or Sentinel | Sentinel-driven |
| Pod management | Direct Pod/PVC lifecycle control | StatefulSet-based |
| Primary updates | Replica-first with optional approval gate | StatefulSet rolling update |
| Split-brain prevention | Fencing + boot-time guard + runtime isolation checks | Sentinel quorum |
| Secret injection | Projected volumes only | Environment variables + projected volumes |
| Status tracking | Rich per-pod metrics (map-based) | Coarse cluster-level metrics |
| Operational model | Pod-precise HTTP API + Kubernetes API | Kubernetes API + Redis/Sentinel commands |
Migration Path
From OpsTree to Redis Operator
1. Backup your data: Use OpsTree’s backup mechanism or create a manual RDB snapshot
2. Create a Redis Operator cluster: Deploy a new `RedisCluster` with `spec.bootstrap.backupName` pointing to the backup
3. Validate data: Connect to the new cluster and verify data integrity
4. Update application endpoints: Point applications to the new cluster’s Services
5. Monitor: Observe the new cluster under production load
6. Decommission the OpsTree cluster: Delete the old cluster once confident
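The bootstrap step of the migration can be sketched as a manifest. It is shown as a Python dict for illustration; in practice this would be a YAML `RedisCluster` manifest, and every field except `spec.bootstrap.backupName` (the API group, instance count, and backup name) is an assumption.

```python
# Hypothetical sketch of a RedisCluster bootstrapping from an OpsTree export.
bootstrap_cluster = {
    "apiVersion": "redis.example.io/v1",   # assumed API group
    "kind": "RedisCluster",
    "metadata": {"name": "migrated"},
    "spec": {
        "instances": 3,
        # Points the new cluster at the backup taken in step 1.
        "bootstrap": {"backupName": "opstree-export"},
    },
}
```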
From Redis Operator to OpsTree
Same process in reverse:
1. Backup via `RedisBackup`
2. Deploy an OpsTree cluster and restore from the backup
3. Update application endpoints
4. Decommission the Redis Operator cluster
Next Steps
- Architecture — Understand the split control/data plane
- Design Principles — Learn the CloudNativePG-inspired philosophy
- Failover and Fencing — Deep dive into split-brain prevention