These runbooks are for incident response and production operations of RedisCluster resources managed by this operator.

Runbook Index

Scenario | Runbook
Manual failover when operator is unavailable | Manual Failover
Reconciler loop appears stuck | Stuck Reconciler
All data pods lost, PVCs intact | Total Cluster Loss
Single corrupted replica PVC | PVC Corruption
Suspected split-brain (two primaries) | Split-Brain Recovery
Rotate authSecret without pod restarts | Secret Rotation
Upgrade operator safely | Operator Upgrade

Shared Conventions

Follow these conventions in every runbook:
  • Set shell variables before executing commands: NS=<rediscluster namespace> and CLUSTER=<rediscluster name>.
  • When you need immediate reconciliation, prefer touching an annotation; the controller also reconciles periodically on its own.
  • Validate that the cluster has returned to status.phase=Healthy before closing the incident.
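
The conventions above can be sketched as two small shell helpers. This is an illustration under stated assumptions, not the operator's documented interface: the annotation key `example.com/reconcile-at` is hypothetical (substitute whatever key your operator watches), the resource name `rediscluster` assumes the CRD is addressable under that name, and the function names `force_reconcile` and `wait_healthy` are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes NS and CLUSTER are exported as described above.
# The annotation key and the "rediscluster" resource name are assumptions.

# Touch an annotation so the controller reconciles now instead of on its next period.
force_reconcile() {
  : "${NS:?set NS to the rediscluster namespace}"
  : "${CLUSTER:?set CLUSTER to the rediscluster name}"
  kubectl annotate rediscluster "$CLUSTER" -n "$NS" \
    "example.com/reconcile-at=$(date +%s)" --overwrite
}

# Poll status.phase until it reads Healthy or the timeout (seconds) expires.
wait_healthy() {
  : "${NS:?set NS to the rediscluster namespace}"
  : "${CLUSTER:?set CLUSTER to the rediscluster name}"
  local timeout="${1:-300}" interval="${2:-5}" elapsed=0 phase=""
  while [ "$elapsed" -lt "$timeout" ]; do
    phase="$(kubectl get rediscluster "$CLUSTER" -n "$NS" \
      -o jsonpath='{.status.phase}')"
    if [ "$phase" = "Healthy" ]; then
      echo "cluster $CLUSTER is Healthy"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "timed out waiting for Healthy (last phase: ${phase:-unknown})" >&2
  return 1
}
```

A typical sequence at the end of an incident might be `force_reconcile && wait_healthy 600 10`, closing the incident only if the second command exits with status 0.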
