Total Cluster Loss Recovery

Severity: P1
Estimated time: 15-30 minutes

Use this when all Redis data pods are gone but the PVCs still exist. The operator should recreate the pods and reattach the existing PVCs.

Symptoms

No data pods exist for the cluster.
PVCs named <cluster>-data-<index> still exist.
The RedisCluster is degraded or stuck in a non-healthy phase.

Prerequisites

Operator deployment is running.
PVCs for the cluster are intact.
Shell variables:

export NS=<rediscluster-namespace>
export CLUSTER=<rediscluster-name>

Diagnosis

Confirm data pods are missing

kubectl get pods -n "$NS" -l redis.io/cluster="$CLUSTER",redis.io/workload=data

Confirm PVCs are still present

kubectl get pvc -n "$NS" -l redis.io/cluster="$CLUSTER"
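The listing above shows every PVC, but a claim that is present and not bound is easy to miss by eye. The following is a minimal sketch, assuming the standard Kubernetes PVC `status.phase` values (`Bound`, `Pending`, `Lost`); the function name `unbound_pvcs` is hypothetical and not part of the operator.

```shell
# Print every PVC for the cluster whose phase is anything other than "Bound".
# Relies on the NS and CLUSTER variables exported in Prerequisites.
unbound_pvcs() {
  kubectl get pvc -n "$NS" -l redis.io/cluster="$CLUSTER" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}' \
    | awk -F'\t' '$2 != "Bound" { print $1 " (" $2 ")" }'
}
```

Running `unbound_pvcs` with no output means every listed claim is bound; any line it prints names a claim to investigate before continuing.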

Check desired instance count and current primary

kubectl get rediscluster "$CLUSTER" -n "$NS" -o jsonpath='{.spec.instances}{"\n"}'
kubectl get rediscluster "$CLUSTER" -n "$NS" -o jsonpath='{.status.currentPrimary}{"\n"}'
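It can help to compare the desired instance count with the number of surviving data PVCs before triggering recovery. This sketch assumes one data PVC per instance, named `<cluster>-data-<index>` with a numeric index; both are inferences from the naming shown in Symptoms, and the function name `compare_pvcs_to_instances` is hypothetical.

```shell
# Compare spec.instances with the number of PVCs named <cluster>-data-<number>.
# Returns 0 when the counts match, non-zero otherwise.
compare_pvcs_to_instances() {
  want="$(kubectl get rediscluster "$CLUSTER" -n "$NS" -o jsonpath='{.spec.instances}')"
  # -o name prints "persistentvolumeclaim/<name>"; count only the data PVCs.
  have="$(kubectl get pvc -n "$NS" -l redis.io/cluster="$CLUSTER" -o name \
    | grep -c "/$CLUSTER-data-[0-9][0-9]*\$" || true)"
  echo "spec.instances=$want data PVCs=$have"
  [ "$want" = "$have" ]
}
```

A mismatch does not necessarily block recovery, but fewer PVCs than instances suggests that some data is already gone and the Escalation section may apply.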

Recovery Steps

Do not delete PVCs

Do NOT delete any PVCs. They contain your data and will be reattached automatically.

Trigger immediate reconciliation by touching an annotation

kubectl annotate rediscluster "$CLUSTER" -n "$NS" \
  runbooks.redis.io/recover-from-pod-loss-ts="$(date +%s)" --overwrite

Watch pod recreation

kubectl get pods -n "$NS" -l redis.io/cluster="$CLUSTER",redis.io/workload=data -w

If pods are not recreated within ~60 seconds, check operator logs/events

kubectl get events -n "$NS" \
  --field-selector involvedObject.kind=RedisCluster,involvedObject.name="$CLUSTER" \
  --sort-by=.lastTimestamp

If the operator itself is unhealthy, resolve that first; it must be running to recreate the pods.
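The events query covers the RedisCluster object; the operator's own logs are the other place to look. The sketch below assumes the operator runs as a Deployment. `OPERATOR_NS` and `OPERATOR_DEPLOY` are placeholder variables, not names taken from this runbook, so set them to match your installation. The function name `operator_logs_for_cluster` is also hypothetical.

```shell
# Show recent operator log lines that mention the cluster.
# OPERATOR_NS and OPERATOR_DEPLOY are assumptions; set them for your install.
operator_logs_for_cluster() {
  kubectl logs -n "$OPERATOR_NS" "deployment/$OPERATOR_DEPLOY" --since=15m \
    | grep -F -- "$CLUSTER"
}
```

Example: `OPERATOR_NS=<operator-namespace> OPERATOR_DEPLOY=<operator-deployment> operator_logs_for_cluster`. No output means the operator has not logged anything mentioning the cluster in the last 15 minutes, which may itself indicate that reconciliation is not running.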

Verify recreated pods are bound to expected PVC names

kubectl get pod -n "$NS" -l redis.io/cluster="$CLUSTER",redis.io/workload=data \
  -o jsonpath='{range .items[*]}{.metadata.name}{" -> "}{.spec.volumes[?(@.name=="data")].persistentVolumeClaim.claimName}{"\n"}{end}'
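To avoid checking the pod-to-PVC mapping by eye, the same query can be piped through a small filter. This is a sketch that assumes claims follow the `<cluster>-data-<index>` pattern with a numeric index; the function name `check_pvc_bindings` is hypothetical.

```shell
# Flag any data pod whose "data" volume claim is not named <cluster>-data-<number>.
# Prints offending lines and returns non-zero if any are found.
check_pvc_bindings() {
  kubectl get pod -n "$NS" -l redis.io/cluster="$CLUSTER",redis.io/workload=data \
    -o jsonpath='{range .items[*]}{.metadata.name}{" -> "}{.spec.volumes[?(@.name=="data")].persistentVolumeClaim.claimName}{"\n"}{end}' \
    | awk -v prefix="$CLUSTER-data-" '
        {
          claim = $3                                  # fields: pod, "->", claim
          idx = substr(claim, length(prefix) + 1)     # text after the prefix
          if (index(claim, prefix) != 1 || idx !~ /^[0-9]+$/) {
            print "UNEXPECTED: " $0
            bad = 1
          }
        }
        END { exit bad }'
}
```

A pod with no `data` volume produces an empty claim name and is flagged as well.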

Verification

kubectl get rediscluster "$CLUSTER" -n "$NS" -o jsonpath='{.status.currentPrimary}{"\n"}'
kubectl get rediscluster "$CLUSTER" -n "$NS" -o jsonpath='{.status.readyInstances}{"\n"}'
kubectl get rediscluster "$CLUSTER" -n "$NS" -o jsonpath='{.status.phase}{"\n"}'
kubectl get pvc -n "$NS" -l redis.io/cluster="$CLUSTER"

Expected:

Data pods are recreated to match spec.instances.
Pods are attached to existing <cluster>-data-<index> PVCs.
status.currentPrimary is set and reachable.
Cluster returns to Healthy.
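Rather than re-running the phase query by hand, a short polling loop can wait for the cluster to recover. This sketch assumes that `status.phase` reports the literal string `Healthy`, as the expected outcome above suggests; the function name `wait_for_healthy` and its default timings are illustrative choices.

```shell
# Poll .status.phase until it reports "Healthy" or the timeout expires.
# Usage: wait_for_healthy [timeout_seconds] [interval_seconds]
wait_for_healthy() {
  timeout="${1:-300}"
  interval="${2:-5}"
  elapsed=0
  phase=""
  while [ "$elapsed" -lt "$timeout" ]; do
    phase="$(kubectl get rediscluster "$CLUSTER" -n "$NS" -o jsonpath='{.status.phase}')"
    if [ "$phase" = "Healthy" ]; then
      echo "cluster $CLUSTER is Healthy after ${elapsed}s"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "timed out after ${timeout}s; last phase: ${phase:-unknown}" >&2
  return 1
}
```

For example, `wait_for_healthy 600 10` checks every 10 seconds for up to 10 minutes and returns non-zero on timeout, so it can gate a follow-up step in a script.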

Escalation

If PVCs are missing or fail to bind, escalate to the storage/platform team.
If pods recreate but Redis cannot start from PVC data, move to PVC Corruption for per-replica recovery.
If no primary becomes available, follow Manual Failover.
