The Redis Operator provides mechanisms for handling planned maintenance, including node maintenance windows and controlled cluster updates.
## Node Maintenance Mode

When performing planned node maintenance (kernel upgrades, hardware replacement, etc.), use the `nodeMaintenanceWindow` feature to control how Redis pods behave during node drains.

### Enabling Maintenance Mode

Set `nodeMaintenanceWindow.inProgress` to `true` in your cluster spec:
```yaml
apiVersion: redis.io/v1
kind: RedisCluster
metadata:
  name: my-cluster
spec:
  instances: 3
  nodeMaintenanceWindow:
    inProgress: true
    reusePVC: true
  # ... other fields
```
Or patch an existing cluster:
```shell
kubectl patch rediscluster my-cluster --type merge -p '{
  "spec": {
    "nodeMaintenanceWindow": {
      "inProgress": true,
      "reusePVC": true
    }
  }
}'
```
### How It Works

When maintenance mode is enabled:

- **PVC Reuse** - If `reusePVC: true` (the default), PVCs remain attached even when pods are evicted. This preserves data and speeds up pod rescheduling on other nodes.
- **Graceful Draining** - The operator detects pod evictions and allows Kubernetes to reschedule pods while maintaining cluster availability.
- **Replication Awareness** - Primary pods are moved via a controlled switchover before the node drain completes.
- **Status Tracking** - The cluster status condition `MaintenanceInProgress` is set to `True`.

Maintenance mode does not prevent pods from being evicted; it changes how the operator responds to evictions during the maintenance window.
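The `MaintenanceInProgress` condition can be read straight from the cluster's status JSON. A minimal sketch, assuming the standard Kubernetes condition shape shown in this document (the sample payload below is hypothetical; in practice it would come from `kubectl get rediscluster my-cluster -o json`, under `.status`):

```python
import json

def maintenance_in_progress(status: dict) -> bool:
    """Return True when the MaintenanceInProgress condition is "True"."""
    for cond in status.get("conditions", []):
        if cond.get("type") == "MaintenanceInProgress":
            return cond.get("status") == "True"
    return False

# Hypothetical sample of the .status document described above.
sample = json.loads('''
{
  "phase": "Healthy",
  "conditions": [
    {"type": "MaintenanceInProgress", "status": "True"}
  ]
}
''')

print(maintenance_in_progress(sample))  # True
```

The same check is what the `jsonpath` query later in this page performs on the live object.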
### Maintenance Workflow

1. **Enable maintenance mode**

   ```shell
   kubectl patch rediscluster my-cluster --type merge -p '{
     "spec": {
       "nodeMaintenanceWindow": {
         "inProgress": true
       }
     }
   }'
   ```

   Verify the condition:

   ```shell
   kubectl get rediscluster my-cluster \
     -o jsonpath='{.status.conditions[?(@.type=="MaintenanceInProgress")]}' | jq
   ```

2. **Cordon node(s)**

   This prevents new pods from scheduling on the node.

3. **Drain node(s)**

   ```shell
   kubectl drain node-1 \
     --ignore-daemonsets \
     --delete-emptydir-data \
     --grace-period=30
   ```

   Pods are gracefully evicted and rescheduled on other nodes.

4. **Monitor pod rescheduling**

   ```shell
   kubectl get pods -l redis.io/cluster=my-cluster -o wide -w
   ```

   Watch as pods are recreated on available nodes.

5. **Perform node maintenance**

   Complete your maintenance tasks (kernel upgrade, hardware replacement, etc.).

6. **Uncordon node(s)**

   The node is now available for scheduling again.

7. **Disable maintenance mode**

   ```shell
   kubectl patch rediscluster my-cluster --type merge -p '{
     "spec": {
       "nodeMaintenanceWindow": {
         "inProgress": false
       }
     }
   }'
   ```

   The cluster returns to normal operation.
## PVC Reuse Behavior

The `reusePVC` field controls PVC handling during maintenance:

### `reusePVC: true` (Default)

**Behavior:**

- PVCs are retained when their pods are evicted
- When pods reschedule to different nodes, the underlying volumes are detached and reattached
- Data is preserved across node moves
- Faster recovery (no data copy required)

**Use when:**

- Using cloud storage (EBS, PD, Azure Disk) that supports cross-node attachment
- Nodes are in the same availability zone
- You want zero data loss during maintenance

### `reusePVC: false`

**Behavior:**

- PVCs are deleted when pods are evicted
- New PVCs are created when pods reschedule
- Data is lost unless you have backups

**Use when:**

- Using local storage that cannot move between nodes
- You want a clean slate after maintenance
- Data is ephemeral or can be restored from backup

Setting `reusePVC: false` causes data loss. Create backups before enabling maintenance mode if you need to preserve data.
## Multi-Node Maintenance

When multiple nodes need maintenance:

### Sequential Maintenance (Recommended)

Drain nodes one at a time:

```shell
# Node 1
kubectl drain node-1 --ignore-daemonsets --grace-period=30
# Wait for pods to reschedule and the cluster to return to Healthy
kubectl get rediscluster my-cluster -o jsonpath='{.status.phase}'

# Node 2 (only after node-1 is done)
kubectl drain node-2 --ignore-daemonsets --grace-period=30
# Wait for Healthy...

# Node 3
kubectl drain node-3 --ignore-daemonsets --grace-period=30
```

**Advantages:**

- Maintains cluster availability
- Replicas remain available during the drain
- Lower risk of data loss
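The drain-then-wait loop above is easy to automate. A sketch with the kubectl interactions injected as callables (`drain_node` and `get_phase` stand in for `kubectl drain` and the jsonpath phase query; the stubs below are purely illustrative):

```python
import time

def sequential_drain(nodes, drain_node, get_phase, timeout_s=600, poll_s=5):
    """Drain nodes one at a time, waiting for the cluster phase to
    return to Healthy before moving on to the next node."""
    for node in nodes:
        drain_node(node)
        deadline = time.monotonic() + timeout_s
        while get_phase() != "Healthy":
            if time.monotonic() > deadline:
                raise TimeoutError(f"cluster not Healthy after draining {node}")
            time.sleep(poll_s)

# Stubbed example: the cluster "recovers" one poll after each drain.
drained = []
state = {"phase": "Healthy"}

def fake_drain(node):
    drained.append(node)
    state["phase"] = "Recovering"

def fake_phase():
    phase, state["phase"] = state["phase"], "Healthy"
    return phase

sequential_drain(["node-1", "node-2", "node-3"], fake_drain, fake_phase, poll_s=0)
print(drained)  # ['node-1', 'node-2', 'node-3']
```

In a real run, `drain_node` would shell out to `kubectl drain` and `get_phase` would read `.status.phase`, as shown in the commands above.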
### Parallel Maintenance (Advanced)

Drain multiple nodes simultaneously:

```shell
kubectl drain node-1 node-2 --ignore-daemonsets --grace-period=30
```

Parallel draining can cause temporary unavailability if the primary and all replicas are on affected nodes. Only use it for clusters with sufficient node distribution.

**Requirements:**

- At least N+2 nodes for N replicas
- Pod anti-affinity configured to spread pods across nodes
- A properly configured PodDisruptionBudget
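Before a parallel drain, the unavailability risk described above can be sanity-checked from the pod-to-node mapping. A sketch; the placements below are hypothetical and would in practice come from `kubectl get pods -o wide`:

```python
def safe_to_drain(pod_nodes, nodes_to_drain):
    """pod_nodes: mapping of pod name -> node for one Redis cluster.
    A parallel drain risks full unavailability when every pod of the
    cluster (primary and all replicas) sits on a drained node, so the
    drain is considered safe only if at least one pod survives."""
    drain = set(nodes_to_drain)
    return any(node not in drain for node in pod_nodes.values())

# Illustrative placement: three pods spread over three nodes.
pods = {
    "my-cluster-0": "node-1",
    "my-cluster-1": "node-2",
    "my-cluster-2": "node-3",
}

print(safe_to_drain(pods, ["node-1", "node-2"]))            # True: node-3 survives
print(safe_to_drain(pods, ["node-1", "node-2", "node-3"]))  # False: nothing survives
```

This only checks the worst case called out above; the PodDisruptionBudget (next section) still governs how many pods may actually be evicted at once.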
## PodDisruptionBudget Integration

The operator creates a PodDisruptionBudget (PDB) by default to protect cluster availability during voluntary disruptions:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-cluster
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      redis.io/cluster: my-cluster
      redis.io/workload: data
```
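With `maxUnavailable: 1`, the eviction budget follows the standard Kubernetes PDB arithmetic: evictions are allowed only while the number of unhealthy pods stays below `maxUnavailable`. A sketch of that calculation:

```python
def disruptions_allowed(total_pods, healthy_pods, max_unavailable):
    """Standard maxUnavailable PDB arithmetic: evictions are allowed
    while (total - healthy) < maxUnavailable."""
    currently_unavailable = total_pods - healthy_pods
    return max(0, max_unavailable - currently_unavailable)

# 3 instances, all ready, maxUnavailable: 1 -> one eviction allowed.
print(disruptions_allowed(3, 3, 1))  # 1
# One pod already down -> further voluntary evictions are blocked.
print(disruptions_allowed(3, 2, 1))  # 0
```

This is why a drain stalls when a Redis pod is already unhealthy: `disruptionsAllowed` drops to 0 and the eviction API refuses further evictions.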
### Disable PDB (Not Recommended)

To allow faster draining at the cost of availability:

```yaml
spec:
  enablePodDisruptionBudget: false
```

Disabling the PDB allows multiple pods to be evicted simultaneously, potentially causing cluster downtime.
## Maintenance Best Practices

### Pre-Maintenance Checklist

1. **Create backups**

   ```shell
   kubectl apply -f - <<EOF
   apiVersion: redis.io/v1
   kind: RedisBackup
   metadata:
     name: pre-maintenance-$(date +%s)
   spec:
     clusterName: my-cluster
     target: prefer-replica
     method: rdb
     destination:
       s3:
         bucket: redis-backups
         path: maintenance/
         region: us-east-1
   EOF
   ```

2. **Verify cluster health**

   ```shell
   kubectl get rediscluster my-cluster -o wide
   ```

   Ensure the phase is Healthy and all replicas are ready.

3. **Check node distribution**

   ```shell
   kubectl get pods -l redis.io/cluster=my-cluster -o wide
   ```

   Confirm pods are spread across multiple nodes.

4. **Review PDB status**

   ```shell
   kubectl get pdb my-cluster -o yaml
   ```

   Verify `disruptionsAllowed` is at least 1.
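The four checklist items can be rolled into a single pre-flight gate. A sketch over data you would normally pull with `kubectl`; the inputs below are hypothetical samples:

```python
def preflight_ok(phase, pod_nodes, disruptions_allowed, backup_completed):
    """Gate maintenance on: Healthy phase, pods spread across more than
    one node, an eviction budget of at least 1, and a completed backup.
    Returns (overall_ok, per-check results)."""
    checks = {
        "healthy": phase == "Healthy",
        "spread": len(set(pod_nodes.values())) > 1,
        "pdb": disruptions_allowed >= 1,
        "backup": backup_completed,
    }
    return all(checks.values()), checks

# Hypothetical inputs matching the checklist above.
ok, checks = preflight_ok(
    phase="Healthy",
    pod_nodes={"my-cluster-0": "node-1",
               "my-cluster-1": "node-2",
               "my-cluster-2": "node-3"},
    disruptions_allowed=1,
    backup_completed=True,
)
print(ok)  # True
```

Returning the per-check dictionary alongside the boolean makes it easy to report which precondition failed.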
### During Maintenance

- **Monitor the cluster phase:**

  ```shell
  kubectl get rediscluster my-cluster -o jsonpath='{.status.phase}' -w
  ```

- **Watch pod events:**

  ```shell
  kubectl get events -n default --field-selector involvedObject.kind=Pod --sort-by=.lastTimestamp
  ```

- **Check replication lag:** observe the `redis_replication_lag_bytes` metric in Grafana
- **Validate the primary's location:** ensure the primary is not on a node being drained
### Post-Maintenance

1. **Disable maintenance mode**

   ```shell
   kubectl patch rediscluster my-cluster --type merge -p '{
     "spec": {
       "nodeMaintenanceWindow": {
         "inProgress": false
       }
     }
   }'
   ```

2. **Verify cluster health**

   ```shell
   kubectl get rediscluster my-cluster -o wide
   kubectl exec my-cluster-0 -- redis-cli INFO replication
   ```

3. **Check pod distribution**

   ```shell
   kubectl get pods -l redis.io/cluster=my-cluster -o wide
   ```

   Pods may have moved to different nodes.

4. **Review events and metrics**

   ```shell
   kubectl get events -n default --sort-by=.lastTimestamp | tail -n 50
   ```

   Look for any warnings or errors.
## Handling Stuck Drains

### Pod Won't Evict

**Cause:** The PDB is blocking eviction, or the pod has finalizers.

**Solution:**

```shell
# Check PDB status
kubectl get pdb my-cluster -o yaml

# Check the pod for finalizers
kubectl get pod my-cluster-0 -o yaml | grep -A5 finalizers

# Force drain (use with caution)
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data --force
```

### PVC Not Detaching

**Cause:** The volume is still attached to the old node.

**Solution:**

```shell
# Check volume attachments
kubectl get volumeattachment | grep my-cluster

# Describe the PVC
kubectl describe pvc data-my-cluster-0

# Wait for the cloud provider to detach the volume (can take 2-5 minutes),
# or manually detach it via the cloud provider console
```
## Emergency Maintenance (Unplanned)

For unplanned node failures without maintenance mode:

1. **Node becomes NotReady** - Kubernetes marks its pods as `Unknown` after 5 minutes
2. **Operator detects the loss** - Status polling fails for the affected pods
3. **Failover triggers** - If the primary is on the failed node, the operator promotes a replica
4. **Pods reschedule** - After 5-10 minutes, pods are rescheduled on healthy nodes

Unplanned failures have longer recovery times (5-10 minutes) than planned maintenance (30-60 seconds) because of Kubernetes timeout periods.
## Scheduling Maintenance Windows

Use annotations to document maintenance windows:

```yaml
metadata:
  annotations:
    redis.io/maintenance-window: "Sundays 02:00-04:00 UTC"
    redis.io/last-maintenance: "2026-02-28T02:00:00Z"
    redis.io/next-maintenance: "2026-03-07T02:00:00Z"
```
These annotations are informational and do not affect operator behavior. Use them for coordination and documentation.