The Redis Operator provides mechanisms for handling planned maintenance, including node maintenance windows and controlled cluster updates.

Node Maintenance Mode

When performing planned node maintenance (kernel upgrades, hardware replacement, etc.), use the nodeMaintenanceWindow feature to control how Redis pods behave during node drains.

Enabling Maintenance Mode

Set nodeMaintenanceWindow.inProgress to true in your cluster spec:
apiVersion: redis.io/v1
kind: RedisCluster
metadata:
  name: my-cluster
spec:
  instances: 3
  nodeMaintenanceWindow:
    inProgress: true
    reusePVC: true
  # ... other fields
Or patch an existing cluster:
kubectl patch rediscluster my-cluster --type merge -p '{
  "spec": {
    "nodeMaintenanceWindow": {
      "inProgress": true,
      "reusePVC": true
    }
  }
}'

How It Works

When maintenance mode is enabled:
  1. PVC Reuse - If reusePVC: true (default), PVCs are retained rather than deleted when pods are evicted. This preserves data and speeds up pod rescheduling on other nodes.
  2. Graceful Draining - The operator detects pod evictions and allows Kubernetes to reschedule pods while maintaining cluster availability.
  3. Replication Awareness - Primary pods are moved via controlled switchover before node drain completes.
  4. Status Tracking - Cluster status condition MaintenanceInProgress is set to True.
Maintenance mode does not prevent pods from being evicted. It changes how the operator responds to evictions during the maintenance window.
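The MaintenanceInProgress condition can be checked from automation as well as by hand. A minimal sketch, using the same jsonpath expression as the verification step below; the helper name is ours, not an operator command:

```shell
# Return 0 if the cluster's MaintenanceInProgress condition is True.
# Usage: maintenance_in_progress <cluster-name>
maintenance_in_progress() {
  local status
  status=$(kubectl get rediscluster "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="MaintenanceInProgress")].status}')
  [ "$status" = "True" ]
}
```

This is useful as a guard in drain scripts: proceed only when the function succeeds.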

Maintenance Workflow

1. Enable maintenance mode

kubectl patch rediscluster my-cluster --type merge -p '{
  "spec": {
    "nodeMaintenanceWindow": {
      "inProgress": true
    }
  }
}'
Verify the condition:
kubectl get rediscluster my-cluster \
  -o jsonpath='{.status.conditions[?(@.type=="MaintenanceInProgress")]}' | jq
2. Cordon node(s)

kubectl cordon node-1
This prevents new pods from scheduling on the node.
3. Drain node(s)

kubectl drain node-1 \
  --ignore-daemonsets \
  --delete-emptydir-data \
  --grace-period=30
Pods are gracefully evicted and rescheduled on other nodes.
4. Monitor pod rescheduling

kubectl get pods -l redis.io/cluster=my-cluster -o wide -w
Watch as pods are recreated on available nodes.
5. Perform node maintenance

Complete your maintenance tasks (kernel upgrade, hardware replacement, etc.).
6. Uncordon node(s)

kubectl uncordon node-1
Node is now available for scheduling again.
7. Disable maintenance mode

kubectl patch rediscluster my-cluster --type merge -p '{
  "spec": {
    "nodeMaintenanceWindow": {
      "inProgress": false
    }
  }
}'
Cluster returns to normal operation.
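The seven steps above can be collected into one helper for a single-node cycle. A sketch built from the commands shown above; maintain_node is a local script name, not an operator feature, and the manual steps (monitoring and the actual node work) are left as comments:

```shell
# Walk one node through the full maintenance cycle for a cluster.
# Usage: maintain_node <cluster> <node>
maintain_node() {
  local cluster="$1" node="$2"
  # Step 1: enable maintenance mode
  kubectl patch rediscluster "$cluster" --type merge \
    -p '{"spec":{"nodeMaintenanceWindow":{"inProgress":true}}}'
  # Steps 2-3: cordon, then drain the node
  kubectl cordon "$node"
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --grace-period=30
  # Steps 4-5: monitor pod rescheduling and perform the actual node work here
  # Step 6: make the node schedulable again
  kubectl uncordon "$node"
  # Step 7: disable maintenance mode
  kubectl patch rediscluster "$cluster" --type merge \
    -p '{"spec":{"nodeMaintenanceWindow":{"inProgress":false}}}'
}
```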

PVC Reuse Behavior

The reusePVC field controls PVC handling during maintenance:

reusePVC: true (Default)

Behavior:
  • PVCs are retained and stay bound to their PersistentVolumes
  • When pods reschedule to different nodes, the underlying volumes are detached from the old node and reattached to the new one
  • Data is preserved across node moves
  • Faster recovery (no data copy required)
Use when:
  • Using cloud storage (EBS, PD, Azure Disk) that supports cross-node attachment
  • Nodes are in the same availability zone
  • You want zero data loss during maintenance

reusePVC: false

Behavior:
  • PVCs are deleted when pods are evicted
  • New PVCs are created when pods reschedule
  • Data is lost unless you have backups
Use when:
  • Using local storage that cannot move between nodes
  • You want a clean slate after maintenance
  • Data is ephemeral or can be restored from backup
Setting reusePVC: false causes data loss. Create backups before enabling maintenance mode if you need to preserve data.
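Because of this, drain automation may want to refuse to run when PVC reuse is off. A small guard sketch; the helper name is ours, and an unset field is treated as the default (true):

```shell
# Abort maintenance prep when reusePVC is false (data would be lost on drain).
# Usage: require_pvc_reuse <cluster>
require_pvc_reuse() {
  local reuse
  reuse=$(kubectl get rediscluster "$1" \
    -o jsonpath='{.spec.nodeMaintenanceWindow.reusePVC}')
  if [ "$reuse" = "false" ]; then
    echo "reusePVC is false: back up $1 before draining nodes" >&2
    return 1
  fi
}
```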

Multi-Node Maintenance

When maintenance spans multiple nodes, the recommended approach is to drain them one at a time:
# Node 1
kubectl drain node-1 --ignore-daemonsets --grace-period=30
# Wait for pods to reschedule and cluster to return to Healthy
kubectl get rediscluster my-cluster -o jsonpath='{.status.phase}'

# Node 2 (only after node-1 is done)
kubectl drain node-2 --ignore-daemonsets --grace-period=30
# Wait for Healthy...

# Node 3
kubectl drain node-3 --ignore-daemonsets --grace-period=30
Advantages:
  • Maintains cluster availability
  • Replicas remain available during drain
  • Lower risk of data loss
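The "wait for Healthy" step above is the part worth automating. A sketch that serializes the drains, assuming status.phase reports Healthy as shown; the helper names are ours:

```shell
# Wait until the cluster's status.phase matches the expected value.
wait_for_phase() {
  local cluster="$1" want="$2" timeout="${3:-600}" elapsed=0 phase
  while [ "$elapsed" -lt "$timeout" ]; do
    phase=$(kubectl get rediscluster "$cluster" -o jsonpath='{.status.phase}')
    [ "$phase" = "$want" ] && return 0
    sleep 5
    elapsed=$((elapsed + 5))
  done
  echo "timed out waiting for $cluster to become $want" >&2
  return 1
}

# Drain nodes strictly one at a time, requiring Healthy between drains.
drain_sequentially() {
  local cluster="$1"
  shift
  for node in "$@"; do
    kubectl drain "$node" --ignore-daemonsets --grace-period=30
    wait_for_phase "$cluster" Healthy || return 1
  done
}
```

For example, `drain_sequentially my-cluster node-1 node-2 node-3` replaces the manual sequence above.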

Parallel Maintenance (Advanced)

Drain multiple nodes simultaneously:
kubectl drain node-1 node-2 --ignore-daemonsets --grace-period=30
Parallel draining can cause temporary unavailability if the primary and all replicas are on affected nodes. Only use for clusters with sufficient node distribution.
Requirements:
  • At least N+2 nodes for N replicas
  • Pod anti-affinity configured to spread pods across nodes
  • PodDisruptionBudget properly configured

PodDisruptionBudget Integration

The operator creates a PodDisruptionBudget (PDB) by default to protect cluster availability during voluntary disruptions:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-cluster
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      redis.io/cluster: my-cluster
      redis.io/workload: data
To allow faster draining at the cost of availability:
spec:
  enablePodDisruptionBudget: false
Disabling PDB allows multiple pods to be evicted simultaneously, potentially causing cluster downtime.

Maintenance Best Practices

Pre-Maintenance Checklist

1. Create backups

kubectl apply -f - <<EOF
apiVersion: redis.io/v1
kind: RedisBackup
metadata:
  name: pre-maintenance-$(date +%s)
spec:
  clusterName: my-cluster
  target: prefer-replica
  method: rdb
  destination:
    s3:
      bucket: redis-backups
      path: maintenance/
      region: us-east-1
EOF
2. Verify cluster health

kubectl get rediscluster my-cluster -o wide
Ensure phase is Healthy and all replicas are ready.
3. Check node distribution

kubectl get pods -l redis.io/cluster=my-cluster -o wide
Confirm pods are spread across multiple nodes.
4. Review PDB status

kubectl get pdb my-cluster -o yaml
Verify disruptionsAllowed is at least 1.
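The disruptionsAllowed check can be scripted instead of read from the YAML by hand. A sketch (the function name is ours):

```shell
# Check that the PDB currently permits at least one voluntary eviction.
# Usage: pdb_allows_eviction <pdb-name>
pdb_allows_eviction() {
  local allowed
  allowed=$(kubectl get pdb "$1" -o jsonpath='{.status.disruptionsAllowed}')
  [ "${allowed:-0}" -ge 1 ]
}
```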

During Maintenance

  • Monitor cluster phase: kubectl get rediscluster my-cluster -o jsonpath='{.status.phase}' -w
  • Watch pod events: kubectl get events -n default --field-selector involvedObject.kind=Pod --sort-by=.lastTimestamp
  • Check replication lag: Observe redis_replication_lag_bytes metric in Grafana
  • Validate primary location: Ensure primary is not on a node being drained
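The primary-location check can be scripted. A sketch that assumes the primary pod carries a redis.io/role=primary label, which is a guess modeled on the other redis.io/* labels in this page; verify the label names your installation actually uses:

```shell
# Verify the primary pod is not on the node about to be drained.
# NOTE: redis.io/role=primary is an assumed label; confirm it locally.
# Usage: primary_not_on_node <cluster> <node>
primary_not_on_node() {
  local primary_node
  primary_node=$(kubectl get pods \
    -l "redis.io/cluster=$1,redis.io/role=primary" \
    -o jsonpath='{.items[0].spec.nodeName}')
  [ "$primary_node" != "$2" ]
}
```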

Post-Maintenance

1. Disable maintenance mode

kubectl patch rediscluster my-cluster --type merge -p '{
  "spec": {
    "nodeMaintenanceWindow": {
      "inProgress": false
    }
  }
}'
2. Verify cluster health

kubectl get rediscluster my-cluster -o wide
kubectl exec my-cluster-0 -- redis-cli INFO replication
3. Check pod distribution

kubectl get pods -l redis.io/cluster=my-cluster -o wide
Pods may have moved to different nodes.
4. Review events and metrics

kubectl get events -n default --sort-by=.lastTimestamp | tail -n 50
Look for any warnings or errors.

Handling Stuck Drains

Pod Won’t Evict

Cause: PDB blocking eviction or pod finalizers. Solution:
# Check PDB status
kubectl get pdb my-cluster -o yaml

# Check pod for finalizers
kubectl get pod my-cluster-0 -o yaml | grep -A5 finalizers

# Force drain (use with caution)
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data --force

PVC Not Detaching

Cause: Volume still attached to old node. Solution:
# Check volume attachment
kubectl get volumeattachment | grep my-cluster

# Describe PVC
kubectl describe pvc data-my-cluster-0

# Wait for cloud provider to detach (can take 2-5 minutes)
# Or manually detach via cloud provider console
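Since detach can take a few minutes, a polling loop beats re-running the commands by hand. A sketch that waits for all VolumeAttachments matching the cluster name to disappear (the helper name and the name-based grep are ours; adjust the match to your PV naming):

```shell
# Poll until no VolumeAttachment referencing the cluster's volumes remains.
# Usage: wait_for_detach <cluster> [timeout-seconds]
wait_for_detach() {
  local cluster="$1" timeout="${2:-300}" elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if ! kubectl get volumeattachment | grep -q "$cluster"; then
      return 0
    fi
    sleep 10
    elapsed=$((elapsed + 10))
  done
  echo "volumes for $cluster still attached after ${timeout}s" >&2
  return 1
}
```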

Emergency Maintenance (Unplanned)

For unplanned node failures without maintenance mode:
  1. Node becomes NotReady - Kubernetes marks pods as Unknown after 5 minutes
  2. Operator detects loss - Status polling fails for affected pods
  3. Failover triggers - If primary is on failed node, operator promotes a replica
  4. Pod rescheduling - After 5-10 minutes, pods are rescheduled on healthy nodes
Unplanned failures have longer recovery times (5-10 minutes) compared to planned maintenance (30-60 seconds) due to Kubernetes timeout periods.

Scheduling Maintenance Windows

Use annotations to document maintenance windows:
metadata:
  annotations:
    redis.io/maintenance-window: "Sundays 02:00-04:00 UTC"
    redis.io/last-maintenance: "2026-02-28T02:00:00Z"
    redis.io/next-maintenance: "2026-03-07T02:00:00Z"
These annotations are informational and do not affect operator behavior. Use them for coordination and documentation.
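Tooling can still read them for scheduling decisions. A sketch; note that literal dots in the annotation key must be escaped in kubectl's jsonpath syntax:

```shell
# Print a cluster's documented maintenance window (empty if unset).
# Usage: maintenance_window <cluster>
maintenance_window() {
  kubectl get rediscluster "$1" \
    -o jsonpath='{.metadata.annotations.redis\.io/maintenance-window}'
}
```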
