Operator Upgrade

Severity: P1
Estimated time: 20-40 minutes This runbook operationalizes the upgrade story from issue #7 and points to the full design/procedure in the upgrade documentation.

Symptoms

Planned operator version upgrade.
Need to patch operator image without disrupting existing RedisCluster workloads.

Prerequisites

Helm access to the operator release.
Ability to inspect operator and RedisCluster resources.
Shell variables:

export OP_NS=<operator-namespace>
export RELEASE=<helm-release-name>
export CLUSTER_NS=<rediscluster-namespace>
export CLUSTER=<rediscluster-name>

Diagnosis

Capture pre-upgrade state

kubectl get redisclusters.redis.io -A
kubectl get pods -n "$CLUSTER_NS" -l redis.io/cluster="$CLUSTER" -o wide
kubectl get svc -n "$CLUSTER_NS" "$CLUSTER-leader" "$CLUSTER-replica" "$CLUSTER-any"
kubectl get events -n "$CLUSTER_NS" --sort-by=.lastTimestamp | tail -n 40
kubectl get rediscluster "$CLUSTER" -n "$CLUSTER_NS" -o jsonpath='{.status.currentPrimary}{"\n"}'

Confirm operator deployment identity

kubectl get deploy -n "$OP_NS" -l app.kubernetes.io/name=redis-operator
export OP_DEPLOY="$(kubectl get deploy -n "$OP_NS" -l app.kubernetes.io/name=redis-operator -o jsonpath='{.items[0].metadata.name}')"
echo "$OP_DEPLOY"

Recovery Steps

Follow the upgrade documentation as the source-of-truth procedure.

Run the Helm upgrade

helm upgrade "$RELEASE" charts/redis-operator \
  --namespace "$OP_NS" \
  --reuse-values \
  --set image.repository=<repo> \
  --set image.tag=<new-tag>

Watch operator rollout and leader lease

kubectl rollout status deployment/"$OP_DEPLOY" -n "$OP_NS"
kubectl get lease redis-operator-leader -n "$OP_NS"
kubectl logs -n "$OP_NS" deploy/"$OP_DEPLOY" --tail=200

Watch Redis cluster continuity during rollout

kubectl get rediscluster "$CLUSTER" -n "$CLUSTER_NS" -o yaml
kubectl get pods -n "$CLUSTER_NS" -l redis.io/cluster="$CLUSTER" -w

Verification

kubectl get rediscluster "$CLUSTER" -n "$CLUSTER_NS" -o jsonpath='{.status.phase}{"\n"}'
kubectl get rediscluster "$CLUSTER" -n "$CLUSTER_NS" -o jsonpath='{.status.currentPrimary}{"\n"}'
kubectl get pods -n "$CLUSTER_NS" -l redis.io/cluster="$CLUSTER"

Expected:

Operator rollout completes successfully with leader election continuity.
Existing Redis clusters remain available.
Data pods roll in controlled order only when spec hash is outdated (replicas first, primary last).
No bulk restart of all healthy data pods.
Cluster returns to Healthy.

Escalation

If availability degrades during upgrade, pause and assess rollback.

Rollback command:

helm rollback "$RELEASE" <previous-revision> --namespace "$OP_NS"

Capture logs/events and compare against upgrade documentation expectations and issue #7 (.issues/007_operator_upgrade_story.md).

Get Started

Core Concepts

Configuration

Operations

Runbooks

Symptoms

Prerequisites

Diagnosis

Recovery Steps

Verification

Escalation

Build docs developers (and LLMs) love

Get Started

Core Concepts

Configuration

Operations

Runbooks

​Symptoms

​Prerequisites

​Diagnosis

​Recovery Steps

​Verification

​Escalation

Build docs developers (and LLMs) love

Symptoms

Prerequisites

Diagnosis

Recovery Steps

Verification

Escalation