Scale your cluster horizontally by changing the `instances` field in your RedisCluster spec.
Scaling Up (Adding Replicas)
Increase the number of `instances` to add replicas.

Scale from 3 to 5 Instances
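For example, assuming the resource's short name is `rediscluster` (verify with `kubectl api-resources`), the change can be applied with a merge patch:

```shell
# Bump spec.instances from 3 to 5; the operator reconciles the new pods
kubectl patch rediscluster my-cluster --type merge -p '{"spec":{"instances":5}}'
```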
What Happens
Pods created

New pods my-cluster-3 and my-cluster-4 are created with a REPLICAOF configuration pointing to the current primary.

Monitor Scaling Progress
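One way to watch progress (the label selector here is an assumption — check which labels your operator actually applies to pods):

```shell
# Watch the new pods appear and reach Ready
kubectl get pods -l app.kubernetes.io/instance=my-cluster -w
```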
Performance Impact
Adding replicas causes:

- Network I/O spike - Full sync transfers the entire dataset from primary to new replicas
- Disk I/O increase - Primary generates RDB snapshot for each new replica
- CPU usage - RDB generation and transfer processing
For large datasets (>10 GB), scale up during low-traffic periods. Full sync can take several minutes to hours depending on dataset size and network bandwidth.
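To check whether a new replica's initial full sync has completed, you can inspect its replication state directly (pod name taken from the example above):

```shell
# master_link_status should read "up" and master_sync_in_progress "0"
# once the full sync has finished
kubectl exec my-cluster-3 -- redis-cli info replication | \
  grep -E 'master_link_status|master_sync_in_progress'
```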
Scaling Down (Removing Replicas)
Decrease the number of `instances` to remove replicas.

Scale from 5 to 3 Instances
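As with scaling up, a merge patch works here (the resource name `rediscluster` is an assumption; verify with `kubectl api-resources`):

```shell
# Drop spec.instances from 5 to 3; the operator removes the highest ordinals
kubectl patch rediscluster my-cluster --type merge -p '{"spec":{"instances":3}}'
```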
What Happens
Replica selection
Operator selects replicas to delete (highest ordinals first): my-cluster-4, then my-cluster-3.

Primary protection
If the current primary is in the deletion set, operator performs a switchover to a remaining replica first.
PVC Retention on Scale Down
By default, PVCs are deleted when scaling down. To preserve the underlying volumes, set the StorageClass `reclaimPolicy` to `Retain`:
storageclass.yaml
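A sketch of such a StorageClass — the provisioner is a placeholder, so substitute your cluster's CSI driver:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: redis-retain
provisioner: ebs.csi.aws.com   # placeholder — use your cluster's CSI driver
reclaimPolicy: Retain          # keep the underlying volume after PVC deletion
allowVolumeExpansion: true     # also required for PVC resize
```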
Scaling to 1 Instance (Standalone)
You can scale down to a single instance:

Scaling Sentinel Mode Clusters
For clusters using `mode: sentinel`, sentinel pods are managed separately:
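A sketch of how this might look in the spec — the `sentinel` field layout is an assumption here; check your operator's CRD reference for the actual path:

```yaml
spec:
  mode: sentinel
  instances: 3      # Redis data pods (affected by scaling)
  sentinel:
    replicas: 3     # sentinel pods, reconciled separately (assumed field name)
```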
Scaling the `instances` field only affects Redis data pods, not sentinel pods (see internal/controller/cluster/sentinel.go:85).

Capacity Planning
When to Scale Up
- High read load - Add replicas to distribute read traffic
- Replication lag increasing - More replicas mean more replication load on the primary; if lag is the problem, consider scaling the primary's resources instead
- Disaster recovery - More replicas improve availability during node failures
- Geographic distribution - Place replicas in multiple zones/regions
When to Scale Down
- Over-provisioned - Reduce cost by removing unused replicas
- Low traffic periods - Scale down during off-peak hours (if data loss risk is acceptable)
- Testing/development - Non-critical environments don’t need high replica counts
Recommended Instance Counts
| Environment | Instances | Notes |
|---|---|---|
| Development | 1-2 | Cost-effective, minimal redundancy |
| Staging | 2-3 | Mirrors production for testing |
| Production | 3-5 | High availability, read scaling |
| Mission-critical | 5+ | Maximum redundancy |
More replicas increase operational cost (compute, storage, network) and replication overhead on the primary. Find the balance between availability and cost.
Automatic Scaling
The operator does not include built-in HPA (Horizontal Pod Autoscaler) support for Redis clusters.

Why HPA is Not Supported
- Stateful nature - Scaling Redis requires data replication, not just pod creation
- Primary constraints - Only one primary pod can accept writes
- Replication lag - Adding replicas adds load to the primary rather than relieving it
- PVC management - Automatic PVC creation/deletion requires careful orchestration
Workarounds for Auto-Scaling
Implement custom controllers that:

- Monitor metrics (CPU, memory, `redis_connected_clients`)
- Patch the `RedisCluster` spec when thresholds are exceeded
- Trigger scale-up during high load
- Scale down during low load, with hysteresis to prevent flapping
scale-up.sh
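An illustrative sketch of such a script — the metric check is omitted and the cap is an assumption; a real controller would query your monitoring stack before patching:

```bash
#!/usr/bin/env bash
# scale-up.sh — naive threshold-based scale-up sketch
set -euo pipefail

CLUSTER="my-cluster"
MAX_INSTANCES=5

current=$(kubectl get rediscluster "$CLUSTER" -o jsonpath='{.spec.instances}')
if [ "$current" -lt "$MAX_INSTANCES" ]; then
  # Add one replica at a time, per the gradual-scaling best practice
  kubectl patch rediscluster "$CLUSTER" --type merge \
    -p "{\"spec\":{\"instances\":$((current + 1))}}"
fi
```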
Vertical Scaling (Resource Limits)
Scale compute resources (CPU, memory) by updating the `resources` field:
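For example (values are illustrative; the assumption that `resources` sits directly under `spec` should be checked against your CRD reference):

```yaml
spec:
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: "1"
      memory: 2Gi
```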
Vertical scaling requires pod restarts. The operator performs rolling updates (replicas first, then primary) to maintain availability (see internal/controller/cluster/rolling_update.go:23).
Storage Scaling (PVC Resize)
Increase storage size by updating the `storage.size` field:
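For example, growing to 20Gi (assuming `storage` sits directly under `spec`; check your CRD reference):

```yaml
spec:
  storage:
    size: 20Gi   # must be larger than the current size; shrinking is not supported
```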
Requirements
- StorageClass must support volume expansion (`allowVolumeExpansion: true`)
- Underlying storage driver must support online resize
- New size must be larger than current size (shrinking is not supported)
What Happens
PVC resize triggered
Operator patches PVCs with the new size:

`kubectl patch pvc data-my-cluster-0 -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'`

Volume expansion
Kubernetes and storage driver expand the underlying volume. This may take several minutes depending on the storage backend.
Filesystem resize
For some storage types, pods may need to restart to resize the filesystem. The operator handles this automatically.
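To verify the expansion completed, the PVC's reported capacity and conditions can be checked:

```shell
# Capacity reflects the new size once expansion has finished
kubectl get pvc data-my-cluster-0 -o jsonpath='{.status.capacity.storage}'

# A pending condition such as FileSystemResizePending indicates
# the filesystem resize has not yet been applied
kubectl describe pvc data-my-cluster-0
```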
Best Practices
- Scale gradually - Add 1-2 replicas at a time, wait for sync to complete
- Monitor during scaling - Watch replication lag, network I/O, and primary CPU
- Backup before scaling down - Always create backups before removing instances
- Test scaling in staging - Verify scaling behavior matches expectations
- Use PDB - Keep `enablePodDisruptionBudget: true` to protect availability during scaling
- Plan for growth - Provision storage with headroom for future expansion
- Avoid scale-down during high load - Only scale down during low-traffic periods