Upgrade Overview
CockroachDB’s upgrade process involves:

- Rolling Upgrade: Upgrade nodes one at a time while the cluster remains operational
- Version Compatibility: Mixed-version clusters run temporarily during upgrades
- Cluster Version: The internal cluster version is finalized after all nodes are upgraded
- Zero Downtime: Applications continue running throughout the upgrade
Version Compatibility
CockroachDB maintains compatibility between consecutive versions.

Supported Upgrade Paths
- Patch Upgrades: Any patch version to any higher patch within the same major.minor (e.g., v24.1.1 to v24.1.5)
- Minor Upgrades: One minor version at a time (e.g., v24.1.x to v24.2.x)
- Major Upgrades: Must upgrade through each major version (e.g., v23.2 → v24.1 → v24.2)
Version Skew Policy
During a rolling upgrade:
- Nodes can run versions N and N+1 temporarily
- The cluster operates in “mixed-version mode”
- Some new features are unavailable until upgrade is finalized
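Mixed-version mode is directly observable: each node reports the version its binary supports, while the active cluster version setting stays at the previous version until finalization. A sketch, run from any node:

```sql
-- Version supported by each node's binary (mixed values during a rolling upgrade)
SELECT node_id, server_version, build_tag FROM crdb_internal.gossip_nodes;

-- Active cluster version; remains at the previous version until finalization
SHOW CLUSTER SETTING version;
```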
Pre-Upgrade Checklist
Before starting an upgrade:

Review Release Notes
Read the release notes for the target version, paying attention to:
- Breaking changes
- Deprecated features
- New cluster settings
- Known issues
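A verified recent backup is the other key precondition before any node is touched. A sketch (the storage URI is a placeholder):

```sql
-- Full-cluster backup to external storage before upgrading
BACKUP INTO 's3://backup-bucket/pre-upgrade?AUTH=implicit';

-- Confirm the backup completed and is listed
SHOW BACKUPS IN 's3://backup-bucket/pre-upgrade?AUTH=implicit';
```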
Rolling Upgrade Procedure
Manual Upgrade Process
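For each node, the loop is: verify health, drain and stop, replace the binary, restart and verify. A sketch for one node (the version, download URL, `--certs-dir`, and service name are assumptions; adapt to how the process is managed):

```shell
# 1. Verify cluster health before touching any node
cockroach node status --certs-dir=certs --host=localhost:26257

# 2. Drain the node so in-flight work moves elsewhere, then stop it
cockroach node drain 1 --certs-dir=certs --host=localhost:26257
sudo systemctl stop cockroach

# 3. Replace the binary with the target version
curl -O https://binaries.cockroachdb.com/cockroach-v24.2.0.linux-amd64.tgz
tar -xzf cockroach-v24.2.0.linux-amd64.tgz
sudo cp cockroach-v24.2.0.linux-amd64/cockroach /usr/local/bin/

# 4. Restart and confirm the node rejoins with the new version
sudo systemctl start cockroach
cockroach node status --certs-dir=certs --host=localhost:26257
```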
Repeat for Remaining Nodes
Upgrade each remaining node one at a time, repeating the same drain, replace-binary, restart, and verify sequence for each node.
Allow each node to fully rejoin and stabilize before upgrading the next one.
Kubernetes Upgrade
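With a StatefulSet, the upgrade is an image change. A sketch, assuming the StatefulSet is named `cockroachdb` and using an example image tag:

```shell
# Point the StatefulSet at the new image; RollingUpdate replaces pods one at a time
kubectl set image statefulset/cockroachdb cockroachdb=cockroachdb/cockroach:v24.2.0

# Watch the rollout; each pod must become Ready before the next is replaced
kubectl rollout status statefulset/cockroachdb
```

Production guides often use a partitioned RollingUpdate strategy instead, so each pod can be verified manually before the partition is lowered and the next pod is replaced.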
For Kubernetes deployments using StatefulSets, update the pod template’s container image and let the StatefulSet controller replace pods one at a time.

Upgrade Strategies
Conservative Approach
For mission-critical clusters, leave extra soak time between node upgrades and delay finalization until the new version has been validated under production load.

Standard Approach
For typical production clusters, upgrade nodes in sequence, verifying health after each, and finalize once all nodes are stable on the target version.

Monitoring During Upgrade
Key Metrics to Watch
Critical Metrics
- Node Liveness: All nodes should remain live
- Under-replicated Ranges: Should return to 0 after each node upgrade
- Query Latency: Monitor for increases during upgrade
- Error Rates: Watch for spikes in application errors
- CPU/Memory: May increase temporarily during upgrades
- Disk I/O: Rebalancing may increase I/O
Monitoring Commands
Check Node Versions
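A sketch using the CLI (connection flags are assumptions):

```shell
# Build version reported by every node; mixed values mean the upgrade is incomplete
cockroach node status --certs-dir=certs --host=localhost:26257
```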
Check Replica Health
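One way to watch replica health is the replication reports, queryable from SQL; a sketch:

```sql
-- Under-replicated and unavailable ranges; should return to 0 between node upgrades
SELECT sum(under_replicated_ranges) AS under_replicated,
       sum(unavailable_ranges) AS unavailable
FROM system.replication_stats;
```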
Monitor Running Jobs
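Long-running jobs (schema changes, backups, imports) should finish before the upgrade starts and can be watched during it:

```sql
-- List jobs and their status; avoid upgrading while large jobs are running
SHOW JOBS;
```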
Troubleshooting Upgrades
Node Won’t Start After Upgrade
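If a node fails to rejoin, the startup log usually says why; a common cause is launching a binary older than the finalized cluster version. A sketch (the log path is an assumption):

```shell
# Confirm which binary the node is actually launching
cockroach version

# Look for startup errors near the end of the log
grep -iE "error|fatal" /var/log/cockroach/cockroach.log | tail -n 20
```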
Cluster Performance Degradation
Pause Upgrade
Stop upgrading additional nodes until performance stabilizes
Check Rebalancing
Reduce rebalancing rate if it’s impacting performance
Review Queries
Check for long-running queries or locks
Monitor Resources
Verify CPU, memory, and disk aren’t saturated
Reduce Rebalancing Rate
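Snapshot transfer rates are governed by cluster settings; a sketch (the value is an example, and setting names vary somewhat by version):

```sql
-- Throttle rebalancing snapshots if they compete with foreground traffic
SET CLUSTER SETTING kv.snapshot_rebalance.max_rate = '16 MiB';
```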
Cannot Finalize Upgrade
Check Node Versions
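Finalization requires every node to be live and running the target binary; a quick check:

```sql
-- Every node must report the same server_version before finalization can proceed
SELECT node_id, server_version, build_tag, is_live FROM crdb_internal.gossip_nodes;

-- If auto-finalization was disabled earlier, re-enable it to let finalization proceed
RESET CLUSTER SETTING cluster.preserve_downgrade_option;
```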
Rollback Procedures
Before Finalization
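Finalization is what closes the rollback window, so many operators disable auto-finalization before touching the first node (the version string is an example):

```sql
-- Pin the cluster version so the upgrade is not auto-finalized
SET CLUSTER SETTING cluster.preserve_downgrade_option = '24.1';
```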
You can roll back before finalizing the cluster version by redeploying the previous binary to each node, one at a time, using the same rolling procedure.

Restore from Backup
If issues occur after finalization, downgrading binaries is no longer possible; recovery means restoring to a cluster running the previous version from a backup taken before the upgrade.

Post-Upgrade Tasks
Upgrade Best Practices
Automated Upgrade Considerations
While automation is possible, manual oversight during upgrades is recommended for production clusters:
- Semi-automated: Automate binary deployment, but manually verify each step
- Full automation: Only for non-critical environments or after extensive testing
- Monitoring integration: Automated upgrades should pause on health check failures
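The pause-on-failure idea can be a small gate run between node upgrades; a sketch, with the health query, connection flags, and thresholds as assumptions:

```shell
#!/usr/bin/env bash
# Block until the cluster reports zero under-replicated ranges, or give up.
set -euo pipefail

for _ in $(seq 1 30); do
  under=$(cockroach sql --certs-dir=certs --host=localhost:26257 \
    --format=csv -e \
    "SELECT coalesce(sum(under_replicated_ranges), 0) FROM system.replication_stats;" \
    | tail -n 1)
  if [ "$under" -eq 0 ]; then
    echo "cluster healthy; safe to upgrade the next node"
    exit 0
  fi
  echo "under-replicated ranges: $under; waiting..."
  sleep 10
done

echo "cluster did not stabilize; pausing the upgrade" >&2
exit 1
```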
Upgrade Planning Template
Upgrade Plan Template
Upgrade Details:
- Current Version: ____________
- Target Version: ____________
- Scheduled Date: ____________
- Duration Estimate: ____________
Pre-Upgrade:
- Release notes reviewed
- Backup completed and verified
- Staging upgrade completed
- Stakeholders notified
- Rollback plan documented
- Monitoring dashboards ready

Node Upgrades:
- Node 1 upgraded and verified
- Node 2 upgraded and verified
- Node 3 upgraded and verified
- Node N upgraded and verified

Finalization:
- All nodes running target version
- Cluster health verified
- Upgrade finalized

Post-Upgrade:
- Application testing completed
- Performance validated
- Documentation updated
- Old binaries archived
Next Steps
Deployment
Review deployment options
Backup & Restore
Implement backup strategies
Scaling
Scale your cluster