CockroachDB supports rolling upgrades, allowing you to upgrade a cluster to a new version without downtime. This guide covers upgrade procedures, version compatibility, and best practices.

Upgrade Overview

CockroachDB’s upgrade process involves:

  • Rolling Upgrade: Upgrade nodes one at a time while the cluster remains operational
  • Version Compatibility: Mixed-version clusters run temporarily during upgrades
  • Cluster Version: The internal cluster version is finalized after all nodes are upgraded
  • Zero Downtime: Applications continue running throughout the upgrade

Version Compatibility

CockroachDB maintains compatibility between consecutive versions:

Supported Upgrade Paths

  • Patch Upgrades: Any patch version to any higher patch within the same major.minor (e.g., v24.1.1 to v24.1.5)
  • Minor Upgrades: One minor version at a time (e.g., v24.1.x to v24.2.x)
  • Major Upgrades: Must upgrade through each major version (e.g., v23.2 → v24.1 → v24.2)
You cannot skip major or minor versions. Always upgrade sequentially through each version.
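The sequential-path rule can be sketched as a small helper. It assumes CockroachDB’s calendar versioning of two minor releases per year (vYY.1 and vYY.2); always confirm the actual path against the official release notes.

```shell
# Sketch: list the sequential upgrade path between two major.minor
# versions, assuming two releases per year (YY.1 and YY.2).
next_version() {
  local year minor
  year="${1%%.*}"
  minor="${1##*.}"
  if [ "$minor" -lt 2 ]; then
    echo "${year}.$((minor + 1))"
  else
    echo "$((year + 1)).1"
  fi
}

upgrade_path() {  # upgrade_path <current> <target>, e.g. 23.2 24.2
  local v="$1"
  while [ "$v" != "$2" ]; do
    v="$(next_version "$v")"
    echo "$v"
  done
}
```

For example, `upgrade_path 23.2 24.2` prints `24.1` and then `24.2`: two separate rolling upgrades, each finalized before the next begins.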

Version Skew Policy

During a rolling upgrade:
  • Nodes can run versions N and N+1 temporarily
  • The cluster operates in “mixed-version mode”
  • Some new features are unavailable until upgrade is finalized

Pre-Upgrade Checklist

Before starting an upgrade:
1. Review Release Notes

Read the release notes for the target version, paying attention to:
  • Breaking changes
  • Deprecated features
  • New cluster settings
  • Known issues
2. Check Version Compatibility

Verify your current version can upgrade directly to the target version.
SELECT version();
3. Backup the Cluster

Create a full backup before upgrading:
BACKUP INTO 's3://backup-bucket/pre-upgrade?AWS_ACCESS_KEY_ID=xxx&AWS_SECRET_ACCESS_KEY=xxx'
WITH revision_history;
Always backup before upgrading. This is your safety net if issues occur.
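To keep pre-upgrade backups distinguishable, one option is to timestamp the destination. A sketch; the bucket name is a placeholder, and `AUTH=implicit` (IAM-role credentials) is shown instead of inline access keys:

```shell
# Sketch: build a timestamped pre-upgrade BACKUP statement.
# The bucket URI is a placeholder; AUTH=implicit uses the
# machine's IAM role rather than inline credentials.
backup_statement() {
  local stamp uri
  stamp="$(date -u +%Y%m%d-%H%M%S)"
  uri="s3://backup-bucket/pre-upgrade-${stamp}?AUTH=implicit"
  printf "BACKUP INTO '%s' WITH revision_history;" "$uri"
}

# To execute (connection flags are deployment-specific):
# cockroach sql --certs-dir=certs --execute="$(backup_statement)"
```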
4. Test in Staging

Perform a complete upgrade test in a staging environment that mirrors production.
5. Review Cluster Health

-- Check for under-replicated ranges
SELECT count(*) FROM crdb_internal.ranges WHERE under_replicated = true;

-- Check for unavailable ranges  
SELECT count(*) FROM crdb_internal.ranges WHERE unavailable = true;

-- Verify all nodes are live
SELECT node_id, address, is_live FROM crdb_internal.gossip_nodes;
Resolve any issues before proceeding.
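The three checks above can be combined into a simple go/no-go gate. The helper below takes the query results as plain numbers, so the decision logic stays separate from how you run the SQL (e.g. via `cockroach sql --format=tsv`):

```shell
# Sketch: go/no-go gate on the pre-upgrade health queries.
# Arguments: <under_replicated> <unavailable> <dead_nodes>,
# each obtained from the SQL checks above.
cluster_healthy() {
  [ "$1" -eq 0 ] && [ "$2" -eq 0 ] && [ "$3" -eq 0 ]
}

# Example wiring (connection flags are deployment-specific):
# under=$(cockroach sql --certs-dir=certs --format=tsv \
#   --execute="SELECT count(*) FROM crdb_internal.ranges WHERE under_replicated" | tail -1)
# cluster_healthy "$under" "$unavail" "$dead" || { echo "not healthy"; exit 1; }
```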
6. Plan Maintenance Window

While upgrades don’t require downtime, schedule during low-traffic periods for safety.

Rolling Upgrade Procedure

Manual Upgrade Process

1. Download New Binary

# Download the new version
wget https://binaries.cockroachdb.com/cockroach-v24.2.0.linux-amd64.tgz
tar xfz cockroach-v24.2.0.linux-amd64.tgz

# Verify the download
./cockroach version
2. Upgrade First Node

# Stop the node gracefully
kill -TERM $(cat cockroach-data/cockroach.pid)

# Wait for the node to stop completely
# Check logs: tail -f cockroach-data/logs/cockroach.log

# Replace the binary
cp cockroach /usr/local/bin/cockroach

# Start with the new version
cockroach start \
  --certs-dir=certs \
  --advertise-addr=<node-address> \
  --join=<join-addresses> \
  --background
3. Verify Node Rejoined

-- Check node status
SELECT node_id, address, build_tag, is_live 
FROM crdb_internal.gossip_nodes;
Ensure the upgraded node is live and running the new version.
4. Monitor Cluster Health

-- Verify no under-replicated ranges
SELECT count(*) FROM crdb_internal.ranges WHERE under_replicated = true;

-- Check for range issues
SELECT count(*) FROM crdb_internal.ranges WHERE unavailable = true;
Wait for the cluster to stabilize before continuing.
5. Repeat for Remaining Nodes

Upgrade each remaining node one at a time, following steps 2-4 for each node.
Allow each node to fully rejoin and stabilize before upgrading the next one.
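Steps 2-4 can be sketched as a loop over the fleet. The node list, binary path, and SSH access are assumptions, and the health checks between nodes are elided; with the default `DRY_RUN=1` the script only prints what it would do, which is a safe way to review the flow first.

```shell
# Sketch: drive steps 2-4 across every node, one at a time.
# NODES, paths, and SSH access are assumptions; with DRY_RUN=1
# (the default) commands are printed, not executed.
NODES="${NODES:-node1 node2 node3}"
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "DRY-RUN: $*"; else "$@"; fi
}

rolling_upgrade() {
  local node
  for node in $NODES; do
    # Step 2: stop the node, swap the binary, restart
    run ssh "$node" 'kill -TERM $(cat cockroach-data/cockroach.pid)'
    run ssh "$node" 'cp /tmp/cockroach-new /usr/local/bin/cockroach'
    run ssh "$node" 'cockroach start --certs-dir=certs --advertise-addr=<node-address> --join=<join-addresses> --background'
    # Steps 3-4: verify the node rejoined and ranges are healthy
    # before moving on (health checks elided in this sketch)
    run sleep 60
  done
}
```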
6. Finalize the Upgrade

After all nodes are upgraded, finalize to enable new features:
SET CLUSTER SETTING version = '24.2';
Finalization is irreversible. You cannot downgrade after this step.
7. Verify Upgrade Completion

SHOW CLUSTER SETTING version;
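If you script the verification, a small poll loop avoids checking by hand. The version reader is passed in as a command so the loop itself carries no connection details (which are deployment-specific):

```shell
# Sketch: poll until the reported cluster version matches the
# target, giving up after 30 attempts. The reader command is a
# placeholder for running SHOW CLUSTER SETTING version.
wait_for_version() {  # wait_for_version <target> <reader-cmd>
  local tries=0
  while [ "$($2)" != "$1" ]; do
    tries=$((tries + 1))
    [ "$tries" -ge 30 ] && return 1
    sleep 2
  done
}

# Example reader (connection flags are deployment-specific):
# read_version() {
#   cockroach sql --certs-dir=certs --format=tsv \
#     --execute="SHOW CLUSTER SETTING version" | tail -1
# }
# wait_for_version "24.2" read_version
```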

Kubernetes Upgrade

For Kubernetes deployments using StatefulSets:
1. Update Container Image

kubectl set image statefulset/cockroachdb \
  cockroachdb=cockroachdb/cockroach:v24.2.0
2. Monitor Rolling Update

# Watch pod updates
kubectl rollout status statefulset/cockroachdb

# Check pod versions
kubectl get pods -l app=cockroachdb -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'
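By default a StatefulSet rolling update proceeds automatically from the highest ordinal down. If you want an explicit pause between pods, the RollingUpdate `partition` field can be lowered one ordinal at a time, as sketched below; the StatefulSet name and replica count are placeholders, and `DRY_RUN=1` only prints the commands.

```shell
# Sketch: step the StatefulSet "partition" field down so only
# one pod picks up the new image at a time. STS and REPLICAS
# are placeholders; DRY_RUN=1 (default) prints, not executes.
STS="${STS:-cockroachdb}"
REPLICAS="${REPLICAS:-3}"
DRY_RUN="${DRY_RUN:-1}"

run() { if [ "$DRY_RUN" = "1" ]; then echo "DRY-RUN: $*"; else "$@"; fi; }

partitioned_rollout() {
  local i
  for i in $(seq $((REPLICAS - 1)) -1 0); do
    # Pods with ordinal >= partition are updated to the new image.
    run kubectl patch statefulset "$STS" -p \
      "{\"spec\":{\"updateStrategy\":{\"rollingUpdate\":{\"partition\":$i}}}}"
    # Wait for readiness and cluster stability here before
    # lowering the partition to the next ordinal.
    run kubectl rollout status "statefulset/$STS"
  done
}
```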
3. Verify Cluster Health

kubectl exec -it cockroachdb-0 -- \
  ./cockroach sql --certs-dir=certs --execute="
    SELECT node_id, address, build_tag, is_live 
    FROM crdb_internal.gossip_nodes;
  "
4. Finalize Upgrade

kubectl exec -it cockroachdb-0 -- \
  ./cockroach sql --certs-dir=certs --execute="
    SET CLUSTER SETTING version = '24.2';
  "

Upgrade Strategies

Conservative Approach

For mission-critical clusters:
1. Partial Upgrade

Upgrade one node and monitor for 24-48 hours before continuing.
2. Gradual Rollout

Upgrade remaining nodes over several days, one per day.
3. Delayed Finalization

Wait 1-2 weeks after all nodes are upgraded before finalizing.

Standard Approach

For typical production clusters:
1. Rolling Upgrade

Upgrade all nodes in a single maintenance window, one at a time.
2. Observation Period

Monitor for 24-72 hours in mixed-version mode.
3. Finalize

Complete the upgrade by finalizing the cluster version.

Monitoring During Upgrade

Key Metrics to Watch

  • Node Liveness: All nodes should remain live
  • Under-replicated Ranges: Should return to 0 after each node upgrade
  • Query Latency: Monitor for increases during upgrade
  • Error Rates: Watch for spikes in application errors
  • CPU/Memory: May increase temporarily during upgrades
  • Disk I/O: Rebalancing may increase I/O

Monitoring Commands

Check Node Versions
SELECT 
  node_id,
  address,
  build_tag,
  is_live,
  started_at
FROM crdb_internal.gossip_nodes
ORDER BY node_id;
Check Replica Health
SELECT 
  SUM(CASE WHEN under_replicated THEN 1 ELSE 0 END) AS under_replicated,
  SUM(CASE WHEN over_replicated THEN 1 ELSE 0 END) AS over_replicated,
  SUM(CASE WHEN unavailable THEN 1 ELSE 0 END) AS unavailable
FROM crdb_internal.ranges;
Monitor Running Jobs
SELECT * FROM [SHOW JOBS]
WHERE status = 'running'
ORDER BY created DESC;

Troubleshooting Upgrades

Node Won’t Start After Upgrade

1. Check Logs

tail -100 cockroach-data/logs/cockroach.log
Look for error messages indicating the issue.
2. Verify Binary

./cockroach version
Ensure the correct version is installed.
3. Check Disk Space

df -h /path/to/cockroach-data
4. Rollback if Needed

If the new version won’t start, revert to the previous binary:
cp /backup/cockroach.old /usr/local/bin/cockroach
cockroach start --certs-dir=certs --advertise-addr=<node-address> --join=<join-addresses> --background

Cluster Performance Degradation

  • Pause Upgrade: Stop upgrading additional nodes until performance stabilizes
  • Check Rebalancing: Reduce the rebalancing rate if it’s impacting performance
  • Review Queries: Check for long-running queries or locks
  • Monitor Resources: Verify CPU, memory, and disk aren’t saturated

Reduce Rebalancing Rate
SET CLUSTER SETTING kv.snapshot_rebalance.max_rate = '32MiB';

Cannot Finalize Upgrade

Check Node Versions
SELECT node_id, build_tag 
FROM crdb_internal.gossip_nodes
WHERE is_live = true;
Ensure all live nodes are running the target version before finalizing.
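A quick way to spot stragglers is to filter that query’s output for build tags that don’t start with the target version. A sketch, assuming `<node_id> <build_tag>` lines on stdin (e.g. from `cockroach sql --format=tsv`):

```shell
# Sketch: print the node IDs whose build_tag does not start
# with the target version prefix (e.g. v24.2). Input lines:
# "<node_id> <build_tag>", one per node.
stragglers() {  # stragglers <target-prefix>  (reads stdin)
  awk -v t="$1" 'index($2, t) != 1 { print $1 }'
}

# Usage (connection flags are deployment-specific):
# cockroach sql --format=tsv \
#   --execute="SELECT node_id, build_tag FROM crdb_internal.gossip_nodes WHERE is_live" \
#   | stragglers v24.2
```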

Rollback Procedures

Before Finalization

You can rollback before finalizing the cluster version:
1. Stop Upgrading

Don’t upgrade any more nodes.
2. Downgrade Upgraded Nodes

Replace the new binary with the old version on upgraded nodes:
kill -TERM $(cat cockroach-data/cockroach.pid)
cp /backup/cockroach.old /usr/local/bin/cockroach
cockroach start --certs-dir=certs --advertise-addr=<node-address> --join=<join-addresses> --background
3. Verify Rollback

SELECT node_id, build_tag FROM crdb_internal.gossip_nodes;
After finalization, rollback is not possible. You can only move forward to newer versions.

Restore from Backup

If issues occur after finalization:
1. Deploy New Cluster

Create a fresh cluster running the previous version.
2. Restore Backup

RESTORE FROM LATEST IN 's3://backup-bucket/pre-upgrade?AWS_ACCESS_KEY_ID=xxx&AWS_SECRET_ACCESS_KEY=xxx';
3. Redirect Applications

Update application connection strings to the restored cluster.

Post-Upgrade Tasks

1. Update Statistics

-- Let automatic stats collection run, or manually trigger:
ANALYZE users;
2. Review New Features

Explore new cluster settings and features in the release notes.
3. Update Documentation

Document the new version in your infrastructure docs.
4. Remove Old Binaries

Clean up old binary backups after confirming stability.
5. Schedule Next Upgrade

Plan for the next upgrade based on release schedules.

Upgrade Best Practices

1. Stay Current

Upgrade regularly to avoid large version gaps and benefit from bug fixes.
2. Test First

Always test upgrades in staging before production.
3. Backup First

Never upgrade without a recent backup.
4. One Node at a Time

Never upgrade multiple nodes simultaneously.
5. Monitor Closely

Watch metrics continuously during and after upgrades.
6. Document Everything

Keep detailed notes of upgrade procedures and any issues.
7. Plan for Rollback

Always have a tested rollback plan before starting.

Automated Upgrade Considerations

While automation is possible, manual oversight during upgrades is recommended for production clusters:
  • Semi-automated: Automate binary deployment, but manually verify each step
  • Full automation: Only for non-critical environments or after extensive testing
  • Monitoring integration: Automated upgrades should pause on health check failures

Upgrade Planning Template

Upgrade Details:
  • Current Version: ____________
  • Target Version: ____________
  • Scheduled Date: ____________
  • Duration Estimate: ____________
Pre-Upgrade:
  • Release notes reviewed
  • Backup completed and verified
  • Staging upgrade completed
  • Stakeholders notified
  • Rollback plan documented
  • Monitoring dashboards ready
During Upgrade:
  • Node 1 upgraded and verified
  • Node 2 upgraded and verified
  • Node 3 upgraded and verified
  • Node N upgraded and verified
  • All nodes running target version
  • Cluster health verified
Post-Upgrade:
  • Upgrade finalized
  • Application testing completed
  • Performance validated
  • Documentation updated
  • Old binaries archived

Next Steps

  • Deployment: Review deployment options
  • Backup & Restore: Implement backup strategies
  • Scaling: Scale your cluster
