CockroachDB supports rolling upgrades, allowing you to upgrade a cluster to a new version without downtime. This guide covers upgrade procedures, version compatibility, and best practices.

Upgrade Overview

CockroachDB’s upgrade process involves:

  • Rolling Upgrade: Upgrade nodes one at a time while the cluster remains operational
  • Version Compatibility: Mixed-version clusters run temporarily during upgrades
  • Cluster Version: The internal cluster version is finalized after all nodes are upgraded
  • Zero Downtime: Applications continue running throughout the upgrade

Version Compatibility

CockroachDB maintains compatibility between consecutive versions:

Supported Upgrade Paths

  • Patch Upgrades: Any patch version to any higher patch within the same major.minor (e.g., v24.1.1 to v24.1.5)
  • Minor Upgrades: One minor version at a time (e.g., v24.1.x to v24.2.x)
  • Major Upgrades: Must upgrade through each major version (e.g., v23.2 → v24.1 → v24.2)
You cannot skip major or minor versions. Always upgrade sequentially through each version.
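The sequential-path rule can be sketched as a small helper. It assumes CockroachDB’s calendar versioning of two minor releases per year (vYY.1 and vYY.2); always confirm the actual path against the official release notes.

```shell
# Sketch: list the sequential upgrade path between two major.minor
# versions, assuming two releases per year (YY.1 and YY.2).
next_version() {
  local year minor
  year="${1%%.*}"
  minor="${1##*.}"
  if [ "$minor" -lt 2 ]; then
    echo "${year}.$((minor + 1))"
  else
    echo "$((year + 1)).1"
  fi
}

upgrade_path() {  # upgrade_path <current> <target>, e.g. 23.2 24.2
  local v="$1"
  while [ "$v" != "$2" ]; do
    v="$(next_version "$v")"
    echo "$v"
  done
}
```

For example, `upgrade_path 23.2 24.2` prints `24.1` and then `24.2`: two separate rolling upgrades, each finalized before the next begins.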

Version Skew Policy

During a rolling upgrade:
  • Nodes can run versions N and N+1 temporarily
  • The cluster operates in “mixed-version mode”
  • Some new features are unavailable until upgrade is finalized

Pre-Upgrade Checklist

Before starting an upgrade:
1. Review Release Notes

Read the release notes for the target version, paying attention to:
  • Breaking changes
  • Deprecated features
  • New cluster settings
  • Known issues
2. Check Version Compatibility

Verify your current version can upgrade directly to the target version.
SELECT version();
3. Backup the Cluster

Create a full backup before upgrading:
BACKUP INTO 's3://backup-bucket/pre-upgrade?AWS_ACCESS_KEY_ID=xxx&AWS_SECRET_ACCESS_KEY=xxx'
WITH revision_history;
Always backup before upgrading. This is your safety net if issues occur.
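To keep pre-upgrade backups distinguishable, one option is to timestamp the destination. A sketch; the bucket name is a placeholder, and `AUTH=implicit` (IAM-role credentials) is shown instead of inline access keys:

```shell
# Sketch: build a timestamped pre-upgrade BACKUP statement.
# The bucket URI is a placeholder; AUTH=implicit uses the
# machine's IAM role rather than inline credentials.
backup_statement() {
  local stamp uri
  stamp="$(date -u +%Y%m%d-%H%M%S)"
  uri="s3://backup-bucket/pre-upgrade-${stamp}?AUTH=implicit"
  printf "BACKUP INTO '%s' WITH revision_history;" "$uri"
}

# To execute (connection flags are deployment-specific):
# cockroach sql --certs-dir=certs --execute="$(backup_statement)"
```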
4. Test in Staging

Perform a complete upgrade test in a staging environment that mirrors production.
5. Review Cluster Health

-- Check for under-replicated ranges
SELECT count(*) FROM crdb_internal.ranges WHERE under_replicated = true;

-- Check for unavailable ranges  
SELECT count(*) FROM crdb_internal.ranges WHERE unavailable = true;

-- Verify all nodes are live
SELECT node_id, address, is_live FROM crdb_internal.gossip_nodes;
Resolve any issues before proceeding.
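The three checks above can be combined into a simple go/no-go gate. The helper below takes the query results as plain numbers, so the decision logic stays separate from how you run the SQL (e.g. via `cockroach sql --format=tsv`):

```shell
# Sketch: go/no-go gate on the pre-upgrade health queries.
# Arguments: <under_replicated> <unavailable> <dead_nodes>,
# each obtained from the SQL checks above.
cluster_healthy() {
  [ "$1" -eq 0 ] && [ "$2" -eq 0 ] && [ "$3" -eq 0 ]
}

# Example wiring (connection flags are deployment-specific):
# under=$(cockroach sql --certs-dir=certs --format=tsv \
#   --execute="SELECT count(*) FROM crdb_internal.ranges WHERE under_replicated" | tail -1)
# cluster_healthy "$under" "$unavail" "$dead" || { echo "not healthy"; exit 1; }
```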
6. Plan Maintenance Window

While upgrades don’t require downtime, schedule during low-traffic periods for safety.

Rolling Upgrade Procedure

Manual Upgrade Process

1. Download New Binary

# Download the new version
wget https://binaries.cockroachdb.com/cockroach-v24.2.0.linux-amd64.tgz
tar xfz cockroach-v24.2.0.linux-amd64.tgz

# Verify the download
./cockroach version
2. Upgrade First Node

# Stop the node gracefully
kill -TERM $(cat cockroach-data/cockroach.pid)

# Wait for the node to stop completely
# Check logs: tail -f cockroach-data/logs/cockroach.log

# Replace the binary
cp cockroach /usr/local/bin/cockroach

# Start with the new version
cockroach start \
  --certs-dir=certs \
  --advertise-addr=<node-address> \
  --join=<join-addresses> \
  --background
3. Verify Node Rejoined

-- Check node status
SELECT node_id, address, build_tag, is_live 
FROM crdb_internal.gossip_nodes;
Ensure the upgraded node is live and running the new version.
4. Monitor Cluster Health

-- Verify no under-replicated ranges
SELECT count(*) FROM crdb_internal.ranges WHERE under_replicated = true;

-- Check for range issues
SELECT count(*) FROM crdb_internal.ranges WHERE unavailable = true;
Wait for the cluster to stabilize before continuing.
5. Repeat for Remaining Nodes

Upgrade each remaining node one at a time, following steps 2-4 for each node.
Allow each node to fully rejoin and stabilize before upgrading the next one.
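Steps 2-4 can be sketched as a loop over the fleet. The node list, binary path, and SSH access are assumptions, and the health checks between nodes are elided; with the default `DRY_RUN=1` the script only prints what it would do, which is a safe way to review the flow first.

```shell
# Sketch: drive steps 2-4 across every node, one at a time.
# NODES, paths, and SSH access are assumptions; with DRY_RUN=1
# (the default) commands are printed, not executed.
NODES="${NODES:-node1 node2 node3}"
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "DRY-RUN: $*"; else "$@"; fi
}

rolling_upgrade() {
  local node
  for node in $NODES; do
    # Step 2: stop the node, swap the binary, restart
    run ssh "$node" 'kill -TERM $(cat cockroach-data/cockroach.pid)'
    run ssh "$node" 'cp /tmp/cockroach-new /usr/local/bin/cockroach'
    run ssh "$node" 'cockroach start --certs-dir=certs --advertise-addr=<node-address> --join=<join-addresses> --background'
    # Steps 3-4: verify the node rejoined and ranges are healthy
    # before moving on (health checks elided in this sketch)
    run sleep 60
  done
}
```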
6. Finalize the Upgrade

After all nodes are upgraded, finalize to enable new features:
SET CLUSTER SETTING version = '24.2';
Finalization is irreversible. You cannot downgrade after this step.
7. Verify Upgrade Completion

SHOW CLUSTER SETTING version;
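If you script the verification, a small poll loop avoids checking by hand. The version reader is passed in as a command so the loop itself carries no connection details (which are deployment-specific):

```shell
# Sketch: poll until the reported cluster version matches the
# target, giving up after 30 attempts. The reader command is a
# placeholder for running SHOW CLUSTER SETTING version.
wait_for_version() {  # wait_for_version <target> <reader-cmd>
  local tries=0
  while [ "$($2)" != "$1" ]; do
    tries=$((tries + 1))
    [ "$tries" -ge 30 ] && return 1
    sleep 2
  done
}

# Example reader (connection flags are deployment-specific):
# read_version() {
#   cockroach sql --certs-dir=certs --format=tsv \
#     --execute="SHOW CLUSTER SETTING version" | tail -1
# }
# wait_for_version "24.2" read_version
```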

Kubernetes Upgrade

For Kubernetes deployments using StatefulSets:
1. Update Container Image

kubectl set image statefulset/cockroachdb \
  cockroachdb=cockroachdb/cockroach:v24.2.0
2. Monitor Rolling Update

# Watch pod updates
kubectl rollout status statefulset/cockroachdb

# Check pod versions
kubectl get pods -l app=cockroachdb -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'
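By default a StatefulSet rolling update proceeds automatically from the highest ordinal down. If you want an explicit pause between pods, the RollingUpdate `partition` field can be lowered one ordinal at a time, as sketched below; the StatefulSet name and replica count are placeholders, and `DRY_RUN=1` only prints the commands.

```shell
# Sketch: step the StatefulSet "partition" field down so only
# one pod picks up the new image at a time. STS and REPLICAS
# are placeholders; DRY_RUN=1 (default) prints, not executes.
STS="${STS:-cockroachdb}"
REPLICAS="${REPLICAS:-3}"
DRY_RUN="${DRY_RUN:-1}"

run() { if [ "$DRY_RUN" = "1" ]; then echo "DRY-RUN: $*"; else "$@"; fi; }

partitioned_rollout() {
  local i
  for i in $(seq $((REPLICAS - 1)) -1 0); do
    # Pods with ordinal >= partition are updated to the new image.
    run kubectl patch statefulset "$STS" -p \
      "{\"spec\":{\"updateStrategy\":{\"rollingUpdate\":{\"partition\":$i}}}}"
    # Wait for readiness and cluster stability here before
    # lowering the partition to the next ordinal.
    run kubectl rollout status "statefulset/$STS"
  done
}
```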
3. Verify Cluster Health

kubectl exec -it cockroachdb-0 -- \
  ./cockroach sql --certs-dir=certs --execute="
    SELECT node_id, address, build_tag, is_live 
    FROM crdb_internal.gossip_nodes;
  "
4. Finalize Upgrade

kubectl exec -it cockroachdb-0 -- \
  ./cockroach sql --certs-dir=certs --execute="
    SET CLUSTER SETTING version = '24.2';
  "

Upgrade Strategies

Conservative Approach

For mission-critical clusters:
1. Partial Upgrade

Upgrade one node and monitor for 24-48 hours before continuing.
2. Gradual Rollout

Upgrade remaining nodes over several days, one per day.
3. Delayed Finalization

Wait 1-2 weeks after all nodes are upgraded before finalizing.

Standard Approach

For typical production clusters:
1. Rolling Upgrade

Upgrade all nodes in a single maintenance window, one at a time.
2. Observation Period

Monitor for 24-72 hours in mixed-version mode.
3. Finalize

Complete the upgrade by finalizing the cluster version.

Monitoring During Upgrade

Key Metrics to Watch

  • Node Liveness: All nodes should remain live
  • Under-replicated Ranges: Should return to 0 after each node upgrade
  • Query Latency: Monitor for increases during upgrade
  • Error Rates: Watch for spikes in application errors
  • CPU/Memory: May increase temporarily during upgrades
  • Disk I/O: Rebalancing may increase I/O

Monitoring Commands

Check Node Versions
SELECT 
  node_id,
  address,
  build_tag,
  is_live,
  started_at
FROM crdb_internal.gossip_nodes
ORDER BY node_id;
Check Replica Health
SELECT 
  SUM(CASE WHEN under_replicated THEN 1 ELSE 0 END) AS under_replicated,
  SUM(CASE WHEN over_replicated THEN 1 ELSE 0 END) AS over_replicated,
  SUM(CASE WHEN unavailable THEN 1 ELSE 0 END) AS unavailable
FROM crdb_internal.ranges;
Monitor Running Jobs
SELECT * FROM [SHOW JOBS]
WHERE status = 'running'
ORDER BY created DESC;

Troubleshooting Upgrades

Node Won’t Start After Upgrade

1. Check Logs

tail -100 cockroach-data/logs/cockroach.log
Look for error messages indicating the issue.
2. Verify Binary

./cockroach version
Ensure the correct version is installed.
3. Check Disk Space

df -h /path/to/cockroach-data
4. Rollback if Needed

If the new version won’t start, revert to the previous binary:
cp /backup/cockroach.old /usr/local/bin/cockroach
cockroach start --certs-dir=certs --advertise-addr=<node-address> --join=<join-addresses> --background

Cluster Performance Degradation

  • Pause Upgrade: Stop upgrading additional nodes until performance stabilizes
  • Check Rebalancing: Reduce the rebalancing rate if it’s impacting performance
  • Review Queries: Check for long-running queries or locks
  • Monitor Resources: Verify CPU, memory, and disk aren’t saturated

Reduce Rebalancing Rate
SET CLUSTER SETTING kv.snapshot_rebalance.max_rate = '32MiB';

Cannot Finalize Upgrade

Check Node Versions
SELECT node_id, build_tag 
FROM crdb_internal.gossip_nodes
WHERE is_live = true;
Ensure all live nodes are running the target version before finalizing.
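A quick way to spot stragglers is to filter that query’s output for build tags that don’t start with the target version. A sketch, assuming `<node_id> <build_tag>` lines on stdin (e.g. from `cockroach sql --format=tsv`):

```shell
# Sketch: print the node IDs whose build_tag does not start
# with the target version prefix (e.g. v24.2). Input lines:
# "<node_id> <build_tag>", one per node.
stragglers() {  # stragglers <target-prefix>  (reads stdin)
  awk -v t="$1" 'index($2, t) != 1 { print $1 }'
}

# Usage (connection flags are deployment-specific):
# cockroach sql --format=tsv \
#   --execute="SELECT node_id, build_tag FROM crdb_internal.gossip_nodes WHERE is_live" \
#   | stragglers v24.2
```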

Rollback Procedures

Before Finalization

You can rollback before finalizing the cluster version:
1. Stop Upgrading

Don’t upgrade any more nodes.
2. Downgrade Upgraded Nodes

Replace the new binary with the old version on upgraded nodes:
kill -TERM $(cat cockroach-data/cockroach.pid)
cp /backup/cockroach.old /usr/local/bin/cockroach
cockroach start --certs-dir=certs --advertise-addr=<node-address> --join=<join-addresses> --background
3. Verify Rollback

SELECT node_id, build_tag FROM crdb_internal.gossip_nodes;
After finalization, rollback is not possible. You can only move forward to newer versions.

Restore from Backup

If issues occur after finalization:
1. Deploy New Cluster

Create a fresh cluster running the previous version.
2. Restore Backup

RESTORE FROM LATEST IN 's3://backup-bucket/pre-upgrade?AWS_ACCESS_KEY_ID=xxx&AWS_SECRET_ACCESS_KEY=xxx';
3. Redirect Applications

Update application connection strings to the restored cluster.

Post-Upgrade Tasks

1. Update Statistics

-- Let automatic stats collection run, or manually trigger:
ANALYZE users;
2. Review New Features

Explore new cluster settings and features in the release notes.
3. Update Documentation

Document the new version in your infrastructure docs.
4. Remove Old Binaries

Clean up old binary backups after confirming stability.
5. Schedule Next Upgrade

Plan for the next upgrade based on release schedules.

Upgrade Best Practices

1. Stay Current

Upgrade regularly to avoid large version gaps and benefit from bug fixes.
2. Test First

Always test upgrades in staging before production.
3. Backup First

Never upgrade without a recent backup.
4. One Node at a Time

Never upgrade multiple nodes simultaneously.
5. Monitor Closely

Watch metrics continuously during and after upgrades.
6. Document Everything

Keep detailed notes of upgrade procedures and any issues.
7. Plan for Rollback

Always have a tested rollback plan before starting.

Automated Upgrade Considerations

While automation is possible, manual oversight during upgrades is recommended for production clusters:
  • Semi-automated: Automate binary deployment, but manually verify each step
  • Full automation: Only for non-critical environments or after extensive testing
  • Monitoring integration: Automated upgrades should pause on health check failures

Upgrade Planning Template

Upgrade Details:
  • Current Version: ____________
  • Target Version: ____________
  • Scheduled Date: ____________
  • Duration Estimate: ____________
Pre-Upgrade:
  • Release notes reviewed
  • Backup completed and verified
  • Staging upgrade completed
  • Stakeholders notified
  • Rollback plan documented
  • Monitoring dashboards ready
During Upgrade:
  • Node 1 upgraded and verified
  • Node 2 upgraded and verified
  • Node 3 upgraded and verified
  • Node N upgraded and verified
  • All nodes running target version
  • Cluster health verified
Post-Upgrade:
  • Upgrade finalized
  • Application testing completed
  • Performance validated
  • Documentation updated
  • Old binaries archived

Next Steps

  • Deployment: Review deployment options
  • Backup & Restore: Implement backup strategies
  • Scaling: Scale your cluster
