Overview
Node operations involve managing individual CockroachDB nodes throughout their lifecycle. This includes starting nodes, gracefully shutting them down, decommissioning nodes for removal, and recommissioning nodes back into service.

Starting Nodes
Standard Node Start
Start a node in a multi-node cluster with the cockroach start command. The first time nodes are started, you must run cockroach init to initialize the cluster before it can accept connections.

Key Start Flags
| Flag | Description | Example |
|---|---|---|
| `--store` | Storage location(s) | `--store=path=/mnt/data` |
| `--listen-addr` | Node address for cluster communication | `--listen-addr=0.0.0.0:26257` |
| `--advertise-addr` | Address advertised to other nodes | `--advertise-addr=node1.example.com:26257` |
| `--http-addr` | Admin UI and API address | `--http-addr=0.0.0.0:8080` |
| `--join` | Addresses of nodes to join | `--join=node1:26257,node2:26257` |
| `--locality` | Node locality for geo-distribution | `--locality=region=us-east,zone=us-east-1a` |
| `--cache` | Size of the storage engine cache | `--cache=25%` |
| `--max-sql-memory` | Maximum memory for SQL operations | `--max-sql-memory=25%` |
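Putting these flags together, a start sequence might look like the following sketch. The hostnames, paths, and `--certs-dir` value are placeholders and assume a secure cluster:

```shell
# Start the node (repeat on each host with its own --advertise-addr)
cockroach start \
  --certs-dir=certs \
  --store=path=/mnt/data \
  --listen-addr=0.0.0.0:26257 \
  --advertise-addr=node1.example.com:26257 \
  --http-addr=0.0.0.0:8080 \
  --join=node1:26257,node2:26257,node3:26257

# One-time cluster initialization, run once against any node
cockroach init --certs-dir=certs --host=node1.example.com:26257
```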
Background Mode
Start a node in the background (daemon mode) with the --background flag. Logs are written to cockroach-data/logs/ by default.
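A minimal sketch, assuming insecure mode and placeholder join addresses:

```shell
cockroach start \
  --insecure \
  --store=path=/mnt/data \
  --join=node1:26257,node2:26257,node3:26257 \
  --background
```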
Stopping Nodes
Graceful Shutdown
Stop a node gracefully using the quit command:
Drain the node
- Stop accepting new SQL connections
- Complete in-flight SQL transactions
- Transfer range leases to other nodes
Wait for lease transfers
The node waits for leases to migrate to other replicas before shutting down.
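The shutdown described above might be invoked as follows (the host is a placeholder):

```shell
# Drains the node, transfers leases, then shuts down
cockroach quit --insecure --host=node2:26257
```

Note that in newer CockroachDB releases `cockroach quit` is deprecated in favor of `cockroach node drain` followed by stopping the process.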
Shutdown Options
Node Status Operations
Listing Nodes
View all active nodes in the cluster:

Detailed Node Status
Get comprehensive status information:

Status Fields
The status output includes:
- id: Node identifier
- address: Network address for cluster communication
- sql_address: SQL connection address
- build: CockroachDB version
- started_at: Node start timestamp
- updated_at: Last liveness update
- locality: Configured locality tiers
- is_available: Whether node is reachable
- is_live: Whether node is part of cluster consensus
With --ranges, the output also includes:
- replicas_leaders: Number of Raft leaders
- replicas_leaseholders: Number of lease holders
- ranges: Total ranges on node
- ranges_unavailable: Unavailable ranges
- ranges_underreplicated: Under-replicated ranges
With --decommission, the output also includes:
- is_decommissioning: Decommission in progress
- membership: Cluster membership status (active/decommissioning/decommissioned)
- is_draining: Node is draining connections
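For reference, the status commands described in this section might be invoked as follows (insecure mode and host are assumptions):

```shell
cockroach node ls --insecure --host=node1:26257                    # list node IDs
cockroach node status --insecure --host=node1:26257                # basic status
cockroach node status --ranges --insecure --host=node1:26257       # add range fields
cockroach node status --decommission --insecure --host=node1:26257 # add decommission fields
cockroach node status --all --insecure --host=node1:26257          # every field
```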
Decommissioning Nodes
Decommissioning safely removes a node from the cluster by transferring its data to other nodes.

Decommission Process
Initiate decommission
Start the decommission process for a single node, or pass multiple node IDs to decommission several nodes at once:
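A sketch of both forms, with node IDs and the host as placeholders:

```shell
# Decommission node 4
cockroach node decommission 4 --insecure --host=node1:26257

# Decommission nodes 4, 5, and 6 together
cockroach node decommission 4 5 6 --insecure --host=node1:26257
```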
Monitor progress
The command displays real-time progress showing:
- Replica count decreasing
- Membership changing to ‘decommissioning’
- Range transfers completing
Decommission Options
Decommission Wait Modes
Control how long to wait for decommission to complete:
- --wait=all (default): Wait until decommission completes
- --wait=none: Return immediately after initiating
Decommission Checks
Pre-flight checks before decommissioning:
- --checks=enabled (default): Run readiness checks
- --checks=skip: Skip pre-flight checks
- --checks=strict: Require all checks to pass
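A sketch combining these options (the node ID and host are placeholders; the --checks flag is only available in newer releases):

```shell
# Run strict readiness checks, then return immediately after initiating
cockroach node decommission 4 \
  --checks=strict \
  --wait=none \
  --insecure --host=node1:26257
```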
Decommission Self
Decommission the node you're connected to by passing --self instead of a node ID:

Decommission Scenarios
Scaling down a cluster: decommission nodes one at a time, waiting for each to complete before stopping the node.

Recommissioning Nodes
If you decommission a node by mistake or need to bring it back into service, use recommission:
- Resets the decommissioning state
- Allows the node to accept replicas again
- Rebalances data back to the node
You can only recommission a node that is still running. Once stopped, start a new node instead.
Recommission Multiple Nodes
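The recommission command mirrors decommission, accepting one or more node IDs (IDs and host are placeholders):

```shell
# Recommission a single node
cockroach node recommission 4 --insecure --host=node1:26257

# Recommission several nodes at once
cockroach node recommission 4 5 6 --insecure --host=node1:26257
```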
Node Draining
Draining is the process of preparing a node for shutdown:
- Node stops accepting new SQL connections
- Existing connections complete their transactions
- Range leases transfer to other nodes
- Node becomes ready for shutdown
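In recent CockroachDB versions, draining can also be triggered explicitly without shutting the node down (the node ID and host are placeholders):

```shell
cockroach node drain 5 --insecure --host=node1:26257
```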
The quit command automatically performs draining before shutdown.

Troubleshooting
Decommission Stalls
If decommission appears stuck:

Identify blocking ranges
Look for under-replicated or unavailable ranges preventing replica movement.
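One way to look for such ranges (host is a placeholder):

```shell
cockroach node status --ranges --insecure --host=node1:26257
# Inspect the ranges_unavailable and ranges_underreplicated columns
```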
Verify cluster health
Ensure enough healthy nodes exist to accept replicas:
- Cluster must have sufficient capacity
- At least 3 nodes for default replication factor
- All remaining nodes must be live
Node Won’t Start
Common causes:

Clock skew too large
Nodes must be within 500ms of each other (the default maximum clock offset). Solution: synchronize clocks using NTP.
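A quick way to check clock synchronization on a Linux host, assuming systemd-timesyncd or chrony is installed:

```shell
timedatectl status     # shows "System clock synchronized: yes/no"
chronyc tracking       # reports the current offset when chrony is in use
```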
Store path issues
- Directory doesn’t exist or lacks permissions
- Store path contains data from different cluster
- Disk full or hardware failure
Network connectivity
- Cannot reach nodes in the --join list
- Firewall blocking ports 26257 or 8080
- DNS resolution failures
Node Liveness Issues
Node shows as not live. Possible causes:
- Network partition
- Node overloaded (CPU/memory)
- Clock synchronization issues
- Disk I/O problems
Best Practices
Node Operation Recommendations
- Always use graceful shutdown: Prevents data inconsistencies
- Monitor decommission progress: Don’t stop nodes until fully decommissioned
- Maintain cluster capacity: Ensure cluster can handle replica transfers
- Decommission one node at a time: Prevents overwhelming the cluster
- Wait between operations: Allow cluster to stabilize between changes
- Keep node counts odd: Prevents quorum split scenarios (3, 5, 7)
- Test in staging: Practice operations in non-production first
- Document node roles: Track which nodes serve specific workloads
See Also
- Cluster Management - Cluster-wide operations
- Configuration - Node configuration options
- Monitoring - Node health monitoring