Initial Troubleshooting Steps
When you experience issues, start with these steps:
Check logs for errors
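Logs are generated on a per-node basis. By default, each node writes its log files to the logs subdirectory of its first store directory (cockroach-data/logs), so check the logs on every node involved in the problem.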
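Because each node keeps its own logs, a quick first pass is to pull the error and fatal entries out of every node's log files. The following Python sketch assumes the crdb-v1 text log format, in which each entry starts with a severity letter (I, W, E, or F) followed by a yymmdd date; the function name and the script usage are illustrative, and JSON-formatted logs would need different parsing.

```python
import re
from typing import Iterable, List

# Assumed crdb-v1 entry prefix: severity letter (I, W, E, F) + yymmdd date, e.g. "E230115 ".
ENTRY_RE = re.compile(r"^([IWEF])(\d{6}) ")


def high_severity_entries(lines: Iterable[str], levels: str = "EF") -> List[str]:
    """Return entry lines whose severity letter is in `levels` (default: ERROR and FATAL).

    Continuation lines of multi-line entries do not match the prefix and are skipped.
    """
    found = []
    for line in lines:
        m = ENTRY_RE.match(line)
        if m and m.group(1) in levels:
            found.append(line.rstrip("\n"))
    return found


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python scan_logs.py path/to/node1.log path/to/node2.log
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            for entry in high_severity_entries(f):
                print(f"{path}: {entry}")
```

Only the first line of a multi-line entry is reported, so open the full log around any hit for the stack trace or details.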
The cockroach debug zip command collects logs, metrics, and diagnostics from all cluster nodes into a single archive for troubleshooting.
Cluster Setup Issues
Cannot Start Single-Node Cluster
Existing storage directory conflict
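Problem: The node won't start because its storage directory already contains data from an existing cluster.
Solution: Start the node with a different storage directory (--store), or, if you no longer need the old data, remove the existing directory (cockroach-data by default) and start the node again.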
Ports already in use
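Problem: The default ports 26257 (SQL and inter-node traffic) or 8080 (DB Console) are occupied.
Solution: Stop the services using those ports, or start the node on different ports with the --listen-addr and --http-addr flags.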
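Before starting a node, you can check whether the default ports are free. This Python sketch assumes the node will listen on the local host; it tries to bind each port and treats a bind failure as "in use". The function name is hypothetical, and a port reported as free can still be taken by another process before CockroachDB starts.

```python
import socket

DEFAULT_PORTS = (26257, 8080)  # SQL/inter-node port and DB Console HTTP port


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if binding host:port fails, i.e. another socket already holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return True
    return False


if __name__ == "__main__":
    for port in DEFAULT_PORTS:
        state = "in use" if port_in_use(port) else "free"
        print(f"port {port}: {state}")
```

On Linux, tools such as ss or lsof can then tell you which process owns a busy port.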
Incompatible CPU architecture
Problem: Exit status 132 (SIGILL) indicates unsupported CPU instructions.
Solution:
- Use official CockroachDB release builds (they support all x86-64 CPUs)
- Verify that the binary matches your CPU architecture
- Check whether the host has a very old CPU that lacks the required instruction sets
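To confirm that a crash really is a SIGILL, decode the exit status. Shells report a process killed by signal N as 128 + N, so 132 corresponds to signal 4 (SIGILL) on Linux and macOS, while Python's subprocess reports the negative signal number instead. The hypothetical helper below handles both conventions and also prints the host's machine architecture so you can compare it with the binary you downloaded.

```python
import platform
import signal
from typing import Optional


def signal_name(status: int) -> Optional[str]:
    """Map an exit status to the name of the signal that killed the process.

    Accepts shell-style statuses (128 + signal number, e.g. 132) and
    subprocess-style return codes (negative signal number, e.g. -4).
    Returns None when the status does not encode a known signal.
    """
    if status < 0:
        signum = -status
    elif status > 128:
        signum = status - 128
    else:
        return None
    try:
        return signal.Signals(signum).name
    except ValueError:
        return None  # not a signal number defined on this platform


if __name__ == "__main__":
    print("exit status 132 ->", signal_name(132))  # SIGILL on Linux and macOS
    print("host machine architecture:", platform.machine())
```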
Multi-Node Cluster Issues
Cannot join node to existing cluster
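Problem: A node started with the --join flag won't join the cluster.
Diagnosis: Check the node's logs. A --join address that doesn't point at any existing node produces a continuous stream of connection warnings, and a storage directory left over from a different cluster produces a cluster ID mismatch error.
Solutions:
- Restart the node with a --join list that contains the addresses of existing, reachable nodes
- If the storage directory belongs to another cluster, use a new directory or remove the old one
Performance degraded when adding nodes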
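Problem: The cluster slows down while nodes are being added.
Cause: Setting kv.snapshot_rebalance.max_rate too high can overload writes on the nodes receiving snapshots.
Solution: Lower the setting, for example by restoring its default with RESET CLUSTER SETTING kv.snapshot_rebalance.max_rate.
Connection Issues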
Cannot Connect with SQL Client
Connection refused error
- Ensure the node is running: cockroach node status
- Verify that the port number matches the node configuration
- Include the flags used when starting the node (e.g., --port, --host)
- Check that firewall rules allow the connection
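To separate "node not running or wrong port" from "firewall or network problem", test whether a plain TCP connection to the node's SQL port succeeds at all. This sketch uses only Python's standard library; the host, port, and timeout are placeholders to replace with the values your node was started with. A successful connection only shows that the port is reachable, not that TLS or authentication will succeed.

```python
import socket


def can_connect(host: str, port: int = 26257, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    print("reachable" if can_connect("localhost") else "refused or unreachable")
```

A quick refusal usually means nothing is listening on that port; a timeout more often points to a firewall or routing issue.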
SSL/TLS connection errors
Performance Issues
High Query Latency
Diagnose slow queries
- Missing indexes: Use EXPLAIN to identify full table scans
- High CPU usage: Check CPU metrics and reduce concurrency
- Disk I/O bottleneck: Monitor disk IOPS and latency
- Transaction contention: Check crdb_internal.cluster_contention_events
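When many queries need checking, a small script can flag full scans in saved EXPLAIN output. This sketch assumes the tree-style text output of recent CockroachDB versions, where a scan node lists a "table:" line and, for full scans, a "spans: FULL SCAN" line; the function name and sample plan are illustrative, and the format may differ between versions, so verify it against your own EXPLAIN results.

```python
from typing import List

TREE_CHARS = "│├└─ "  # box-drawing characters and spaces used to indent plan trees


def full_scan_tables(explain_text: str) -> List[str]:
    """Return the 'table:' value of every scan node whose spans are FULL SCAN."""
    tables = []
    current_table = None
    for raw in explain_text.splitlines():
        line = raw.strip().lstrip(TREE_CHARS)
        if line.startswith("•"):
            current_table = None  # a new plan node starts; forget the previous table
        elif line.startswith("table:"):
            current_table = line.split(":", 1)[1].strip()
        elif line.startswith("spans:") and "FULL SCAN" in line and current_table:
            tables.append(current_table)
    return tables
```

Each reported table/index pair is a candidate for a new or better index on the filtered columns.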
Identify and resolve contention
- Reduce transaction duration (keep transactions short)
- Avoid hot keys (use UUID or composite keys)
- Use SELECT FOR UPDATE to lock rows explicitly
- Consider optimistic locking patterns
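The optimistic locking pattern mentioned above can be sketched without a database: each row carries a version number, and a write succeeds only if the version it read is still current. In SQL this typically becomes an UPDATE ... WHERE id = $1 AND version = $2 whose affected-row count is checked. The in-memory VersionedTable class below is a hypothetical stand-in for a table, meant to show the control flow rather than any CockroachDB API.

```python
import threading
from typing import Dict, Tuple


class VersionedTable:
    """Hypothetical in-memory table whose rows carry a version number."""

    def __init__(self) -> None:
        self._rows: Dict[str, Tuple[int, int]] = {}  # key -> (value, version)
        self._lock = threading.Lock()  # makes compare_and_set atomic, like one UPDATE

    def read(self, key: str) -> Tuple[int, int]:
        with self._lock:
            return self._rows.get(key, (0, 0))

    def compare_and_set(self, key: str, expected_version: int, new_value: int) -> bool:
        """Write only if the row's version still equals expected_version."""
        with self._lock:
            _, version = self._rows.get(key, (0, 0))
            if version != expected_version:
                return False  # another writer got there first; caller re-reads and retries
            self._rows[key] = (new_value, version + 1)
            return True


def increment(table: VersionedTable, key: str, max_attempts: int = 5) -> bool:
    """Optimistic read-modify-write loop: retry when the version check fails."""
    for _ in range(max_attempts):
        value, version = table.read(key)
        if table.compare_and_set(key, version, value + 1):
            return True
    return False
```

No lock is held between the read and the write, so readers never block; the cost is a retry whenever two writers race on the same row.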
High CPU Usage
Diagnose CPU issues
Common causes:
- Excessive concurrency: Too many active queries
- Inefficient queries: Full table scans, missing indexes
- Compaction falling behind: Check LSM health
- Under-provisioned cluster: Need more CPU cores
Solutions:
- Limit the total connection pool size to about 4x the vCPU count across the cluster
- Optimize slow queries (add indexes, rewrite queries)
- Scale horizontally (add more nodes)
- Use connection pooling (for example, PgBouncer)
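The ~4x rule of thumb is a simple calculation. This sketch assumes the multiplier applies to the total vCPU count across all nodes and splits the result evenly across application instances; the helper name and the even split are illustrative choices, not CockroachDB requirements.

```python
from typing import Tuple


def pool_sizes(nodes: int, vcpus_per_node: int, app_instances: int,
               multiplier: int = 4) -> Tuple[int, int]:
    """Return (total_connections, per_instance_connections).

    total = multiplier * total vCPUs in the cluster (assumed rule of thumb);
    each application instance gets an equal share, rounded down, minimum 1.
    """
    if min(nodes, vcpus_per_node, app_instances, multiplier) < 1:
        raise ValueError("all arguments must be positive integers")
    total = nodes * vcpus_per_node * multiplier
    per_instance = max(1, total // app_instances)
    return total, per_instance
```

For example, pool_sizes(3, 8, 4) returns (96, 24): 3 nodes with 8 vCPUs each give 24 vCPUs, so about 96 connections in total, or 24 per application instance.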
Memory Issues
Out of memory (OOM) crashes
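Symptoms: Nodes restart unexpectedly.
Diagnosis: Check the system logs (for example, dmesg on Linux) for messages showing that the kernel's OOM killer terminated the cockroach process.
Solutions: Size --cache and --max-sql-memory so that (2 * --max-sql-memory) + --cache stays at or below 80% of system RAM, and add memory or nodes if the workload still needs more.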
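CockroachDB's production guidance, as we understand it, sizes the two main memory consumers so that (2 * --max-sql-memory) + --cache stays at or below 80% of system RAM, with 25% for each flag as a common starting point. The sketch below checks a proposed configuration against that rule; treat the 80% limit and the fractions as assumptions to confirm against the documentation for your version.

```python
from typing import Tuple


def memory_fraction(cache_frac: float, sql_mem_frac: float) -> float:
    """Share of RAM claimed by (2 * --max-sql-memory) + --cache, as fractions of RAM."""
    return 2 * sql_mem_frac + cache_frac


def memory_budget_ok(cache_frac: float, sql_mem_frac: float, limit: float = 0.80) -> bool:
    """Check the assumed rule (2 * max_sql_memory) + cache <= limit * RAM."""
    return memory_fraction(cache_frac, sql_mem_frac) <= limit


def flag_sizes_gib(ram_gib: float, cache_frac: float, sql_mem_frac: float) -> Tuple[float, float]:
    """Absolute (--cache, --max-sql-memory) sizes in GiB for a machine with ram_gib of RAM."""
    return cache_frac * ram_gib, sql_mem_frac * ram_gib
```

With 25% for each flag, the rule uses 75% of RAM, leaving headroom below the 80% limit; raising --max-sql-memory counts double because of the factor of 2.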
Storage Issues
Low disk space
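Problem: Nodes shut down when free disk space falls below 10%.
Solutions: Add disk capacity to the affected nodes, or add nodes so that data can rebalance onto more storage.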
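To catch the problem before a node stops, you can watch the free-space fraction of each store's volume. This sketch uses Python's shutil.disk_usage and the 10% figure given above; the path is a placeholder for your node's --store directory, and in production you would normally rely on CockroachDB's own metrics and alerting as well.

```python
import shutil

# Threshold from this guide: nodes shut down when free disk space drops below 10%.
FREE_THRESHOLD = 0.10


def free_fraction(path: str) -> float:
    """Fraction of the filesystem containing `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def low_on_space(path: str, threshold: float = FREE_THRESHOLD) -> bool:
    """True if free space on the filesystem holding `path` is below `threshold`."""
    return free_fraction(path) < threshold


if __name__ == "__main__":
    store_dir = "."  # placeholder: point this at the node's --store directory
    print(f"free: {free_fraction(store_dir):.1%}, low: {low_on_space(store_dir)}")
```

Alerting at a higher threshold than the shutdown point gives you time to add capacity.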
Unhealthy LSM (inverted LSM)
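Problem: A high number of L0 sublevels indicates that compaction is falling behind.
Solutions: Reduce the write load on the affected nodes or give them more CPU so that compactions can catch up.
Prevention: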
- Ensure adequate CPU resources
- Don’t set snapshot rebalance rate too high
- Monitor L0 sublevels continuously
- Scale cluster before reaching capacity limits
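Continuous monitoring of L0 sublevels can be as simple as comparing recent samples for each store against a threshold. This sketch assumes you already export an L0 sublevel count per store as a list of samples; the default threshold of 20 mirrors what we believe is admission control's default overload threshold, and both it and the "sustained over a window of samples" rule are assumptions to tune for your cluster.

```python
from typing import Dict, List


def overloaded_stores(samples: Dict[int, List[int]], threshold: int = 20,
                      window: int = 3) -> List[int]:
    """Return store IDs whose last `window` L0 sublevel samples all exceed `threshold`.

    Requiring a sustained breach avoids alerting on brief spikes; stores with
    fewer than `window` samples are not flagged yet.
    """
    flagged = []
    for store_id, values in sorted(samples.items()):
        recent = values[-window:]
        if len(recent) == window and all(v > threshold for v in recent):
            flagged.append(store_id)
    return flagged
```

For example, a store whose recent samples are 25, 30, and 40 is flagged, while one that spiked to 30 and then dropped back to 5 is not.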
Replication Issues
Under-replicated ranges
Problem: Some ranges have fewer replicas than configured.
Common causes:
- Node failure or network partition
- Insufficient nodes for replication factor
- Constraint violations (locality constraints)
- Slow replication due to network or disk issues
Solutions:
- Ensure all nodes are healthy and reachable
- Verify cluster has enough nodes for replication factor
- Check and fix zone configuration constraints
- Monitor replication queue length and duration
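The checks above ("fewer replicas than configured" and "enough nodes for the replication factor") come down to simple comparisons. This sketch assumes you already have, for each range, its number of live replicas, plus a single replication factor, which is 3 by default in CockroachDB; the data shape and function names are hypothetical. Because each replica of a range is placed on a different node, a replication factor of N needs at least N live nodes.

```python
from typing import Dict, List


def under_replicated(live_replicas: Dict[int, int], replication_factor: int = 3) -> List[int]:
    """Return IDs of ranges that have fewer live replicas than the replication factor."""
    return sorted(r for r, n in live_replicas.items() if n < replication_factor)


def enough_nodes(live_nodes: int, replication_factor: int = 3) -> bool:
    """Each replica of a range must live on a distinct node, so RF needs at least RF nodes."""
    return live_nodes >= replication_factor
```

If enough_nodes returns False, no amount of waiting will clear the under-replication; you must add nodes or lower the replication factor in the zone configuration.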
Unavailable ranges
Common Error Messages
Transaction retry errors (40001)
Common causes:
- Transaction took too long (exceeded deadline)
- Contention with other transactions
- Node failures during transaction
Solutions:
- Implement retry logic in the application
- Reduce transaction duration
- Use AS OF SYSTEM TIME for read-only queries
- Investigate and resolve contention
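Client-side retry logic for 40001 errors usually wraps the whole transaction in a loop with backoff. The sketch below is driver-agnostic: it treats any exception whose pgcode attribute equals "40001" as retryable (psycopg2 exposes the SQLSTATE this way; other drivers may use a different attribute), and the run_transaction name, attempt limit, and delay values are illustrative assumptions. The transaction callable must be safe to re-run from the start, because a retry repeats every statement.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

RETRYABLE_SQLSTATE = "40001"  # SQLSTATE for serialization failures (transaction retry errors)


def is_retryable(exc: BaseException) -> bool:
    """True if the error carries SQLSTATE 40001 in a `pgcode` attribute (assumed driver shape)."""
    return getattr(exc, "pgcode", None) == RETRYABLE_SQLSTATE


def run_transaction(
    txn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `txn`, retrying on SQLSTATE 40001 with capped exponential backoff.

    `txn` must run the whole transaction (BEGIN ... COMMIT) and roll back on
    error, so that re-running it from the start is safe.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except Exception as exc:
            if not is_retryable(exc) or attempt == max_attempts:
                raise  # non-retryable error, or out of attempts
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))  # random jitter spreads out competing retries
    raise AssertionError("unreachable")
```

The injectable sleep parameter keeps the function easy to test; in application code, leave it at time.sleep.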
Certificate errors
Getting Help
If you cannot resolve the issue:
Review documentation
Contact support
- Community: CockroachDB Forum
- Slack: CockroachDB Community Slack
- Enterprise Support: File ticket through support portal
- GitHub: File an issue