Monitor node health and readiness for SQL connections
The health check endpoints allow you to verify that a CockroachDB node is running and ready to accept SQL connections. These endpoints are essential for load balancers, orchestration systems, and monitoring tools.
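As a rough illustration, a client can treat a 200 response from the endpoint as "ready" and anything else as "not ready". The sketch below uses only the Python standard library. The helper names `fetch_status` and `is_ready`, the 5-second default timeout, and the `node1:8080` address are assumptions for illustration, not part of CockroachDB.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional


def fetch_status(url: str, timeout: float) -> Optional[int]:
    """Return the HTTP status code for url, or None if the node is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses (for example 503) still carry a status code.
        return err.code
    except OSError:
        # Refused connection, DNS failure, or timeout.
        return None


def is_ready(
    base_url: str,
    timeout: float = 5.0,
    fetch: Callable[[str, float], Optional[int]] = fetch_status,
) -> bool:
    """True only when the health endpoint answers with HTTP 200."""
    url = base_url.rstrip("/") + "/api/v2/health/"
    return fetch(url, timeout) == 200
```

Calling `is_ready("http://node1:8080")` performs one request. The `fetch` parameter exists so the check can be exercised without a running node.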
Configure your load balancer to poll /api/v2/health to route traffic only to healthy nodes.
HAProxy Example:
    backend cockroachdb
        option httpchk GET /api/v2/health
        http-check expect status 200
        server node1 10.0.1.1:8080 check port 8080
        server node2 10.0.1.2:8080 check port 8080
        server node3 10.0.1.3:8080 check port 8080
NGINX Example:
    upstream cockroachdb {
        server 10.0.1.1:26257;
        server 10.0.1.2:26257;
        server 10.0.1.3:26257;
    }

    server {
        location /health {
            proxy_pass http://10.0.1.1:8080/api/v2/health;
            proxy_method GET;
        }
    }
Kubernetes Liveness Probes
Use health checks to automatically restart unhealthy pods.
Poll health status to detect node failures and trigger alerts.
Simple Health Monitor
    #!/bin/bash
    NODES=("node1:8080" "node2:8080" "node3:8080")

    for NODE in "${NODES[@]}"; do
      STATUS=$(curl -s -o /dev/null -w "%{http_code}" "http://$NODE/api/v2/health/")
      if [ "$STATUS" -eq 200 ]; then
        echo "✓ $NODE is healthy"
      else
        echo "✗ $NODE is unhealthy (status: $STATUS)"
        # Send alert
        curl -X POST https://alerts.example.com/webhook \
          -d "{\"node\": \"$NODE\", \"status\": \"unhealthy\"}"
      fi
    done
Graceful Shutdown Verification
Check health during node draining to ensure graceful shutdown.
    # Start draining the node
    cockroach node drain 1 --certs-dir=certs --host=node1:26257

    # Monitor health status
    while true; do
      STATUS=$(curl -s -o /dev/null -w "%{http_code}" http://node1:8080/api/v2/health/)
      if [ "$STATUS" -eq 503 ]; then
        echo "Node has been drained successfully"
        break
      fi
      echo "Waiting for node to drain..."
      sleep 5
    done
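The loop above polls until the node reports 503, with no upper bound on how long it waits. A bounded variant might look like the following sketch. The function name `wait_for_status`, the 300-second deadline, and the injectable `get_status` and `sleep` callables are assumptions made for illustration.

```python
import time
from typing import Callable, Optional


def wait_for_status(
    get_status: Callable[[], Optional[int]],
    expected: int = 503,
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status() until it returns `expected` or timeout_s elapses.

    Returns True if the expected status was seen, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Here `get_status` would be any callable that returns the current HTTP status code of the health endpoint. Returning False on timeout lets the caller decide whether to alert or keep waiting.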
Configure health check timeouts based on your environment:
Development: 3-5 seconds
Production: 5-10 seconds
High-latency networks: 10-15 seconds
Too short: False positives from network latency
Too long: Slow detection of actual failures
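One way to keep these values consistent across tools is to encode them once. The mapping below is a sketch that copies the ranges listed above. The environment keys and the `pick_timeout` helper are hypothetical names chosen for this example.

```python
# Timeout ranges in seconds, copied from the guidance above.
TIMEOUT_RANGES = {
    "development": (3, 5),
    "production": (5, 10),
    "high-latency": (10, 15),
}


def pick_timeout(environment: str, favor_fast_detection: bool = False) -> int:
    """Pick a health check timeout in seconds for the given environment.

    The upper bound tolerates more network latency (fewer false positives).
    The lower bound detects real failures sooner.
    """
    low, high = TIMEOUT_RANGES[environment]
    return low if favor_fast_detection else high
```

For example, `pick_timeout("production")` yields the upper bound of the production range, and passing `favor_fast_detection=True` yields the lower bound.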
2. Use Retry Logic
Implement retries before marking a node as unhealthy:
    import requests
    from time import sleep

    def check_health(node_url, max_retries=3):
        for attempt in range(max_retries):
            try:
                response = requests.get(
                    f"{node_url}/api/v2/health",
                    timeout=5
                )
                if response.status_code == 200:
                    return True
            except requests.exceptions.RequestException:
                pass
            if attempt < max_retries - 1:
                sleep(2)  # Wait before retry
        return False

    if check_health("http://localhost:8080"):
        print("Node is healthy")
    else:
        print("Node is unhealthy")
3. Check All Nodes Independently
Don’t assume cluster health from a single node:
    HEALTHY_NODES=0
    TOTAL_NODES=3

    for NODE in node1 node2 node3; do
      if curl -s -f "http://$NODE:8080/api/v2/health/" > /dev/null; then
        ((HEALTHY_NODES++))
      fi
    done

    if [ $HEALTHY_NODES -ge 2 ]; then
      echo "Cluster has quorum ($HEALTHY_NODES/$TOTAL_NODES healthy)"
    else
      echo "WARNING: Cluster may not have quorum"
    fi
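The script above hard-codes the threshold of 2 for a three-node cluster. A version that derives the majority from the node count could be sketched as follows. The helper names are hypothetical, and a node-level majority is only a rough proxy: quorum is decided per range by its replicas, so treat the result as an approximation.

```python
from typing import Callable, Iterable


def has_majority(healthy: int, total: int) -> bool:
    """True when strictly more than half of the nodes are healthy."""
    return healthy > total // 2


def count_healthy(nodes: Iterable[str], is_healthy: Callable[[str], bool]) -> int:
    """Check every node independently and count the healthy ones."""
    return sum(1 for node in nodes if is_healthy(node))
```

With three nodes, two healthy nodes form a majority. With four nodes, two do not.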
4. Monitor During Deployments
Watch health status during rolling updates:
    # Before upgrading a node
    curl http://node1:8080/api/v2/health/  # Should return 200

    # Drain the node
    cockroach node drain 1 --certs-dir=certs

    # Verify drain completed
    curl http://node1:8080/api/v2/health/  # Should return 503

    # Upgrade the node
    systemctl stop cockroach
    # ... perform upgrade ...
    systemctl start cockroach

    # Wait for health recovery
    while ! curl -s -f http://node1:8080/api/v2/health/; do
      echo "Waiting for node to be healthy..."
      sleep 5
    done
    echo "Node is healthy, proceeding to next node"
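The per-node sequence above can be generalized to a whole cluster. The following is a minimal sketch of that control flow, assuming the caller supplies every operation: `is_healthy`, `drain`, `upgrade`, and `wait_until` are hypothetical callables, not CockroachDB APIs. The sketch stops at the first node that fails a check, so a problem is never compounded by taking down another node.

```python
from typing import Callable, Iterable, List


def rolling_upgrade(
    nodes: Iterable[str],
    is_healthy: Callable[[str], bool],
    drain: Callable[[str], None],
    upgrade: Callable[[str], None],
    wait_until: Callable[[Callable[[], bool]], bool],
) -> List[str]:
    """Upgrade nodes one at a time and return the nodes that completed.

    wait_until(predicate) should poll the predicate and return False on timeout.
    """
    done: List[str] = []
    for node in nodes:
        if not is_healthy(node):
            break  # node was unhealthy before the upgrade started
        drain(node)
        if not wait_until(lambda: not is_healthy(node)):
            break  # drain did not complete in time
        upgrade(node)
        if not wait_until(lambda: is_healthy(node)):
            break  # node did not recover after the upgrade
        done.append(node)
    return done
```

Comparing the returned list with the input shows where the rollout stopped.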
Symptom: Health check times out or connection refused
Resolution: Verify network connectivity and firewall rules
    # Test connectivity
    telnet localhost 8080

    # Check if port is listening
    netstat -tuln | grep 8080

    # Test from another host
    curl -v http://node1:8080/api/v2/health/
Symptom: Health check returns 503 during maintenance
Resolution: This is expected - wait for drain to complete or cancel drain
    # Check drain status
    cockroach node status --certs-dir=certs

    # Cancel drain if needed
    systemctl restart cockroach
Symptom: Multiple nodes unhealthy, cluster unavailable
Resolution: Check cluster status and quorum
    # Check cluster status from a healthy node
    cockroach node status --certs-dir=certs --host=node1:26257

    # Review logs for errors
    grep -i error /var/log/cockroach/cockroach.log
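The symptoms above can be told apart from the result of a single health request. The sketch below maps an observed status to the matching resolution. The `triage` function is a hypothetical helper, and `None` is used here, by assumption, to mean that no HTTP response was received (timeout or connection refused).

```python
from typing import Optional


def triage(status: Optional[int]) -> str:
    """Map a health check result to the matching troubleshooting hint."""
    if status is None:
        return "unreachable: verify network connectivity and firewall rules"
    if status == 200:
        return "healthy: no action needed"
    if status == 503:
        return "not ready: expected while draining; wait for the drain to complete"
    return f"unexpected status {status}: review the node logs"
```

Running `triage` over the results for every node gives a quick view of whether a problem is isolated to one node or affects several, in which case the cluster status and quorum checks apply.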