The health check endpoints allow you to verify that a CockroachDB node is running and ready to accept SQL connections. These endpoints are essential for load balancers, orchestration systems, and monitoring tools.

Health Check

Determine if a node is running and ready to accept SQL connections.
GET /api/v2/health
curl --request GET \
  --url https://localhost:8080/api/v2/health/
Stability: Stable
This endpoint does not require authentication.

Response Codes

200 OK
status
The node is healthy and ready to accept SQL connections.
{
  "status": "ok"
}
503 Service Unavailable
status
The node is not ready to accept SQL connections. This may occur during:
  • Node startup
  • Cluster initialization
  • Node draining or decommissioning
  • Critical internal errors
{
  "status": "unavailable",
  "message": "node is not ready"
}
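The two response shapes above can be folded into a single readiness decision on the client side. The following is a minimal sketch, not an official client: the function name `classify_health` and the fallback reason strings are assumptions made for illustration, and only the `status` codes and the `message` field come from the responses shown above.

```python
import json
from typing import Tuple


def classify_health(status_code: int, body: str) -> Tuple[bool, str]:
    """Map a health response to (ready, reason). Hypothetical helper."""
    if status_code == 200:
        return True, "ok"
    if status_code == 503:
        # The 503 body shown above may carry a "message" field; tolerate
        # empty or non-JSON bodies (for example, from an intermediate proxy).
        try:
            message = json.loads(body).get("message", "")
        except (ValueError, AttributeError):
            message = ""
        return False, message or "service unavailable"
    # Any other code is not documented for this endpoint; treat as not ready.
    return False, f"unexpected status {status_code}"
```

Treating every undocumented status as "not ready" is a conservative design choice: a load balancer or probe should only route traffic on an explicit 200.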

When to Use

Use the health endpoint for:
Configure your load balancer to poll /api/v2/health to route traffic only to healthy nodes.
HAProxy Example:
backend cockroachdb
    option httpchk GET /api/v2/health
    http-check expect status 200
    server node1 10.0.1.1:8080 check port 8080
    server node2 10.0.1.2:8080 check port 8080
    server node3 10.0.1.3:8080 check port 8080
NGINX Example:
upstream cockroachdb {
    server 10.0.1.1:26257;
    server 10.0.1.2:26257;
    server 10.0.1.3:26257;
}

server {
    location /health {
        proxy_pass http://10.0.1.1:8080/api/v2/health;
        proxy_method GET;
    }
}
Use health checks to automatically restart unhealthy pods.
apiVersion: v1
kind: Pod
metadata:
  name: cockroachdb
spec:
  containers:
  - name: cockroachdb
    image: cockroachdb/cockroach:v25.3.0
    livenessProbe:
      httpGet:
        path: /api/v2/health
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3
Prevent traffic from being sent to nodes that aren’t ready.
apiVersion: v1
kind: Pod
metadata:
  name: cockroachdb
spec:
  containers:
  - name: cockroachdb
    image: cockroachdb/cockroach:v25.3.0
    readinessProbe:
      httpGet:
        path: /api/v2/health
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 5
      timeoutSeconds: 3
      successThreshold: 1
      failureThreshold: 2
Poll health status to detect node failures and trigger alerts.
Simple Health Monitor
#!/bin/bash

NODES=("node1:8080" "node2:8080" "node3:8080")

for NODE in "${NODES[@]}"; do
  STATUS=$(curl -s --max-time 5 -o /dev/null -w "%{http_code}" "http://$NODE/api/v2/health/")
  
  if [ "$STATUS" -eq 200 ]; then
    echo "✓ $NODE is healthy"
  else
    echo "✗ $NODE is unhealthy (status: $STATUS)"
    # Send alert (the body is JSON, so declare the content type)
    curl -s -X POST https://alerts.example.com/webhook \
      -H "Content-Type: application/json" \
      -d "{\"node\": \"$NODE\", \"status\": \"unhealthy\"}"
  fi
done
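A monitor that alerts on every failed poll will repeat the same alert until the node recovers. One common refinement is to alert only when a node's state changes between polls. The sketch below illustrates that idea; `detect_transitions` and the event names are hypothetical, and the polling and webhook delivery are left to the caller.

```python
from typing import Dict, List, Tuple


def detect_transitions(
    previous: Dict[str, bool], current: Dict[str, bool]
) -> List[Tuple[str, str]]:
    """Return (node, event) pairs for nodes whose health changed between polls."""
    events = []
    for node, healthy in sorted(current.items()):
        # Assumption: a node not seen before is treated as previously healthy,
        # so its first failed poll still raises an alert.
        was_healthy = previous.get(node, True)
        if was_healthy and not healthy:
            events.append((node, "went_unhealthy"))
        elif not was_healthy and healthy:
            events.append((node, "recovered"))
    return events
```

A caller would keep the result of the last poll, pass it as `previous` on the next run, and send one webhook per returned event.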
Check health during node draining to ensure graceful shutdown.
# Start draining the node
cockroach node drain 1 --certs-dir=certs --host=node1:26257

# Monitor health status
while true; do
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" http://node1:8080/api/v2/health/)
  if [ "$STATUS" -eq 503 ]; then
    echo "Node has been drained successfully"
    break
  fi
  echo "Waiting for node to drain..."
  sleep 5
done
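The loop above waits indefinitely if the node never reports 503. A bounded variant of the same wait can be sketched in Python as follows. The names `wait_for_status` and `get_status`, and the default timeout and interval, are assumptions for illustration; `get_status` stands in for whatever function performs the HTTP request and returns the status code.

```python
import time
from typing import Callable


def wait_for_status(
    get_status: Callable[[], int],
    expected: int,
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_status() until it returns `expected` or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        if get_status() == expected:
            return True
        if clock() >= deadline:
            return False  # give up instead of looping forever
        sleep(interval_s)
```

For the drain case, `expected` would be 503. The `sleep` and `clock` parameters exist only so the function can be exercised without real delays.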

Health Check vs. Other Monitoring

The health endpoint differs from other monitoring approaches:
Method         | Purpose             | Authentication | Use Case
---------------|---------------------|----------------|------------------------------
/api/v2/health | SQL readiness check | None           | Load balancers, orchestration
/api/v2/nodes  | Detailed node info  | Required       | Monitoring dashboards
_status/vars   | Prometheus metrics  | None           | Time-series monitoring
DB Console     | Visual monitoring   | Browser-based  | Human operators

Best Practices

1. Set Appropriate Timeouts

Configure health check timeouts based on your environment:
  • Development: 3-5 seconds
  • Production: 5-10 seconds
  • High-latency networks: 10-15 seconds
  • Too short: False positives from network latency
  • Too long: Slow detection of actual failures
2. Use Retry Logic

Implement retries before marking a node as unhealthy:
import requests
from time import sleep

def check_health(node_url, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = requests.get(
                f"{node_url}/api/v2/health",
                timeout=5
            )
            if response.status_code == 200:
                return True
        except requests.exceptions.RequestException:
            pass
        
        if attempt < max_retries - 1:
            sleep(2)  # Wait before retry
    
    return False

if check_health("http://localhost:8080"):
    print("Node is healthy")
else:
    print("Node is unhealthy")
3. Check All Nodes Independently

Don’t assume cluster health from a single node:
TOTAL_NODES=3
QUORUM=$((TOTAL_NODES / 2 + 1))  # Majority of nodes: 2 of 3
HEALTHY_NODES=0

for NODE in node1 node2 node3; do
  if curl -s -f "http://$NODE:8080/api/v2/health/" > /dev/null; then
    HEALTHY_NODES=$((HEALTHY_NODES + 1))
  fi
done

if [ "$HEALTHY_NODES" -ge "$QUORUM" ]; then
  echo "Cluster has quorum ($HEALTHY_NODES/$TOTAL_NODES healthy)"
else
  echo "WARNING: Cluster may not have quorum"
fi
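The same per-node check can be run concurrently so that one slow node does not delay the others. The following is a rough sketch under stated assumptions: `poll_nodes`, `majority_size`, and `majority_healthy` are hypothetical names, and `is_healthy` is a caller-supplied function that returns True when a node answers 200. A healthy majority of nodes is only an approximation of cluster quorum, since replication in CockroachDB is tracked per range.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def majority_size(total_nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return total_nodes // 2 + 1


def poll_nodes(nodes: Iterable[str], is_healthy: Callable[[str], bool]) -> Dict[str, bool]:
    """Check every node independently and in parallel."""
    node_list = list(nodes)

    def safe_check(node: str) -> bool:
        # A probe that raises (timeout, connection refused) counts as unhealthy.
        try:
            return bool(is_healthy(node))
        except Exception:
            return False

    with ThreadPoolExecutor(max_workers=max(1, len(node_list))) as pool:
        results = list(pool.map(safe_check, node_list))
    return dict(zip(node_list, results))


def majority_healthy(results: Dict[str, bool]) -> bool:
    healthy = sum(1 for ok in results.values() if ok)
    return healthy >= majority_size(len(results))
```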
4. Monitor During Deployments

Watch health status during rolling updates:
# Before upgrading a node
curl -s -o /dev/null -w "%{http_code}\n" http://node1:8080/api/v2/health/  # Should print 200

# Drain the node
cockroach node drain 1 --certs-dir=certs --host=node1:26257

# Verify drain completed
curl -s -o /dev/null -w "%{http_code}\n" http://node1:8080/api/v2/health/  # Should print 503

# Upgrade the node
systemctl stop cockroach
# ... perform upgrade ...
systemctl start cockroach

# Wait for health recovery
while ! curl -s -f http://node1:8080/api/v2/health/; do
  echo "Waiting for node to be healthy..."
  sleep 5
done

echo "Node is healthy, proceeding to next node"
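When the sequence above is repeated for every node, it helps to stop the rollout as soon as one node fails to recover, so that at most one node is down at a time. The sketch below shows that control flow only. `rolling_update` and its callback parameters (`drain`, `upgrade`, `wait_healthy`) are hypothetical placeholders for the drain command, the upgrade procedure, and the health wait loop.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def rolling_update(
    nodes: Sequence[str],
    drain: Callable[[str], None],
    upgrade: Callable[[str], None],
    wait_healthy: Callable[[str], bool],
) -> Tuple[List[str], Optional[str]]:
    """Upgrade nodes one at a time; halt at the first node that stays unhealthy.

    Returns (completed_nodes, failed_node). failed_node is None on full success.
    """
    completed: List[str] = []
    for node in nodes:
        drain(node)
        upgrade(node)
        if not wait_healthy(node):
            # Do not touch the remaining nodes while this one is still down.
            return completed, node
        completed.append(node)
    return completed, None
```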

Troubleshooting Unhealthy Nodes

If a node returns 503 or is unreachable:
Symptom: Health check returns 503 immediately after starting
Resolution: Wait for initialization to complete (typically 30-60 seconds)
# Check node logs
tail -f /var/log/cockroach/cockroach.log | grep "CockroachDB node starting"

Secure Clusters

For secure clusters with TLS enabled, use HTTPS and provide the CA certificate:
With CA Certificate
curl --cacert ca.crt \
  --request GET \
  --url https://localhost:8080/api/v2/health/
Skip Certificate Verification (not recommended for production)
curl --insecure \
  --request GET \
  --url https://localhost:8080/api/v2/health/
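For scripted checks, the CA-certificate variant can be approximated with the Python standard library. This is a minimal sketch: `build_tls_context` and `health_status` are hypothetical helper names, and the `ca_file` argument plays the role of curl's `--cacert`. Certificate verification stays enabled, which corresponds to the recommended form above.

```python
import ssl
import urllib.error
import urllib.request
from typing import Optional


def build_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Create a verifying TLS context.

    With ca_file=None the system trust store is used; otherwise only the
    given CA bundle (for example, the cluster's ca.crt) is loaded.
    """
    return ssl.create_default_context(cafile=ca_file)


def health_status(url: str, ca_file: Optional[str] = None, timeout_s: float = 5.0) -> int:
    """Return the HTTP status code of the health endpoint over HTTPS."""
    context = build_tls_context(ca_file)
    try:
        with urllib.request.urlopen(url, timeout=timeout_s, context=context) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; a 503 is still a valid answer here.
        return err.code
```

A call such as `health_status("https://localhost:8080/api/v2/health/", ca_file="ca.crt")` would mirror the first curl command; connection and certificate errors propagate to the caller.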

Response Time Monitoring

Monitor health check response times to detect degradation:
Measure Response Time
curl -w "@curl-format.txt" \
  -o /dev/null \
  -s \
  https://localhost:8080/api/v2/health/
Create curl-format.txt:
time_namelookup:  %{time_namelookup}s
time_connect:     %{time_connect}s
time_total:       %{time_total}s
Typical response times:
  • Local: < 10ms
  • Same datacenter: < 50ms
  • Cross-region: < 200ms
Consistently slow responses (> 500ms) may indicate:
  • Node resource saturation
  • Network congestion
  • Disk I/O issues
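To act on "consistently slow" rather than on a single outlier, a monitor can require several consecutive samples above the threshold. The sketch below assumes the 500 ms threshold mentioned above; the window of five samples and the names `timed_ms` and `is_consistently_slow` are illustrative assumptions, not documented settings.

```python
import time
from typing import Callable, Sequence


def timed_ms(request: Callable[[], object]) -> float:
    """Run one request and return its wall-clock duration in milliseconds."""
    start = time.perf_counter()
    request()
    return (time.perf_counter() - start) * 1000.0


def is_consistently_slow(
    samples_ms: Sequence[float], threshold_ms: float = 500.0, window: int = 5
) -> bool:
    """True when the most recent `window` samples all exceed the threshold."""
    if len(samples_ms) < window:
        return False  # not enough data to call it a trend
    return all(sample > threshold_ms for sample in samples_ms[-window:])
```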

Integration Examples

Terraform AWS NLB Health Check

resource "aws_lb_target_group" "cockroachdb" {
  name     = "cockroachdb-tg"
  port     = 26257
  protocol = "TCP"
  vpc_id   = aws_vpc.main.id

  health_check {
    enabled             = true
    healthy_threshold   = 2
    unhealthy_threshold = 2
    interval            = 10
    protocol            = "HTTP"
    path                = "/api/v2/health"
    port                = 8080
    timeout             = 5
  }
}

Docker Compose Health Check

services:
  cockroachdb:
    image: cockroachdb/cockroach:v25.3.0
    command: start-single-node --insecure
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/api/v2/health"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 30s

Prometheus Blackbox Exporter

modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: [200]
      method: GET
      preferred_ip_protocol: "ip4"
Then scrape:
scrape_configs:
  - job_name: 'cockroachdb_health'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - http://node1:8080/api/v2/health
        - http://node2:8080/api/v2/health
        - http://node3:8080/api/v2/health
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115
