Monitoring CockroachDB Cloud Clusters

Monitor your CockroachDB Cloud cluster’s performance, health, and resource usage using built-in tools and integrations with third-party monitoring platforms.

Monitoring Options

CockroachDB Cloud provides multiple ways to monitor your clusters:

Cloud Console Metrics: Built-in performance dashboards
Exported Metrics: Integration with Datadog, Prometheus, and CloudWatch
SQL Activity: Query performance and insights
Alerts: Automated notifications for issues

Cloud Console Metrics

All CockroachDB Cloud plans include access to performance metrics in the Cloud Console.

Access Metrics

Navigate to Metrics

Select your cluster
Click Metrics in the left navigation

Select Tab

Choose from available metric categories:

Overview: High-level cluster health
SQL: Query performance
Request Units: RU consumption (Basic/Standard)
Changefeeds: CDC performance
Row-Level TTL: TTL job metrics
Custom: Create custom charts

Select Time Range

Use the time selector to view metrics for:

Last hour
Last 6 hours
Last 12 hours
Last 24 hours
Last 7 days
Custom range

Overview Metrics

Key cluster health indicators:

CPU Usage:

Shows SQL and storage CPU utilization
Alert if consistently >70%
Scale up if sustained high usage
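The "consistently >70%" rule can be checked against a series of CPU samples. The following is a minimal sketch, not part of the Cloud Console: the function name, the sample values, and the choice of six consecutive samples as the meaning of "consistently" are all assumptions made for illustration.

```python
from typing import Sequence


def is_sustained_high(
    samples: Sequence[float],
    threshold: float = 70.0,      # percent, taken from the guidance above
    min_consecutive: int = 6,     # assumed definition of "consistently"
) -> bool:
    """Return True if `min_consecutive` samples in a row exceed `threshold`."""
    run = 0
    for value in samples:
        if value > threshold:
            run += 1
            if run >= min_consecutive:
                return True
        else:
            run = 0  # a single sample at or below the threshold resets the streak
    return False


# Hypothetical CPU readings (percent), one per sampling interval.
cpu_percent = [55, 72, 75, 78, 81, 74, 73, 60]
print(is_sustained_high(cpu_percent))
```

Requiring a streak rather than a single reading avoids alerting on short spikes, which is the distinction the guidance draws between a momentary peak and sustained high usage.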

SQL Statements:

Queries per second (QPS)
Statement latency percentiles (P50, P90, P99)
Identify traffic patterns and spikes
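To make the percentile labels concrete, here is a small sketch of how P50, P90, and P99 can be derived from raw latency samples using the nearest-rank method. The Cloud Console may compute its percentiles differently; the function name and the sample latencies are hypothetical.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical statement latencies in milliseconds, including one slow outlier.
latencies_ms = [12, 15, 11, 14, 250, 13, 16, 12, 18, 17]
for pct in (50, 90, 99):
    print(f"P{pct}: {percentile(latencies_ms, pct)} ms")
```

In this sample the median and P90 stay low while P99 picks up the single slow statement, which is why tail percentiles are worth watching alongside the median.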

Storage:

Used storage across cluster
Storage capacity (Advanced only)
Growth trends

SQL Connections:

Active SQL connections
New connection attempts
Monitor for connection pool issues

Request Units (Basic/Standard):

RU consumption rate
Breakdown by resource type
Track against provisioned capacity

SQL Metrics

Detailed query performance:

Statement Execution:

Total statements executed
Breakdown by statement type (SELECT, INSERT, UPDATE, DELETE)
Execution latency by percentile

Transaction Performance:

Transaction count
Transaction latency
Contention events
Retries and errors

Connection Activity:

Active connections
Idle connections
Connection rate

Changefeed Metrics

Monitor change data capture:

Throughput:

Messages emitted per second
Bytes emitted per second

Latency:

Commit-to-emit latency
Queue processing time

Status:

Running changefeeds
Failed changefeeds
Checkpoint progress

Export Metrics

Standard and Advanced clusters can export metrics to external monitoring platforms.

Supported Platforms

Datadog

Full-featured monitoring and alerting

Prometheus

Open-source monitoring and time-series database

CloudWatch

AWS native monitoring service

Export to Datadog

Get Datadog API Key

Log in to Datadog
Navigate to Organization Settings > API Keys
Create or copy an API key

Configure Export

In CockroachDB Cloud Console:

Go to cluster Monitoring > Metrics Export
Click Add Integration
Select Datadog
Enter API key and select region
Click Save

View in Datadog

Metrics appear in Datadog within 5 minutes under the crdb namespace

Available Metrics:

crdb.capacity.available
crdb.capacity.used
crdb.sql.conns
crdb.sql.query.count
crdb.sql.query.latency
Plus 100+ additional metrics

Export to Prometheus

Enable Metrics Export

Navigate to Monitoring > Metrics Export
Click Add Integration
Select Prometheus

Get Scrape Configuration

Copy the provided Prometheus scrape configuration:

scrape_configs:
  - job_name: 'cockroachdb-cloud'
    static_configs:
      - targets: ['cluster-id.crdb.io:8080']
    metrics_path: '/api/v2/prometheus/'
    bearer_token: 'your-bearer-token'

Configure Prometheus

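Add the scrape configuration to your prometheus.yml file. If the file already has a scrape_configs section, append the new job to it instead of adding a second scrape_configs key, then restart or reload Prometheus.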

Verify Collection

Check Prometheus targets page to confirm metrics collection
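The same check can be scripted against the Prometheus HTTP API, which exposes target state at /api/v1/targets. The sketch below assumes the job name from the scrape configuration above; the Prometheus address you pass to fetch_targets and the sample payload are placeholders for illustration.

```python
import json
import urllib.request


def fetch_targets(prometheus_url: str) -> dict:
    """Fetch target state from a Prometheus server (address supplied by the caller)."""
    with urllib.request.urlopen(f"{prometheus_url}/api/v1/targets", timeout=10) as resp:
        return json.load(resp)


def unhealthy_targets(payload: dict, job: str) -> list[str]:
    """Return scrape URLs of active targets in `job` whose health is not "up"."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        target.get("scrapeUrl", "<unknown>")
        for target in targets
        if target.get("labels", {}).get("job") == job and target.get("health") != "up"
    ]


# Trimmed, hypothetical response in the shape returned by /api/v1/targets.
sample_payload = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "cockroachdb-cloud"},
                "scrapeUrl": "https://cluster-id.crdb.io:8080/api/v2/prometheus/",
                "health": "down",
                "lastError": "server returned HTTP status 401 Unauthorized",
            }
        ]
    },
}
print(unhealthy_targets(sample_payload, "cockroachdb-cloud"))
```

Against a live server, pass the result of fetch_targets to unhealthy_targets; an empty list means every target for the job is reporting as up.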

Export to CloudWatch

For Advanced clusters on AWS:

Create IAM Role

Create an IAM role with permissions to write to CloudWatch

Configure Export

Go to Monitoring > Metrics Export
Select CloudWatch
Enter role ARN and log group name
Click Save

View in CloudWatch

Metrics appear in CloudWatch within 5 minutes

SQL Activity Monitoring

Monitor individual query performance and identify issues.

Statements Page

View and analyze SQL statement performance:

Access Statements

Click SQL Activity > Statements in cluster navigation

Review Statements

View statements sorted by:

Execution count
Rows processed
Bytes read
Latency (P50, P90, P99)
Contention time

Analyze Statement

Click any statement to view:

Full statement text
Execution plan
Performance statistics
Resource consumption
Execution history

Key Metrics:

Execution Count: How often the query runs
Rows Read: Amount of data scanned
Latency: Query response time
Contention: Lock wait time

Transactions Page

Monitor transaction-level performance:

Transaction count and rate
Transaction latency percentiles
Retry counts
Contention events
Transaction breakdown by statement

Insights Page

Automatic performance recommendations:

Available Insights:

High retry counts
Queries with sub-optimal indexes
Schema design issues
Transaction contention
Performance bottlenecks

View Insights

Click SQL Activity > Insights

Review Recommendations

Each insight includes:

Problem description
Affected queries
Recommended solution
Estimated impact

Take Action

Implement recommended fixes and monitor improvement

Alerts and Notifications

Configure automated alerts for cluster issues.

Built-in Alerts

Organization Admins automatically receive alerts for:

Planned Maintenance: Upcoming updates and maintenance
Performance Issues: High CPU, memory, or storage usage
Cluster Problems: Node failures, replication issues
Backup Failures: Failed backup jobs

Configure Alert Recipients

Navigate to Alerts

Click Alerts in the cluster navigation

Add Recipients

Click Add recipient
Enter email address
Select alert types
Click Save

Test Alerts

Click Send test alert to verify configuration

Alert Types

Alert Type	Description	Severity
Cluster unavailable	Cluster not responding	Critical
High CPU	CPU usage >80% for 30+ min	Warning
Low storage	Storage >85% full	Warning
Backup failed	Backup job failed	Warning
Node down	Node unreachable	Critical
High memory	Memory usage >90%	Warning

External Alerting

Integrate with external alerting platforms:

Via Exported Metrics:

Set up alerts in Datadog, Prometheus, or CloudWatch
Define custom thresholds and notification rules
Combine with application metrics

Via Cloud API:

Poll cluster status endpoints
Implement custom alerting logic
Integrate with PagerDuty, Opsgenie, etc.
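A polling loop for the Cloud API approach might look like the sketch below. Treat the details as assumptions to verify against the Cloud API reference: the base URL, the clusters/{id} path, the bearer-token header, the state field, and the CREATED value are not taken from this page. The paging step itself is left to whichever incident tool you use.

```python
import json
import urllib.request

# Assumed base URL for the CockroachDB Cloud API; confirm in the API reference.
CLOUD_API_BASE = "https://cockroachlabs.cloud/api/v1"


def fetch_cluster(cluster_id: str, api_key: str) -> dict:
    """Fetch cluster details. The path and auth header are assumptions."""
    request = urllib.request.Request(
        f"{CLOUD_API_BASE}/clusters/{cluster_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)


def needs_page(cluster: dict, healthy_states: tuple = ("CREATED",)) -> bool:
    """Custom alerting logic: page when the reported state is not a healthy one.

    A missing state field is treated as unhealthy so that malformed
    responses are surfaced instead of silently ignored.
    """
    return cluster.get("state") not in healthy_states


# Hypothetical responses, used here in place of a live API call.
print(needs_page({"id": "example", "state": "CREATED"}))
print(needs_page({"id": "example", "state": "LOCKED"}))
```

Keeping the decision in a pure function separate from the HTTP call makes the alerting rule easy to unit test and to reuse with a different transport.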

Essential Metrics by Plan

Basic Cluster Metrics

Focus on these metrics for Basic clusters:

Request Units: Monitor RU consumption and spend limit
SQL Statements: Track query volume and latency
Storage: Monitor data growth
Connections: Ensure proper connection pooling

Standard Cluster Metrics

Key metrics for Standard clusters:

Provisioned Capacity: Monitor against actual CPU usage
Request Units: Track RU consumption by resource type
SQL Performance: Query latency and throughput
Storage: Monitor usage and growth rate
Cross-Region Traffic: Optimize for cost

Advanced Cluster Metrics

Important metrics for Advanced clusters:

Node Health: Individual node CPU, memory, storage
Replication: Replica distribution and health
SQL Performance: Query and transaction latency
Storage IOPS: I/O performance (AWS)
Network: Inter-node and cross-region traffic

DB Console (Advanced Only)

Advanced clusters have access to the DB Console for detailed monitoring.

Access DB Console

Authorize Network

Add your IP address to the cluster's IP allowlist and enable DB Console access for that entry

Open DB Console

Go to Tools page
Click Open DB Console
Authenticate with SQL credentials

DB Console Features

Overview:

Cluster topology visualization
Node status and health
Live traffic metrics

Metrics:

100+ detailed performance metrics
Customizable time ranges
Per-node breakdowns

SQL Activity:

Live statement execution
Transaction details
Contention analysis

Databases:

Database and table details
Index usage statistics
Schema information

Jobs:

Running and completed jobs
Backup/restore progress
Changefeed status

Advanced Debug:

Range distribution
Raft status
Node logs
Cluster events

Monitoring Best Practices

Establish Baselines

Collect Data

Monitor metrics during normal operation for 1-2 weeks

Identify Patterns

Document typical values for:

Peak and off-peak hours
Daily/weekly patterns
Normal CPU and memory usage
Typical query latency

Set Thresholds

Create alerts based on deviations from baseline
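One common way to turn a baseline into a threshold is mean plus a multiple of the standard deviation. This is a sketch of that approach, not a method prescribed by CockroachDB Cloud; the three-standard-deviation multiplier and the sample P99 values are assumptions.

```python
from statistics import mean, stdev
from typing import Sequence


def deviation_threshold(baseline: Sequence[float], num_stddevs: float = 3.0) -> float:
    """Alert threshold: baseline mean plus `num_stddevs` sample standard deviations."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    return mean(baseline) + num_stddevs * stdev(baseline)


# Hypothetical daily P99 latencies (ms) collected during the baseline period.
baseline_p99_ms = [40, 42, 38, 45, 41, 39, 43]
threshold_ms = deviation_threshold(baseline_p99_ms)

for observed_ms in (45, 60):
    status = "alert" if observed_ms > threshold_ms else "normal"
    print(f"{observed_ms} ms -> {status}")
```

A wider multiplier produces fewer, more significant alerts; a narrower one catches drift sooner at the cost of more noise.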

Monitor Trends

Track changes over time:

Storage Growth: Plan capacity increases
Query Volume: Anticipate scaling needs
Latency Trends: Identify degradation early
Error Rates: Catch issues before they escalate
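For storage growth, a simple linear projection gives a rough estimate of how long current capacity will last. The sketch below assumes one usage sample per day and steady growth; the function name and the figures are hypothetical.

```python
from typing import Optional, Sequence


def days_until_full(daily_used_gib: Sequence[float], capacity_gib: float) -> Optional[float]:
    """Project days until capacity is reached from average daily growth.

    Returns None when usage is flat or shrinking, since no exhaustion
    date can be projected in that case.
    """
    if len(daily_used_gib) < 2:
        raise ValueError("need at least two daily samples")
    growth_per_day = (daily_used_gib[-1] - daily_used_gib[0]) / (len(daily_used_gib) - 1)
    if growth_per_day <= 0:
        return None
    remaining = max(capacity_gib - daily_used_gib[-1], 0)
    return remaining / growth_per_day


# Hypothetical usage samples (GiB), one per day, against a 200 GiB capacity.
used_gib = [100, 102, 104, 106, 108]
print(days_until_full(used_gib, capacity_gib=200))
```

Real growth is rarely linear, so it is worth re-running a projection like this as part of the regular reviews rather than treating one result as a forecast.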

Regular Reviews

Schedule monitoring reviews:

Daily: Check for alerts and anomalies
Weekly: Review performance trends
Monthly: Analyze capacity and optimization opportunities
Quarterly: Audit monitoring coverage and alerts

Key Performance Indicators

Track these KPIs:

KPI	Target	Action Threshold
CPU Utilization	Under 70%	Over 80% for 30 min
Query P99 Latency	Under 100ms	Over 200ms
Error Rate	Under 0.1%	Over 1%
Storage Usage	Under 80%	Over 85%
Connection Count	Under 500	Over 1000
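The table implies three zones for each KPI: within target, between target and action threshold, and past the action threshold. A minimal sketch of that classification follows; the dictionary keys and the zone names are made up for illustration, and the 30-minute duration attached to the CPU threshold is not modeled.

```python
# (target, action threshold) pairs taken from the table above.
KPI_THRESHOLDS = {
    "cpu_utilization_pct": (70, 80),
    "query_p99_latency_ms": (100, 200),
    "error_rate_pct": (0.1, 1),
    "storage_usage_pct": (80, 85),
    "connection_count": (500, 1000),
}


def classify(kpi: str, value: float) -> str:
    """Return "ok" under target, "action" over the action threshold, else "watch"."""
    target, action = KPI_THRESHOLDS[kpi]
    if value > action:
        return "action"
    if value < target:
        return "ok"
    return "watch"


# Hypothetical current readings.
readings = {"cpu_utilization_pct": 75, "query_p99_latency_ms": 240, "storage_usage_pct": 62}
for kpi, value in readings.items():
    print(f"{kpi}: {value} -> {classify(kpi, value)}")
```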

Troubleshooting with Metrics

High CPU Usage

Investigate:

Check SQL Statements for expensive queries
Review Insights for optimization opportunities
Look for query volume spikes

Solutions:

Optimize expensive queries
Add indexes
Scale up capacity

High Latency

Investigate:

Check transaction contention
Review query execution plans
Analyze network latency (multi-region)

Solutions:

Reduce transaction scope
Optimize queries
Adjust table localities

Storage Growth

Investigate:

Review database and table sizes
Check for data retention policies
Look for unexpected data growth

Solutions:

Implement TTL for old data
Archive historical data
Compress large columns
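As an illustration of the TTL option, the sketch below builds an ALTER TABLE statement that sets the ttl_expire_after storage parameter used by CockroachDB row-level TTL. The helper name, the table name, and the 90-day interval are hypothetical, and the validation is deliberately strict so that only simple identifiers and intervals are interpolated into SQL. Review the row-level TTL documentation before applying a statement like this to a production table, because expired rows are deleted.

```python
import re

# Allow only simple unquoted identifiers and "<number> <unit>" intervals.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_INTERVAL = re.compile(r"\d+ (hours|days|months|years)")


def ttl_statement(table: str, expire_after: str) -> str:
    """Build a statement that enables row-level TTL on `table`."""
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"unsupported table name: {table!r}")
    if not _INTERVAL.fullmatch(expire_after):
        raise ValueError(f"unsupported interval: {expire_after!r}")
    return f"ALTER TABLE {table} SET (ttl_expire_after = '{expire_after}');"


# Hypothetical table and retention period.
print(ttl_statement("events", "90 days"))
```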

Next Steps

Performance Tuning

Optimize cluster performance

SQL Activity

Analyze query performance

Scaling

Learn when and how to scale

Alerting

Configure alerts
