Query Tuning and EXPLAIN

Overview

The EXPLAIN statement shows how CockroachDB executes a query, including the query plan, distribution strategy, and cost estimates. Understanding EXPLAIN output is essential for query optimization.

EXPLAIN Variants

CockroachDB provides several EXPLAIN modes for different analysis needs.

-- Shows logical query plan
EXPLAIN SELECT * FROM orders WHERE customer_id = 'cust123';

Understanding EXPLAIN Output

Basic Plan Structure

A typical EXPLAIN output shows a tree of operators:

EXPLAIN SELECT id, total FROM orders WHERE status = 'pending' ORDER BY created_at DESC LIMIT 10;

  distribution: local
  vectorized: true

  • limit
  │ count: 10
  │
  └── • sort
      │ order: -created_at
      │
      └── • scan
            table: orders@idx_status_created
            spans: [/'pending' - /'pending']

Read from bottom to top

Execution flows from the leaf nodes (table scans) up to the root.

Identify scan operators

Look for scan, index join, and lookup join operators that access data.

Check distribution

distribution: local means single-node, distribution: full means distributed across cluster.

Verify vectorization

vectorized: true indicates columnar execution (faster).

Scan Operations

Full Table Scan

Reads all rows in a table - avoid when possible.

EXPLAIN SELECT * FROM orders;

  • scan
    table: orders@orders_pkey
    spans: FULL SCAN

Full scans are inefficient for large tables. Add indexes or filters to limit rows scanned.

Index Scan

Uses an index to efficiently locate rows.

EXPLAIN SELECT * FROM orders WHERE customer_id = 'cust123';

  • scan
    table: orders@idx_customer
    spans: [/'cust123' - /'cust123']

Index-Only Scan (Covering Index)

Reads all data from index without accessing table.

CREATE INDEX idx_customer_total ON orders (customer_id) STORING (total);

EXPLAIN SELECT customer_id, total FROM orders WHERE customer_id = 'cust123';

  • scan
    table: orders@idx_customer_total
    spans: [/'cust123' - /'cust123']

Covering indexes are fastest - they avoid the index join lookup step.

JOIN Operations

Hash Join

Builds hash table of one side, probes with other side.

EXPLAIN SELECT o.id, c.name 
FROM orders o 
JOIN customers c ON o.customer_id = c.id;

  • hash join
  │ equality: (customer_id) = (id)
  │
  ├── • scan
  │     table: orders@orders_pkey
  │
  └── • scan
        table: customers@customers_pkey

Lookup Join

Uses index lookups to join tables - efficient for selective joins.

EXPLAIN SELECT o.id, c.name 
FROM orders o 
INNER LOOKUP JOIN customers c ON o.customer_id = c.id
WHERE o.created_at > NOW() - INTERVAL '1 hour';

  • lookup join
  │ table: customers@customers_pkey
  │ equality: (customer_id) = (id)
  │
  └── • scan
        table: orders@idx_created_at
        spans: [/2024-03-03T10:00:00 - ]

Merge Join

Joins two sorted inputs - efficient when both sides are already sorted.

EXPLAIN SELECT o.id, p.name 
FROM orders o 
JOIN products p ON o.product_id = p.id
ORDER BY o.product_id;

  • merge join
  │ equality: (product_id) = (id)
  │ left ordering: +product_id
  │ right ordering: +id
  │
  ├── • scan
  │     table: orders@idx_product
  │
  └── • scan
        table: products@products_pkey

EXPLAIN ANALYZE: Runtime Statistics

EXPLAIN ANALYZE executes the query and shows actual performance metrics.

EXPLAIN ANALYZE SELECT COUNT(*) FROM orders WHERE status = 'pending';

  distribution: local
  vectorized: true

  • group (scalar)
  │ actual row count: 1
  │
  └── • scan
        table: orders@idx_status
        spans: [/'pending' - /'pending']
        actual row count: 1,247
        KV time: 15.2ms
        KV bytes read: 45KB
        estimated row count: 1,200

Key Metrics

actual row count: Rows processed by each operator
KV time: Time spent in storage layer
KV bytes read: Amount of data read from disk
estimated row count: Optimizer’s prediction

Compare actual row count vs estimated row count. Large discrepancies indicate stale statistics.

Query Optimization Patterns

Problem: Full Table Scan

Symptom: spans: FULL SCAN in EXPLAIN output

-- Slow: Full table scan
EXPLAIN SELECT * FROM orders WHERE status = 'pending';

  • scan
    table: orders@orders_pkey
    spans: FULL SCAN  -- ⚠️ Problem!
    estimated row count: 1,000,000

Solution: Create an index

CREATE INDEX idx_status ON orders (status);

EXPLAIN SELECT * FROM orders WHERE status = 'pending';

  • scan
    table: orders@idx_status
    spans: [/'pending' - /'pending']  -- ✓ Index scan
    estimated row count: 1,200

Problem: Index Join Overhead

Symptom: Separate index join step after scan

CREATE INDEX idx_customer ON orders (customer_id);

EXPLAIN SELECT customer_id, total, status FROM orders WHERE customer_id = 'cust123';

  • index join
  │ table: orders@orders_pkey
  │
  └── • scan
        table: orders@idx_customer
        spans: [/'cust123' - /'cust123']

Solution: Use covering index with STORING

CREATE INDEX idx_customer_covering ON orders (customer_id) STORING (total, status);

EXPLAIN SELECT customer_id, total, status FROM orders WHERE customer_id = 'cust123';

  • scan
    table: orders@idx_customer_covering  -- ✓ No index join needed
    spans: [/'cust123' - /'cust123']

Problem: Inefficient JOIN Order

Symptom: Large table scanned first in join

EXPLAIN SELECT o.id, c.name 
FROM orders o 
JOIN customers c ON o.customer_id = c.id
WHERE c.country = 'US';

  • hash join
  │
  ├── • scan
  │     table: orders@orders_pkey
  │     estimated row count: 10,000,000  -- ⚠️ Large scan
  │
  └── • scan
        table: customers@idx_country
        spans: [/'US' - /'US']
        estimated row count: 50,000

Solution: Filter before joining

-- Create index on customer_id for orders
CREATE INDEX idx_orders_customer ON orders (customer_id);

EXPLAIN SELECT o.id, c.name 
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE c.country = 'US';

  • lookup join
  │ table: orders@idx_orders_customer
  │
  └── • scan
        table: customers@idx_country
        spans: [/'US' - /'US']
        estimated row count: 50,000  -- ✓ Small table first

Problem: Sorting Large Result Sets

Symptom: Separate sort operation

EXPLAIN SELECT * FROM orders WHERE status = 'pending' ORDER BY created_at DESC;

  • sort
  │ order: -created_at  -- ⚠️ Expensive sort
  │
  └── • scan
        table: orders@idx_status
        estimated row count: 100,000

Solution: Create index with sort order

CREATE INDEX idx_status_created ON orders (status, created_at DESC);

EXPLAIN SELECT * FROM orders WHERE status = 'pending' ORDER BY created_at DESC;

  • scan
    table: orders@idx_status_created  -- ✓ Pre-sorted by index
    spans: [/'pending' - /'pending']

Advanced Analysis

Statement Bundle

Collect comprehensive diagnostics for a slow query.

EXPLAIN ANALYZE (DEBUG) SELECT * FROM orders WHERE customer_id = 'cust123';

This generates a ZIP file with:

Query plan
Table statistics
Schema definitions
Trace data

Query Plan Cache

View cached query plans:

SELECT query, plan FROM crdb_internal.statement_statistics
WHERE query LIKE '%orders%'
LIMIT 5;

Index Recommendations

CockroachDB can suggest indexes based on query patterns:

EXPLAIN (OPT) SELECT * FROM orders WHERE customer_id = 'cust123' AND status = 'pending';

Look for index recommendations in the output.

Performance Testing Workflow

Capture baseline

Run EXPLAIN ANALYZE on the original query to capture baseline metrics.

Identify bottlenecks

Look for full scans, expensive sorts, or large row counts.

Apply optimization

Create indexes, rewrite query, or adjust schema.

Verify improvement

Run EXPLAIN ANALYZE again and compare metrics.

Test at scale

Use realistic data volumes and concurrency levels.

Best Practices

EXPLAIN analysis checklist:

✓ No FULL SCAN on large tables
✓ vectorized: true for columnar execution
✓ Covering indexes to avoid index joins
✓ Small estimated row counts
✓ actual row count ≈ estimated row count
✓ Selective filters applied early
✓ Lookup joins instead of hash joins when appropriate
✓ No unnecessary sort operations
✓ Distribution strategy matches data locality
✓ Recent table statistics

Common EXPLAIN Flags

-- Show detailed type information
EXPLAIN (TYPES) SELECT * FROM orders;

-- Show optimizer internals
EXPLAIN (OPT) SELECT * FROM orders;

-- Show distributed SQL visual plan
EXPLAIN (DISTSQL) SELECT * FROM orders;

-- Combine multiple flags
EXPLAIN (VERBOSE, TYPES) SELECT * FROM orders;

Getting Started

Architecture

SQL Reference

Administration

Operations

Performance

Query Tuning and EXPLAIN

Overview

EXPLAIN Variants

Understanding EXPLAIN Output

Basic Plan Structure

Scan Operations

Full Table Scan

Index Scan

Index-Only Scan (Covering Index)

JOIN Operations

Hash Join

Lookup Join

Merge Join

EXPLAIN ANALYZE: Runtime Statistics

Key Metrics

Query Optimization Patterns

Problem: Full Table Scan

Problem: Index Join Overhead

Problem: Inefficient JOIN Order

Problem: Sorting Large Result Sets

Advanced Analysis

Statement Bundle

Query Plan Cache

Index Recommendations

Performance Testing Workflow

Best Practices

Common EXPLAIN Flags

Build docs developers (and LLMs) love

Getting Started

Architecture

SQL Reference

Administration

Operations

Performance

​Overview

​EXPLAIN Variants

​Understanding EXPLAIN Output

​Basic Plan Structure

​Scan Operations

​Full Table Scan

​Index Scan

​Index-Only Scan (Covering Index)

​JOIN Operations

​Hash Join

​Lookup Join

​Merge Join

​EXPLAIN ANALYZE: Runtime Statistics

​Key Metrics

​Query Optimization Patterns

​Problem: Full Table Scan

​Problem: Index Join Overhead

​Problem: Inefficient JOIN Order

​Problem: Sorting Large Result Sets

​Advanced Analysis

​Statement Bundle

​Query Plan Cache

​Index Recommendations

​Performance Testing Workflow

​Best Practices

​Common EXPLAIN Flags

Build docs developers (and LLMs) love

Overview

EXPLAIN Variants

Understanding EXPLAIN Output

Basic Plan Structure

Scan Operations

Full Table Scan

Index Scan

Index-Only Scan (Covering Index)

JOIN Operations

Hash Join

Lookup Join

Merge Join

EXPLAIN ANALYZE: Runtime Statistics

Key Metrics

Query Optimization Patterns

Problem: Full Table Scan

Problem: Index Join Overhead

Problem: Inefficient JOIN Order

Problem: Sorting Large Result Sets

Advanced Analysis

Statement Bundle

Query Plan Cache

Index Recommendations

Performance Testing Workflow

Best Practices

Common EXPLAIN Flags