CockroachDB is a large, complex distributed database written in Go. Understanding the code structure helps you navigate the codebase and find the right place for your changes.

Repository Layout

The CockroachDB repository is organized into several top-level directories:

pkg/

Core Package Directory: Contains all Go packages that make up CockroachDB. This is where most development happens.

build/

Build Infrastructure: Build scripts, Docker configurations, toolchains, and CI/CD infrastructure.

docs/

Design Documents: RFCs, design documents, and technical notes about CockroachDB architecture.

licenses/

Licensing: License information and third-party notices.

Core Packages Overview

The pkg/ directory contains the main CockroachDB components. Here are the key areas:

SQL Layer

The SQL layer handles query parsing, optimization, and execution.
The main SQL package coordinates query execution:
  • Parser (pkg/sql/parser/): Parses SQL statements into AST
  • Semantic Tree (pkg/sql/sem/tree/): Abstract syntax tree definitions
  • Type System (pkg/sql/types/): SQL type definitions and operations
  • Optimizer (pkg/sql/opt/): Cost-based query optimizer
  • Execution (pkg/sql/exec*.go): Query execution coordination
  • Built-ins (pkg/sql/sem/builtins/): SQL function implementations
Team: SQL Queries, SQL Foundations
Cost-based optimizer that transforms SQL queries:
  • Optgen Rules: Transformation rules for optimization
  • Memo: Compact representation of query alternatives
  • Statistics (pkg/sql/stats/): Table statistics for cost estimation
  • Exec Builder (pkg/sql/opt/exec/): Converts optimized plan to execution
Team: SQL Queries
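The memo idea above can be illustrated with a toy sketch: a group holds alternatives that produce the same result, and the optimizer keeps the cheapest one. The `alternative` and `group` types and the cost numbers are hypothetical and are not the pkg/sql/opt API.

```go
package main

import "fmt"

// alternative is a hypothetical plan choice with an estimated cost.
type alternative struct {
	name string
	cost float64
}

// group is a hypothetical memo group: alternatives that all produce
// the same result.
type group struct {
	alts []alternative
}

// best returns the lowest-cost alternative, or false if the group is empty.
func (g group) best() (alternative, bool) {
	if len(g.alts) == 0 {
		return alternative{}, false
	}
	b := g.alts[0]
	for _, a := range g.alts[1:] {
		if a.cost < b.cost {
			b = a
		}
	}
	return b, true
}

func main() {
	g := group{alts: []alternative{
		{name: "full table scan", cost: 1000},
		{name: "index scan", cost: 12.5},
		{name: "index join", cost: 40},
	}}
	if b, ok := g.best(); ok {
		fmt.Printf("chosen plan: %s (cost %.1f)\n", b.name, b.cost)
	}
}
```

In a real cost-based optimizer the costs would come from table statistics rather than constants.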
Infrastructure for distributed SQL execution:
  • DistSQL: Distributed query execution framework
  • Processors: Query operators (joins, aggregations, etc.)
  • Flow Infrastructure (pkg/sql/flowinfra/): Scheduling and coordination
Team: SQL Queries
Database catalog and schema management:
  • Table/index/column descriptors
  • Schema change infrastructure
  • Multi-region configuration
  • Privilege management
Team: SQL Foundations
New declarative schema change framework:
  • Job-based schema changes
  • Support for online schema changes
  • Handles complex DDL operations
Team: SQL Foundations

KV Layer

The KV layer provides distributed, transactional key-value storage.
Client interface to the distributed KV store:
  • KV Client (pkg/kv/kvclient/): High-level KV operations
  • Transaction Coordinator (pkg/kv/kvclient/kvcoord/): Coordinates distributed transactions
  • Rangefeed (pkg/kv/kvclient/rangefeed/): Streaming change notifications
  • Batch Operations: Batching and pipelining
Team: KV
Core distributed storage engine:
  • Replica Management: Range replicas and replication
  • Raft Integration: Consensus via Raft
  • Range Operations: Splits, merges, rebalancing
  • Leases (pkg/kv/kvserver/leases/): Range lease management
  • Consistency Checks: Replica consistency verification
  • Allocator (pkg/kv/kvserver/allocator/): Replica placement decisions
Team: KV
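Range lookup and splitting can be sketched as follows. This is a minimal model that assumes ranges are contiguous, sorted by start key, and cover half-open intervals [start, end). The `keyRange` type and the `lookup` and `split` helpers are hypothetical and are not taken from pkg/kv/kvserver.

```go
package main

import (
	"fmt"
	"sort"
)

// keyRange is a hypothetical range descriptor covering [start, end).
type keyRange struct {
	id         int
	start, end string
}

// lookup returns the range containing key. ranges must be sorted by
// start key and contiguous.
func lookup(ranges []keyRange, key string) (keyRange, bool) {
	// Find the last range whose start key is <= key.
	i := sort.Search(len(ranges), func(i int) bool { return ranges[i].start > key }) - 1
	if i < 0 || key >= ranges[i].end {
		return keyRange{}, false
	}
	return ranges[i], true
}

// split divides the range that strictly contains splitKey into two
// ranges and gives the right-hand side the ID newID.
func split(ranges []keyRange, splitKey string, newID int) []keyRange {
	out := make([]keyRange, 0, len(ranges)+1)
	for _, r := range ranges {
		if splitKey > r.start && splitKey < r.end {
			out = append(out, keyRange{r.id, r.start, splitKey}, keyRange{newID, splitKey, r.end})
		} else {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	ranges := []keyRange{{1, "a", "m"}, {2, "m", "z"}}
	ranges = split(ranges, "f", 3)
	if r, ok := lookup(ranges, "k"); ok {
		fmt.Printf("key k lives in range %d [%s, %s)\n", r.id, r.start, r.end)
	}
}
```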
Transaction concurrency control:
  • Lock management
  • Conflict resolution
  • Wait queues
  • Deadlock detection
Team: KV
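Deadlock detection over wait queues is commonly explained with a waits-for graph: a transaction is deadlocked if following "waits for" edges leads back to itself. The sketch below is that generic technique only. It is not CockroachDB's mechanism, and the `hasDeadlock` helper and transaction names are hypothetical.

```go
package main

import "fmt"

// hasDeadlock reports whether following waits-for edges from start
// leads back to start. waitsFor maps a transaction to the transactions
// holding locks it is waiting on.
func hasDeadlock(waitsFor map[string][]string, start string) bool {
	visited := map[string]bool{}
	var visit func(txn string) bool
	visit = func(txn string) bool {
		for _, next := range waitsFor[txn] {
			if next == start {
				return true // found a cycle through start
			}
			if !visited[next] {
				visited[next] = true
				if visit(next) {
					return true
				}
			}
		}
		return false
	}
	return visit(start)
}

func main() {
	waitsFor := map[string][]string{
		"txnA": {"txnB"},
		"txnB": {"txnC"},
		"txnC": {"txnA"},
	}
	fmt.Println("txnA deadlocked:", hasDeadlock(waitsFor, "txnA"))
}
```

Once a cycle is found, a system typically resolves it by aborting one of the participants so the others can proceed.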
Underlying storage engine (Pebble):
  • RocksDB/Pebble interface
  • Iterator abstractions
  • Filesystem management
  • Encryption at rest
Team: Storage

Distribution & Replication

Raft consensus algorithm implementation:
  • Leader election
  • Log replication
  • Snapshot handling
  • Configuration changes
Based on etcd/raft with CockroachDB-specific modifications.
Team: KV
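Log replication in Raft hinges on one calculation: an entry is committed once a majority of replicas has stored it. The sketch below shows that quorum rule in isolation. The `commitIndex` function is hypothetical, and it deliberately omits Raft's additional requirement that a leader only directly commits entries from its own term.

```go
package main

import (
	"fmt"
	"sort"
)

// commitIndex returns the highest log index stored on a majority of
// replicas. matchIndex holds, for each replica (leader included), the
// highest log index known to be replicated there.
func commitIndex(matchIndex []uint64) uint64 {
	if len(matchIndex) == 0 {
		return 0
	}
	// Copy so the caller's slice is not reordered.
	sorted := append([]uint64(nil), matchIndex...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })
	quorum := len(sorted)/2 + 1
	// The quorum-th largest value is stored on at least quorum replicas.
	return sorted[quorum-1]
}

func main() {
	// Three replicas: the leader is at index 5, followers at 3 and 4.
	fmt.Println("commit index:", commitIndex([]uint64{5, 3, 4}))
}
```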
Cluster membership and information dissemination:
  • Node discovery
  • Cluster metadata distribution
  • Network topology
Team: KV Distribution
Range configuration and zone configs:
  • Replication zones
  • Data placement policies
  • Multi-region configuration
Team: KV, SQL Foundations

Jobs & Background Work

Distributed job execution framework:
  • Job scheduling and execution
  • Progress tracking
  • Job resumption after failures
  • Used by backups, imports, schema changes
Team: Jobs, Disaster Recovery
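Progress tracking and resumption can be sketched with a checkpoint: the job records how far it got, and a later run continues from that point instead of starting over. The `job` type below is hypothetical and is not the pkg/jobs API. Its `checkpoint` field stands in for progress that a real system would persist durably.

```go
package main

import (
	"errors"
	"fmt"
)

// errCrash simulates a node failure in the middle of a job.
var errCrash = errors.New("simulated node failure")

// job is a hypothetical resumable job.
type job struct {
	items      []string
	checkpoint int      // number of items already completed
	done       []string // items processed so far
}

// run processes items starting at the last checkpoint. failAt simulates
// a crash just before the item at that index; pass -1 to disable it.
func (j *job) run(failAt int) error {
	for i := j.checkpoint; i < len(j.items); i++ {
		if i == failAt {
			return errCrash
		}
		j.done = append(j.done, j.items[i])
		j.checkpoint = i + 1 // record progress after each item
	}
	return nil
}

// fractionCompleted reports progress in the range [0, 1].
func (j *job) fractionCompleted() float64 {
	if len(j.items) == 0 {
		return 1
	}
	return float64(j.checkpoint) / float64(len(j.items))
}

func main() {
	j := &job{items: []string{"t1", "t2", "t3", "t4"}}
	if err := j.run(2); err != nil {
		fmt.Printf("stopped at %.0f%%: %v\n", j.fractionCompleted()*100, err)
	}
	if err := j.run(-1); err == nil {
		fmt.Printf("resumed and finished: %v\n", j.done)
	}
}
```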
Database backup and restore:
  • Full and incremental backups
  • Point-in-time restore
  • Cross-cluster restore
Team: Disaster Recovery
Change feeds for streaming database changes:
  • Row-level change notifications
  • Multiple sink types (Kafka, webhooks)
  • At-least-once delivery guarantees
Team: CDC

Server & Observability

HTTP and RPC servers, admin UI, APIs:
  • Node initialization
  • HTTP/RPC endpoints
  • Admin UI serving
  • Multi-tenancy infrastructure
  • Status and metrics APIs
Team: Server, Observability
The cockroach CLI command:
  • start command for starting nodes
  • sql command for SQL shell
  • Various debug and admin commands
  • Demo mode for local clusters
Team: CLI
Structured logging infrastructure:
  • Log channels and sinks
  • Structured log format
  • Redaction for sensitive data
  • Log file management
Team: Observability
Metrics collection and export:
  • Prometheus-compatible metrics
  • Counters, gauges, histograms
  • Time-series data collection
Team: Observability
Distributed request tracing:
  • Span creation and propagation
  • Integration with OpenTelemetry
  • Trace visualization
Team: Observability

Testing & Development Tools

The ./dev development tool:
  • Build orchestration
  • Test execution
  • Code generation
  • Developer workflow automation
Team: Dev Infrastructure
Large-scale integration tests:
  • Cluster-level testing
  • Performance benchmarks
  • Chaos testing
  • Mixed-version tests
Team: Test Engineering
Common testing utilities:
  • Test server creation
  • SQL test helpers
  • Temporary directories
  • Assertion helpers
Team: Test Engineering
Benchmark and test workload generators:
  • TPC-C, TPC-H benchmarks
  • YCSB workload
  • Custom workload definitions
Team: Test Engineering

Finding Code Ownership

Use the CODEOWNERS file to find team ownership:
# Find owner of a package
go run ./pkg/cmd/whoownsit pkg/sql/opt

# View CODEOWNERS
cat .github/CODEOWNERS
The CODEOWNERS file maps code paths to GitHub teams responsible for review and maintenance.
Team names in CODEOWNERS like @cockroachdb/sql-queries-prs indicate the team responsible for that code area.
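To make the path-to-team mapping concrete, here is a minimal sketch of an ownership lookup. It assumes a simplified file in which every pattern is a directory prefix and the last matching line wins. Real CODEOWNERS files support richer glob patterns, and this is not how whoownsit is implemented. The sample contents, the `@example/...` teams, and the pairing of paths with teams are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// rule maps a directory prefix to the teams that own it.
type rule struct {
	prefix string
	owners []string
}

// parseRules reads simplified CODEOWNERS content: one "pattern owner..."
// entry per line, with blank lines and # comments ignored.
func parseRules(content string) []rule {
	var rules []rule
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		rules = append(rules, rule{
			prefix: strings.TrimPrefix(fields[0], "/"),
			owners: fields[1:],
		})
	}
	return rules
}

// ownersOf returns the owners from the last rule whose prefix matches path.
func ownersOf(rules []rule, path string) []string {
	var owners []string
	for _, r := range rules {
		if strings.HasPrefix(path, r.prefix) {
			owners = r.owners // later rules override earlier ones
		}
	}
	return owners
}

// sample is hypothetical content, not the real .github/CODEOWNERS.
const sample = `
# Hypothetical sample entries.
/pkg/sql/      @example/sql-team
/pkg/sql/opt/  @cockroachdb/sql-queries-prs
/pkg/kv/       @example/kv-team
`

func main() {
	rules := parseRules(sample)
	fmt.Println(ownersOf(rules, "pkg/sql/opt/memo/memo.go"))
}
```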

Key Architectural Patterns

Layered Architecture

CockroachDB follows a layered architecture:
1. SQL Layer: Handles SQL parsing, planning, and distributed execution. Translates SQL to KV operations.
2. Transaction Layer: Provides ACID transactions with serializable isolation. Coordinates distributed transactions.
3. Distribution Layer: Manages data distribution, replication, and rebalancing across nodes.
4. Replication Layer: Uses Raft consensus for consistent replication. Handles leader election and log replication.
5. Storage Layer: Provides persistent storage via Pebble (LSM-based storage engine).
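The hand-off from the SQL layer to the layers beneath it can be sketched with a single interface boundary: SQL code turns rows into key-value operations and never touches storage directly. Everything here is a simplified illustration. The `kv` interface, `memStore`, and the readable `/Table/<id>/1/<pk>` key shape are assumptions for the sketch, and real keys use a binary encoding.

```go
package main

import "fmt"

// kv is the only thing the SQL layer in this sketch depends on.
type kv interface {
	put(key, value string)
	get(key string) (string, bool)
}

// memStore is an in-memory stand-in for every layer below SQL.
type memStore map[string]string

func (m memStore) put(key, value string) { m[key] = value }

func (m memStore) get(key string) (string, bool) {
	v, ok := m[key]
	return v, ok
}

// rowKey builds a readable key for a row in a table's primary index
// (index ID 1 in this sketch).
func rowKey(tableID, pk int) string {
	return fmt.Sprintf("/Table/%d/1/%d", tableID, pk)
}

// insertRow plays the role of the SQL layer: one row becomes one KV write.
func insertRow(store kv, tableID, pk int, value string) {
	store.put(rowKey(tableID, pk), value)
}

// selectRow is a point lookup by primary key: one row read becomes one KV read.
func selectRow(store kv, tableID, pk int) (string, bool) {
	return store.get(rowKey(tableID, pk))
}

func main() {
	store := memStore{}
	insertRow(store, 53, 7, "alice")
	if v, ok := selectRow(store, 53, 7); ok {
		fmt.Printf("%s => %s\n", rowKey(53, 7), v)
	}
}
```

Because `insertRow` and `selectRow` accept the interface rather than `memStore`, the lower layers can be swapped without changing the SQL-side code, which is the point of the layering.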

Protocol Buffers for Serialization

CockroachDB uses Protocol Buffers extensively:
  • RPC Messages: All RPC uses protobuf
  • Stored Data: Many structures serialized as protobuf
  • Versioning: Protobuf enables rolling upgrades
Proto files are in packages alongside Go code (e.g., pkg/kv/kvpb/*.proto).

Version Gates

CockroachDB supports rolling upgrades using version gates:
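import "github.com/cockroachdb/cockroach/pkg/clusterversion"

// st is assumed to be the node's *cluster.Settings and ctx a context.Context.
// V23_2_MyNewFeature is a placeholder key, not a real one.
if st.Version.IsActive(ctx, clusterversion.V23_2_MyNewFeature) {
    // The cluster version has reached the gate: use the new feature.
} else {
    // Older nodes may still be present: use the old behavior.
}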
See pkg/clusterversion/ for version management.
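The reasoning behind a gate can be modeled without the real package: new behavior stays off until the cluster's active version reaches the gate, so nodes still running the previous release never see data they cannot read. The `version` and `handle` types and the `encode` function below are hypothetical and only mirror the idea, not the pkg/clusterversion API.

```go
package main

import "fmt"

// version is a hypothetical, simplified cluster version (major.minor).
type version struct {
	major, minor int
}

// atLeast reports whether v is the same as or newer than other.
func (v version) atLeast(other version) bool {
	if v.major != other.major {
		return v.major > other.major
	}
	return v.minor >= other.minor
}

// handle is a hypothetical stand-in for the cluster's active version.
type handle struct {
	active version
}

// isActive reports whether a feature gated at gate may be used.
func (h handle) isActive(gate version) bool {
	return h.active.atLeast(gate)
}

// encode picks a format based on the gate.
func encode(h handle, gate version, payload string) string {
	if h.isActive(gate) {
		return "v2:" + payload // new format, safe once the gate is active
	}
	return "v1:" + payload // old format, readable by every node
}

func main() {
	newFormatGate := version{23, 2} // hypothetical gate
	midUpgrade := handle{active: version{23, 1}}
	finalized := handle{active: version{23, 2}}
	fmt.Println(encode(midUpgrade, newFormatGate, "row")) // v1:row
	fmt.Println(encode(finalized, newFormatGate, "row"))  // v2:row
}
```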

Use CODEOWNERS

The .github/CODEOWNERS file is an excellent starting point for understanding the architecture and finding relevant code.

Follow Imports

Imports reveal dependencies. Tracing them from pkg/cmd/cockroach/main.go shows how components connect.

Read Design Docs

Check docs/RFCS/ and docs/tech-notes/ for architectural decisions and rationale.

Ask in Slack

The #contributors channel is great for questions about code structure and architecture.

Important Subsystems

Closed Timestamps

Enables non-blocking reads of historical data:
  • pkg/kv/kvserver/closedts/: Closed timestamp tracking
  • Used for follower reads and CDC
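The core rule behind follower reads can be sketched in a few lines: a closed timestamp is a promise that no new writes will arrive at or below it, so a follower may serve any read at or below the closed timestamp it has learned. The `tracker` type and the integer timestamps are hypothetical simplifications and are not the pkg/kv/kvserver/closedts API.

```go
package main

import "fmt"

// tracker is a hypothetical per-replica view of the closed timestamp.
// Timestamps are plain integers here for simplicity.
type tracker struct {
	closed int64
}

// forward advances the closed timestamp. It never moves backwards,
// because a timestamp that has been closed must stay closed.
func (t *tracker) forward(ts int64) {
	if ts > t.closed {
		t.closed = ts
	}
}

// canServe reports whether a read at readTS can be served locally
// without consulting the leaseholder.
func (t *tracker) canServe(readTS int64) bool {
	return readTS <= t.closed
}

func main() {
	var t tracker
	t.forward(100)
	fmt.Println("read at 90 served locally: ", t.canServe(90))
	fmt.Println("read at 120 served locally:", t.canServe(120))
}
```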

Admission Control

Prevents overload by controlling request admission:
  • pkg/util/admission/: Admission control framework
  • pkg/kv/kvserver/kvadmission/: KV-level admission
  • pkg/kv/kvserver/kvflowcontrol/: Flow control for replication
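One common way to control admission is a token scheme: each unit of work has a cost, tokens represent available capacity, and queued work is admitted in priority order while tokens remain. The sketch below illustrates that general idea only. The `controller` and `work` types are hypothetical and are not the pkg/util/admission API.

```go
package main

import (
	"fmt"
	"sort"
)

// work is a hypothetical request waiting for admission.
type work struct {
	name     string
	priority int // higher values are admitted first
	cost     int // tokens consumed when admitted
}

// controller is a hypothetical token-based admission queue.
type controller struct {
	tokens  int
	waiting []work
}

// submit queues work for admission.
func (c *controller) submit(w work) {
	c.waiting = append(c.waiting, w)
}

// grant admits queued work in priority order while tokens remain and
// returns the names of the admitted work. It stops at the first item
// that does not fit, so lower-priority work cannot jump the queue.
func (c *controller) grant() []string {
	sort.SliceStable(c.waiting, func(i, j int) bool {
		return c.waiting[i].priority > c.waiting[j].priority
	})
	var admitted []string
	for len(c.waiting) > 0 && c.waiting[0].cost <= c.tokens {
		w := c.waiting[0]
		c.waiting = c.waiting[1:]
		c.tokens -= w.cost
		admitted = append(admitted, w.name)
	}
	return admitted
}

func main() {
	c := &controller{tokens: 5}
	c.submit(work{name: "backup", priority: 0, cost: 3})
	c.submit(work{name: "user-query", priority: 10, cost: 4})
	fmt.Println("admitted now:", c.grant())
	c.tokens += 5 // capacity frees up later
	fmt.Println("admitted later:", c.grant())
}
```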

Multi-Tenancy

Supports multiple isolated tenants in a cluster:
  • pkg/multitenant/: Multi-tenancy infrastructure
  • pkg/server/: Tenant server coordination
  • pkg/ccl/multitenantccl/: Enterprise multi-tenancy features

Security

Authentication, authorization, and encryption:
  • pkg/security/: Core security primitives
  • pkg/sql/pgwire/: PostgreSQL wire protocol and auth
  • pkg/ccl/securityccl/: Enterprise security features

Resources

Architecture Guide

High-level architecture documentation

Design Documents

Original design document

CODEOWNERS

.github/CODEOWNERS in the repository

Tech Notes

docs/tech-notes/ directory for detailed explanations

Next Steps

Building from Source

Learn how to build CockroachDB

Testing

Understand the test infrastructure

Contributing

Start contributing to CockroachDB

Architecture Overview

Read the full architecture guide
