cockroach process and explains how CockroachDB achieves its goals.
You do not need to understand the underlying architecture to use CockroachDB. These pages give serious users and database enthusiasts a high-level framework to explain what’s happening under the hood.
Using this guide
This guide is broken out into pages detailing each layer of CockroachDB. We recommend reading through the layers sequentially, starting with this overview and then proceeding to the SQL layer. If you’re looking for a high-level understanding of CockroachDB, you can read the overview section of each layer. For more technical detail, read the components sections as well.This guide details how CockroachDB is built, but does not explain how to build an application using CockroachDB.
Goals of CockroachDB
CockroachDB was designed to meet the following goals:- Make life easier for humans: Be low-touch and highly automated for operators and simple to reason about for developers.
- Industry-leading consistency: Enable distributed transactions and remove the pain of eventual consistency issues and stale reads, even on massively scaled deployments.
- Always-on database: Accept reads and writes on all nodes without generating conflicts.
- Flexible deployment: Deploy in any environment without tying you to any platform or vendor.
- Familiar tools: Support familiar tools for working with relational data using SQL.
How CockroachDB works
CockroachDB starts running on machines with two commands:Start nodes
Run
cockroach start with a --join flag for all of the initial nodes in the cluster, so the process knows all of the other machines it can communicate with.Data distribution and replication
As RPCs start filling your cluster with data, CockroachDB starts algorithmically distributing your data among the nodes of the cluster, breaking the data up into chunks called ranges. Each range is replicated to at least 3 nodes by default to ensure survivability. This ensures that if any nodes go down, you still have copies of the data which can be used for:- Continuing to serve reads and writes
- Consistently replicating the data to other nodes
AS OF SYSTEM TIME clause, letting you find historical data for a period of time.
Architecture layers
At the highest level, CockroachDB converts clients’ SQL statements into key-value (KV) data, which is distributed among nodes and written to disk. CockroachDB’s architecture is manifested as a number of layers, each of which interacts with the layers directly above and below it as relatively opaque services. The following table describes the function each layer performs:| Layer | Order | Purpose |
|---|---|---|
| SQL | 1 | Translate client SQL queries to KV operations |
| Transaction | 2 | Allow atomic changes to multiple KV entries |
| Distribution | 3 | Present replicated KV ranges as a single entity |
| Replication | 4 | Consistently and synchronously replicate KV ranges across many nodes |
| Storage | 5 | Read and write KV data on disk |
Database terms
Key database concepts
Key database concepts
Node: An individual machine running CockroachDB.Cluster: A deployment of CockroachDB that acts as a single logical database.Range: CockroachDB stores all user data and system data in a sorted map of key-value pairs. This keyspace is divided into ranges, contiguous chunks of the key-space.Replica: CockroachDB replicates each range and stores each replica on a different node.Leaseholder: For each range, one of the replicas holds the lease. This replica, referred to as the leaseholder, is the one that receives and coordinates all read and write requests for the range.
CockroachDB architecture terms
Architecture-specific concepts
Architecture-specific concepts
Store: Each node contains at least one store, which is where CockroachDB reads and writes its data on disk.Meta ranges: CockroachDB stores the locations of all ranges in a two-level index at the beginning of the key-space, known as meta ranges.Write intent: A provisional, uncommitted value written to a key that acts as both a lock and a value.Raft: The consensus protocol CockroachDB uses to ensure data consistency across replicas.Transaction record: A record that tracks the status of a transaction (PENDING, STAGING, COMMITTED, or ABORTED).