This glossary contains essential terms and concepts from all chapters of Designing Data-Intensive Applications. Terms are organized alphabetically with clear definitions and chapter references.
ACID (Atomicity, Consistency, Isolation, Durability): Four properties that guarantee database transactions are processed reliably. Atomicity ensures all-or-nothing execution, Consistency maintains data integrity, Isolation prevents concurrent transactions from interfering, and Durability ensures committed data persists. Chapter: 7 - Transactions
Aggregation Pipeline: A sequence of data processing stages in document databases (such as MongoDB) where each stage transforms the data. More declarative than MapReduce while providing similar functionality for complex queries. Chapter: 2 - Data Models and Query Languages
Anti-Entropy: A background process in leaderless replication that continuously looks for differences between replicas and copies missing data. Runs in no particular order and may have significant lag. Chapter: 5 - Replication
Asynchronous Replication: Replication where the leader sends changes to followers but doesn't wait for confirmation before acknowledging writes to clients. Provides better performance but risks data loss if the leader fails. Chapter: 5 - Replication
Atomicity: A transaction property ensuring operations are all-or-nothing: either all changes commit successfully or none do. Failed transactions are rolled back completely. Chapter: 7 - Transactions
Availability: The ability of a system to remain operational and respond to requests even when some components fail. Often traded against consistency in distributed systems (CAP theorem). Chapter: 9 - Consistency and Consensus
Avro: A binary encoding format with schema evolution support. Unlike Thrift and Protocol Buffers, it uses the writer's schema and the reader's schema together for decoding, allowing more flexible schema evolution. Chapter: 4 - Encoding and Evolution
Backward Compatibility: The ability of new code to read data written by old code. Essential for rolling deployments where old and new versions coexist. Chapter: 4 - Encoding and Evolution
Batch Processing: Processing large amounts of data in a single job that runs periodically. Measured by throughput rather than response time. Examples: MapReduce, Apache Spark. Chapter: 10 - Batch Processing
Bloom Filter: A space-efficient probabilistic data structure used to test set membership. Can definitively say an element is NOT in a set, or that it MIGHT be (with a small false-positive rate). Used in LSM-trees to avoid reading SSTables unnecessarily. Chapter: 3 - Storage and Retrieval
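As an illustrative sketch (not from the book), a minimal Bloom filter can be built from a bit array and k salted hashes; the class and sizes below are hypothetical:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k salted hashes over a fixed-size bit array."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, item):
        # Derive k bit positions by hashing the item with k different salts.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False means "definitely not present"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("user:42")
present = bf.might_contain("user:42")   # always True for an added item
absent = bf.might_contain("user:99")    # almost certainly False; a false
                                        # positive is possible but never a
                                        # false negative
```

An LSM-tree storage engine would consult such a filter before opening an SSTable: a negative answer skips the disk read entirely.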
B-Tree: The most common indexing structure in databases. A self-balancing tree that keeps data sorted and supports searches, insertions, and deletions in logarithmic time. Updates data in place using fixed-size pages. Chapter: 3 - Storage and Retrieval
CAP Theorem: States that in the presence of network partitions, a distributed system must choose between consistency (linearizability) and availability. In practice, many systems choose availability and eventual consistency. Chapter: 9 - Consistency and Consensus
Causality: The happens-before relationship between events. If event A causes event B, then A must be processed before B. Maintaining causal order is weaker than linearizability but important for correctness. Chapter: 9 - Consistency and Consensus
Change Data Capture (CDC): The process of observing changes written to a database and replicating them to other systems. Enables keeping derived data systems (caches, search indexes) up to date. Chapter: 11 - Stream Processing
Column-Oriented Storage: A storage format where values from the same column are stored together rather than row by row. Excellent for analytics workloads that scan many rows but few columns. Enables better compression and vectorized processing. Chapter: 3 - Storage and Retrieval
Compaction: The process of combining multiple smaller files or log segments into larger files, removing deleted and overwritten values. Keeps disk usage under control in log-structured storage. Chapter: 3 - Storage and Retrieval
Consensus: Agreement among distributed nodes on a single value or course of action. A fundamental problem in distributed systems. Algorithms: Paxos, Raft, ZAB. Chapter: 9 - Consistency and Consensus
Consistent Hashing: A partitioning scheme that minimizes data movement when nodes are added or removed. Keys and nodes are placed on a hash ring, with each key belonging to the next node clockwise. Chapter: 6 - Partitioning
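The hash-ring idea can be sketched in a few lines; this is a simplified illustration (node names are hypothetical, and real systems add virtual nodes for balance):

```python
import bisect
import hashlib

def _hash(key):
    # Any well-distributed hash works; MD5 is used here only for illustration.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        # Place each node at a point on the ring, sorted by its hash.
        self.ring = sorted((_hash(n), n) for n in nodes)

    def node_for(self, key):
        # Walk clockwise: first node whose position is past the key's
        # hash, wrapping around to the start of the ring if necessary.
        hashes = [h for h, _ in self.ring]
        idx = bisect.bisect_right(hashes, _hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")   # deterministic for a given ring
```

The key property: when "node-d" joins, only the keys that now hash between node-d and its predecessor move, and they all move to node-d; every other key keeps its owner.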
Consistent Prefix Reads: The guarantee that if a sequence of writes happens in a certain order, anyone reading those writes will see them in the same order. Prevents causality violations such as seeing an answer before its question. Chapter: 5 - Replication
CQRS (Command Query Responsibility Segregation): A pattern that separates read and write operations into different models. Commands update state (the write model); queries read from optimized read models derived from events. Chapter: 12 - The Future of Data Systems
Cypher: A declarative query language for graph databases (Neo4j). Makes it easy to express graph traversal patterns and relationship queries. Chapter: 2 - Data Models and Query Languages
Data Cube: A grid of aggregates grouped by different dimensions. A special kind of materialized view that enables extremely fast analytical queries but is less flexible. Chapter: 3 - Storage and Retrieval
Data Warehouse: A separate database optimized for analytics (OLAP). Data is extracted from OLTP systems via ETL and transformed into an analytics-friendly schema (often a star schema). Chapter: 3 - Storage and Retrieval
Declarative Query: A query that specifies what result is desired, not how to achieve it, allowing the database to choose the optimal execution plan. Examples: SQL, CSS. Contrast with imperative queries. Chapter: 2 - Data Models and Query Languages
Derived Data: Data that can be recreated from another source (the system of record). Examples: caches, denormalized data, indexes. Can be rebuilt if lost. Chapter: 12 - The Future of Data Systems
Document Database: A database that stores data as self-contained documents (usually JSON). Schema-flexible and good for hierarchical data. Examples: MongoDB, CouchDB. Chapter: 2 - Data Models and Query Languages
Durability: The guarantee that once a transaction commits, its changes persist even if hardware fails or the database crashes. Typically achieved with write-ahead logs and replication. Chapter: 7 - Transactions
Eventual Consistency: A weak consistency guarantee where replicas may temporarily diverge but eventually converge to the same value if no new updates arrive. Trades consistency for availability and performance. Chapter: 5 - Replication
Event Sourcing: Storing all changes as an immutable sequence of events rather than as mutable current state. Enables audit trails, time travel, and debugging. Current state is derived by replaying events. Chapter: 11 - Stream Processing
Failover: The process of automatically promoting a follower to leader when the leader fails. Challenging due to potential data loss, split-brain scenarios, and choosing the right timeout. Chapter: 5 - Replication
Forward Compatibility: The ability of old code to read data written by new code. Harder to achieve than backward compatibility but necessary for rolling deployments. Chapter: 4 - Encoding and Evolution
Graph Database: A database optimized for data with many relationships. Stores nodes (entities) and edges (relationships). Excellent for traversing connections. Examples: Neo4j, Amazon Neptune. Chapter: 2 - Data Models and Query Languages
Hash Index: An index that uses an in-memory hash table to map keys to byte offsets in a data file. Very fast lookups, but doesn't support range queries. Chapter: 3 - Storage and Retrieval
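A toy version of this design (in the style of Bitcask) fits in one small class; the file format and names here are illustrative assumptions, and it assumes keys contain no commas:

```python
import os
import tempfile

class HashIndexStore:
    """Append-only data file plus an in-memory hash map from each key
    to the byte offset of its most recent value."""

    def __init__(self, path):
        self.path = path
        self.index = {}             # key -> byte offset of the newest record
        open(path, "a").close()     # create the data file if missing

    def set(self, key, value):
        with open(self.path, "a", encoding="utf-8") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()       # where this record will start
            f.write(f"{key},{value}\n")   # append-only: old records remain
        self.index[key] = offset    # hash map always points at the newest write

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, encoding="utf-8") as f:
            f.seek(offset)          # one seek, one read: no file scan
            return f.readline().rstrip("\n").split(",", 1)[1]

store = HashIndexStore(os.path.join(tempfile.mkdtemp(), "data.log"))
store.set("user:1", "alice")
store.set("user:1", "alicia")       # overwrite appends a new record;
                                    # the index now skips the stale one
```

Note what is missing: a range query like "all keys between user:1 and user:9" would have to scan the whole hash map, which is exactly why sorted structures (SSTables, B-trees) exist.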
HDFS (Hadoop Distributed File System): A distributed filesystem designed for storing large files across many machines. Provides fault tolerance through replication and enables data locality for MapReduce jobs. Chapter: 10 - Batch Processing
Hot Spot: A partition or node that receives disproportionately high load compared to others, causing uneven distribution and performance problems. Example: the celebrity-user problem. Chapter: 6 - Partitioning
Idempotence: The property that performing an operation multiple times has the same effect as performing it once. Critical for safe retries in distributed systems. Chapter: 11 - Stream Processing
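A common way to make a non-idempotent operation (like incrementing a balance) safe to retry is to tag each request with a unique ID and remember which IDs were already applied; the sketch below is a hypothetical illustration of that pattern:

```python
class Account:
    """Applies each deposit at most once by remembering request IDs,
    so a client may safely retry after a timeout."""

    def __init__(self):
        self.balance = 0
        self.seen = set()   # request IDs that were already applied

    def deposit(self, request_id, amount):
        if request_id in self.seen:   # duplicate retry: a no-op
            return self.balance
        self.seen.add(request_id)
        self.balance += amount
        return self.balance

acct = Account()
acct.deposit("req-1", 100)
acct.deposit("req-1", 100)   # network timeout led the client to retry
# balance is 100, not 200: the retried request had no further effect
```

Without the `seen` set, a retry after a lost acknowledgment would double the deposit; with it, the operation is idempotent from the client's point of view.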
Imperative Query: A query that specifies step by step how to achieve a result. Gives explicit control but is harder to optimize. Contrast with declarative queries. Chapter: 2 - Data Models and Query Languages
Isolation: A transaction property ensuring concurrent transactions don't interfere with each other. Different isolation levels (read committed, repeatable read, serializable) offer different guarantees. Chapter: 7 - Transactions
Join: Combining data from multiple tables or collections based on a related field. Challenging in distributed systems due to data partitioning. Types: broadcast join, partitioned join. Chapter: 10 - Batch Processing
Kafka: A log-based message broker for event streaming. Provides high throughput, message retention, and partitioning. A core component of many streaming architectures. Chapter: 11 - Stream Processing
Lambda Architecture: A data processing architecture with separate batch and stream processing layers. The batch layer provides accurate, complete views; the speed layer provides low-latency, approximate views. Chapter: 12 - The Future of Data Systems
Leader-Based Replication: A replication scheme where one node (the leader) accepts writes and propagates changes to followers. The most common replication approach. Also called master-slave or active/passive replication. Chapter: 5 - Replication
Leaderless Replication: Replication where clients send writes to multiple replicas directly, with no leader. Uses quorums for consistency. Examples: Dynamo, Cassandra, Riak. Chapter: 5 - Replication
Linearizability: The strongest consistency guarantee. Makes a system appear as if there is only one copy of the data and all operations happen atomically at a single point in time. Also called strong consistency. Chapter: 9 - Consistency and Consensus
Log-Structured Storage: A storage approach that only appends to files and never updates in place. Examples: LSM-trees, SSTables. Enables high write throughput. Chapter: 3 - Storage and Retrieval
Lost Update: A concurrency problem where two transactions read the same value, modify it, and write it back, with one overwriting the other's changes. Chapter: 7 - Transactions
LSM-Tree (Log-Structured Merge-Tree): A storage structure that maintains multiple sorted files (SSTables) and periodically merges them. Optimized for write-heavy workloads. Used by Cassandra, LevelDB, and RocksDB. Chapter: 3 - Storage and Retrieval
Maintainability: How easy it is for others to work on the system over time. Comprises operability (keeping the system running), simplicity (managing complexity), and evolvability (adapting to change). Chapter: 1 - Reliable, Scalable, and Maintainable Applications
MapReduce: A programming model for batch processing large datasets in parallel across many machines. Consists of a map phase (process records, emit key-value pairs) and a reduce phase (aggregate the values for each key). Chapter: 10 - Batch Processing
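The map/shuffle/reduce shape can be demonstrated in-process with the classic word-count example; this single-machine sketch only illustrates the model (a real framework distributes each phase across machines):

```python
from itertools import groupby

def map_phase(record):
    # Emit a (word, 1) pair for every word in the input line.
    for word in record.split():
        yield (word, 1)

def reduce_phase(key, values):
    # Aggregate all values emitted for one key.
    return (key, sum(values))

def mapreduce(records, mapper, reducer):
    # "Shuffle": sort all emitted pairs so identical keys are adjacent,
    # as the framework would when routing pairs to reducers.
    pairs = sorted(kv for r in records for kv in mapper(r))
    return [reducer(k, [v for _, v in group])
            for k, group in groupby(pairs, key=lambda kv: kv[0])]

result = mapreduce(["the cat", "the dog"], map_phase, reduce_phase)
# [("cat", 1), ("dog", 1), ("the", 2)]
```

The sort-then-group step is the essence of the shuffle: reducers never need to see the whole dataset, only all values for their assigned keys.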
Materialized View: A cached query result that is stored and kept up to date as the underlying data changes. Faster to read than recomputing, but requires updates on writes. Chapters: 3 - Storage and Retrieval, 11 - Stream Processing
Monotonic Reads: The guarantee that if a user reads data at one point in time, subsequent reads won't see older data. Prevents time from moving backward for that user. Chapter: 5 - Replication
Multi-Leader Replication: Replication with multiple nodes accepting writes, each leader replicating its changes to the other leaders. Good for multi-datacenter setups but requires conflict resolution. Chapter: 5 - Replication
Network Partition: A situation where network failure splits nodes into groups that can't communicate. Forces a choice between consistency and availability (CAP theorem). Chapter: 8 - The Trouble with Distributed Systems
OLAP (Online Analytical Processing): A workload pattern optimized for analytics: large scans, aggregations, and complex queries on historical data. Column-oriented storage is ideal. Chapter: 3 - Storage and Retrieval
OLTP (Online Transaction Processing): A workload pattern optimized for transactions: small queries, random access by key, and real-time user interactions. Row-oriented storage is typical. Chapter: 3 - Storage and Retrieval
Partial Failure: A situation in distributed systems where some components fail while others continue working. Makes distributed systems fundamentally different from single-machine systems. Chapter: 8 - The Trouble with Distributed Systems
Partition: A subset of the data in a distributed database. Enables horizontal scaling by distributing data across multiple machines. Also called a shard, region, or vnode. Chapter: 6 - Partitioning
Percentile: A statistical measure showing the value below which a given percentage of observations fall. p99 means 99% of requests are faster than this value. Better than averages for understanding tail latency. Chapter: 1 - Reliable, Scalable, and Maintainable Applications
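A nearest-rank percentile is a one-liner over sorted data; the latency numbers below are made up to show why percentiles beat averages:

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest value such that at least
    p% of the observations are less than or equal to it."""
    ranked = sorted(values)
    k = math.ceil(p / 100 * len(ranked)) - 1   # rank as a zero-based index
    return ranked[max(k, 0)]

# Hypothetical response times (ms) for ten requests; one is pathological.
latencies_ms = [12, 15, 20, 22, 30, 31, 35, 40, 120, 800]
p50 = percentile(latencies_ms, 50)   # 30 ms: the typical request
p99 = percentile(latencies_ms, 99)   # 800 ms: the tail a user can hit
```

The mean of this sample is 112.5 ms, which describes no actual request: p50 shows the typical experience and p99 exposes the tail, which is why service-level objectives are usually stated in percentiles.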
Protocol Buffers: A binary encoding format developed by Google that requires a schema. Uses tag numbers for fields, enabling schema evolution. More compact than JSON. Chapter: 4 - Encoding and Evolution
Quorum: The minimum number of nodes that must participate in an operation for it to succeed. In quorum reads and writes, w + r > n ensures reading the latest value (w = write quorum, r = read quorum, n = replicas). Chapter: 5 - Replication
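The reason w + r > n works is pure pigeonhole: any w replicas and any r replicas out of n must share at least one node. This can be checked by brute force over a small cluster, as an illustrative sketch:

```python
from itertools import combinations

def quorum_is_safe(n, w, r):
    """Brute-force check that every possible write quorum (any w of the
    n replicas) intersects every possible read quorum (any r of them)."""
    replicas = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(replicas, w)
               for rs in combinations(replicas, r))

# Typical Dynamo-style configuration: n=3, w=2, r=2 overlaps; w=1, r=1 doesn't.
safe = quorum_is_safe(3, 2, 2)     # True: a read always meets a fresh replica
unsafe = quorum_is_safe(3, 1, 1)   # False: disjoint quorums can miss a write
```

Brute force and the formula agree: the check returns True exactly when w + r > n, which is why the formula alone suffices in practice.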
Raft: A consensus algorithm designed to be more understandable than Paxos. Used in etcd and Consul. Provides leader election and a replicated log. Chapter: 9 - Consistency and Consensus
Read Repair: A technique in leaderless replication where a client detects stale values during reads and writes back the latest value to out-of-date replicas. Chapter: 5 - Replication
Read-Your-Writes Consistency: The guarantee that after a user writes data, they will always see that data on subsequent reads. Also called read-after-write consistency. Chapter: 5 - Replication
Rebalancing: The process of moving data from one node to another when nodes are added or removed, or when load becomes unbalanced. Should minimize data movement and allow continued operation. Chapter: 6 - Partitioning
Reliability: A system's ability to continue working correctly even when things go wrong (hardware faults, software errors, human mistakes). Achieved through redundancy and fault tolerance. Chapter: 1 - Reliable, Scalable, and Maintainable Applications
Replication: Keeping copies of the same data on multiple machines for redundancy and performance. A fundamental technique in distributed data systems. Chapter: 5 - Replication
Replication Lag: The delay between a write on the leader and the moment that write becomes visible on a follower. Can cause various consistency issues in asynchronous replication. Chapter: 5 - Replication
RPC (Remote Procedure Call): A communication pattern that tries to make remote function calls look like local ones. Examples: gRPC, Thrift. Hides network complexity, which can be problematic. Chapter: 4 - Encoding and Evolution
Scalability: A system's ability to cope with increased load by adding resources. Can scale vertically (a more powerful machine) or horizontally (more machines). Chapter: 1 - Reliable, Scalable, and Maintainable Applications
Schema Evolution: The process of changing database schemas over time while maintaining compatibility with existing data and code. Critical for zero-downtime deployments. Chapter: 4 - Encoding and Evolution
Schema-on-Read: An approach where the data structure is interpreted when reading (document databases). Flexible, but requires the application to handle variations. Contrast with schema-on-write. Chapter: 2 - Data Models and Query Languages
Schema-on-Write: An approach where the database enforces the schema when writing (relational databases). Provides guarantees but requires migrations for changes. Chapter: 2 - Data Models and Query Languages
Secondary Index: An additional index structure beyond the primary key, enabling efficient queries on non-key columns. In distributed systems, can be local (per partition) or global. Chapter: 6 - Partitioning
Serializability: The strongest isolation level for transactions. The result is the same as if the transactions had executed serially (one at a time), even though they may actually run concurrently. Chapter: 7 - Transactions
Skew: Uneven distribution of data or load across partitions. Hot spots occur when one partition has much more load than the others. Chapter: 6 - Partitioning
Split Brain: A dangerous failure scenario where multiple nodes believe they are the leader simultaneously. Can cause data divergence and corruption. Prevented with fencing or quorums. Chapter: 5 - Replication
SSTable (Sorted String Table): A file format where key-value pairs are sorted by key. Enables efficient merging and range queries. Used in LSM-trees. Chapter: 3 - Storage and Retrieval
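Because each segment is sorted, compaction is just a mergesort-style pass; the sketch below merges two in-memory segments, assuming the convention (borrowed from LSM-tree engines) that `None` marks a deletion tombstone:

```python
def merge_sstables(older, newer):
    """Merge two key-sorted segments. On duplicate keys the newer segment
    wins; tombstones (value None) drop the key from the output."""
    out, i, j = [], 0, 0
    while i < len(older) and j < len(newer):
        ko, kn = older[i][0], newer[j][0]
        if ko < kn:
            out.append(older[i]); i += 1
        elif kn < ko:
            out.append(newer[j]); j += 1
        else:                          # same key: newer value shadows older
            out.append(newer[j]); i += 1; j += 1
    out.extend(older[i:])
    out.extend(newer[j:])
    # Drop tombstones so deleted keys finally disappear from disk.
    return [(k, v) for k, v in out if v is not None]

old_seg = [("apple", 1), ("cherry", 3), ("fig", 9)]
new_seg = [("banana", 2), ("cherry", 30), ("fig", None)]  # fig was deleted
merged = merge_sstables(old_seg, new_seg)
# [("apple", 1), ("banana", 2), ("cherry", 30)]
```

Each input is read strictly front to back and the output stays sorted, which is why SSTable compaction is sequential I/O even for segments far larger than memory.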
Star Schema: A common data warehouse schema with a central fact table surrounded by dimension tables. Optimized for analytics queries. Chapter: 3 - Storage and Retrieval
Stream Processing: Continuous processing of unbounded data streams. Lower latency than batch processing. Used for real-time analytics, monitoring, and alerting. Chapter: 11 - Stream Processing
Synchronous Replication: Replication where the leader waits for acknowledgment from followers before confirming writes to clients. Guarantees durability but reduces availability. Chapter: 5 - Replication
Thrift: A binary encoding format developed by Facebook that requires a schema. Similar to Protocol Buffers, with different encoding formats available. Chapter: 4 - Encoding and Evolution
Timeout: The maximum time to wait for a response before considering an operation failed. Critical for detecting failures in distributed systems, but hard to choose correctly. Chapter: 8 - The Trouble with Distributed Systems
Total Order Broadcast: A protocol ensuring all messages are delivered to all nodes in the same order. Equivalent to consensus. Used for state machine replication. Chapter: 9 - Consistency and Consensus
Transaction: A way to group multiple reads and writes into a logical unit that either completely succeeds (commits) or fails (aborts). Simplifies error handling. Chapter: 7 - Transactions
Two-Phase Commit (2PC): An algorithm for atomic commit across multiple nodes. Uses a coordinator to ensure all participants either commit or abort together. Blocks if the coordinator fails. Chapter: 9 - Consistency and Consensus
Version Vector: A data structure for tracking causality in distributed systems. Each node maintains counters for all nodes, enabling detection of concurrent vs. causally ordered operations. Chapter: 9 - Consistency and Consensus
Write-Ahead Log (WAL): An append-only log of all database writes, written before changes are applied to the data structures. Enables crash recovery and replication. Chapter: 3 - Storage and Retrieval
ZooKeeper: A coordination service for distributed systems. Provides consensus, leader election, and configuration management. Uses the ZAB consensus algorithm. Chapter: 9 - Consistency and Consensus
