These foundational papers describe production systems that have influenced the state of active research and the design of modern distributed databases.
Core Papers
The following papers are seminal works that have shaped how we build distributed storage and database systems today.Dynamo
Amazon’s highly available key-value store that solved the problem of fault tolerance in an elegant way
Bigtable
Google’s distributed storage system for structured data
Google File System
The foundational distributed file system that powers Google’s infrastructure
Cassandra
A decentralized structured storage system heavily inspired by Dynamo, now open source
Amazon Dynamo
Why Dynamo is Essential Reading
Why Dynamo is Essential Reading
It is very rare for a paper describing an active production system to influence the state of active research in any industry. Dynamo is one of those seminal distributed systems papers that solves the problem of a highly available and fault-tolerant database in an elegant way.Key Contributions:
- Consistent hashing for data partitioning
- Vector clocks for versioning
- Gossip-based membership protocol
- Sloppy quorum and hinted handoff
Google’s Storage Systems
Bigtable: Distributed Storage for Structured Data
Bigtable: Distributed Storage for Structured Data
Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers.Key Features:
- Sparse, distributed, persistent multi-dimensional sorted map
- Data indexed by row key, column key, and timestamp
- Built on top of GFS
- Automatic data sharding across tablets
Google File System (GFS)
Google File System (GFS)
GFS is a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware.Design Principles:
- Component failures are the norm
- Files are huge by traditional standards
- Most files are mutated by appending new data
- Co-designing applications and file system API benefits the overall system
Open Source Systems
Cassandra
Cassandra is a decentralized structured storage system inspired heavily by Dynamo, combining Dynamo’s partitioning and replication techniques with Bigtable’s data model.
- Peer-to-peer architecture (no single point of failure)
- Elastic scalability
- Tunable consistency
- Column-family data model
- Eventual consistency with tunable trade-offs
CRUSH & Ceph
CRUSH: Controlled, Scalable, Decentralized Placement
CRUSH: Controlled, Scalable, Decentralized Placement
CRUSH (Controlled Replication Under Scalable Hashing) is a scalable pseudo-random data distribution function designed for distributed object-based storage systems.Purpose:
- Basis for the Ceph distributed storage system
- Deterministic placement of replicated data
- Minimal data migration when cluster changes
- CRUSH Paper
- RADOS Architecture - The underlying architecture of Ceph
Learning Path
Start with Dynamo
Understand the principles of highly available key-value stores and consistent hashing
Study Google's Systems
Read both GFS and Bigtable to understand how Google built scalable storage infrastructure
Explore Cassandra
See how Dynamo’s principles were combined with Bigtable’s data model in an open-source system