Recommended reading order
The book is designed to be read sequentially, but you can adapt based on your goals and experience level.
Foundation: Chapters 1-4
Build a core understanding of data systems fundamentals.
Chapter 1: Reliable, Scalable, and Maintainable Applications
- Understand the three pillars of data systems
- Learn about fault tolerance vs. fault prevention
- Study vertical vs. horizontal scaling
Chapter 2: Data Models and Query Languages
- Compare relational, document, and graph models
- Understand when to use each data model
- Learn declarative vs. imperative queries
Chapter 3: Storage and Retrieval
- Master log-structured vs. update-in-place storage
- Understand B-trees and LSM-trees
- Learn column-oriented storage for analytics
Chapter 4: Encoding and Evolution
- Study schema evolution techniques
- Compare JSON, Thrift, Protocol Buffers, and Avro
- Understand backward and forward compatibility
Distributed data: Chapters 5-9
Dive deep into distributed systems challenges and solutions.
Chapter 5: Replication
- Master leader-based, multi-leader, and leaderless replication
- Understand replication lag and consistency issues
- Study read-after-write and monotonic read guarantees
Chapter 6: Partitioning
- Learn key-range vs. hash partitioning
- Understand secondary indexes in partitioned databases
- Study rebalancing strategies
Chapter 7: Transactions
- Master ACID properties
- Understand isolation levels (read committed, repeatable read, serializable)
- Learn about concurrency problems (dirty reads, lost updates)
Chapter 8: The Trouble with Distributed Systems
- Understand partial failures and network issues
- Study unreliable clocks and their implications
- Learn about detecting faults with timeouts
Chapter 9: Consistency and Consensus
- Master linearizability vs. serializability
- Understand causality and ordering guarantees
- Study consensus algorithms (Paxos, Raft)
Derived data: Chapters 10-12
Learn about processing and integrating data systems.
Chapter 10: Batch Processing
- Understand MapReduce and distributed filesystems
- Learn join algorithms in batch processing
- Study dataflow engines beyond MapReduce
Chapter 11: Stream Processing
- Master event streams and message brokers
- Understand change data capture (CDC)
- Learn stream processing frameworks
Chapter 12: The Future of Data Systems
- Study data integration patterns
- Understand unbundling databases
- Learn lambda and kappa architectures
Alternative reading paths
For practitioners building systems
Immediate needs
Quick practical path:
- Chapter 1 (overview)
- Chapter 5 (replication basics)
- Chapter 7 (transactions)
- Chapters you need for current project
- Return to fill gaps
Backend engineers
Focus on:
- Chapters 2-3 (storage fundamentals)
- Chapter 5 (replication)
- Chapter 6 (partitioning)
- Chapter 7 (transactions)
- Chapter 11 (streaming for real-time systems)
Data engineers
Focus on:
- Chapter 3 (storage, especially OLAP)
- Chapter 10 (batch processing, MapReduce)
- Chapter 11 (stream processing)
- Chapter 12 (data integration)
Distributed systems engineers
Focus on:
- Chapters 5-9 (all distributed data chapters)
- Pay special attention to:
- Chapter 8 (failure modes)
- Chapter 9 (consensus)
Key concepts to master
Part 1: Foundations of data systems
Chapter 1: Core principles
Critical concepts:
- Reliability: Faults vs. failures, fault tolerance strategies
- Scalability: Load parameters, describing performance with percentiles
- Maintainability: Operability, simplicity, evolvability
Chapter 2: Choosing data models
Critical concepts:
- Relational model: Normalized data, joins, ACID transactions
- Document model: Schema flexibility, data locality, embedded documents
- Graph model: Many-to-many relationships, traversals, pattern matching
Chapter 3: Storage engines
Critical concepts:
- Log-structured storage: LSM-trees, SSTables, compaction
- B-trees: In-place updates, balanced tree, fixed-size pages
- OLTP vs. OLAP: Different workload patterns need different storage
- Column storage: Compression, vectorized processing
Chapter 4: Data encoding
Critical concepts:
- Schema evolution: Adding/removing fields, changing types
- Compatibility: Backward (new reads old) and forward (old reads new)
- Encoding formats: JSON vs. Thrift vs. Protocol Buffers vs. Avro
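The two compatibility directions can be sketched with a hand-rolled schema; the field names and defaults below are invented for illustration and are not from any particular encoding library:

```python
# Illustrative sketch of schema evolution. A reader decodes with its own
# schema: unknown fields are ignored (forward compatibility for old readers),
# missing fields get defaults (backward compatibility for new readers).

OLD_SCHEMA = {"user_id": None, "name": ""}
NEW_SCHEMA = {"user_id": None, "name": "", "email": ""}  # added optional field

def decode(record: dict, schema: dict) -> dict:
    return {field: record.get(field, default) for field, default in schema.items()}

old_record = {"user_id": 1, "name": "Ada"}                             # written by old code
new_record = {"user_id": 2, "name": "Bob", "email": "b@example.com"}   # written by new code

assert decode(old_record, NEW_SCHEMA)["email"] == ""    # new code reads old data
assert "email" not in decode(new_record, OLD_SCHEMA)    # old code reads new data
```

Real formats like Avro and Protocol Buffers enforce these rules at the schema level (default values, field tags) rather than per record.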
Part 2: Distributed data
Chapter 5: Replication fundamentals
Critical concepts:
- Leader-based replication: Synchronous vs. asynchronous, failover
- Multi-leader replication: Write conflicts, conflict resolution
- Leaderless replication: Quorums, read repair, anti-entropy
- Consistency issues: Read-after-write, monotonic reads, consistent prefix
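The quorum condition behind leaderless replication reduces to one inequality: with n replicas, w write acknowledgments, and r read responses, every read overlaps at least one up-to-date replica when w + r > n. A minimal sketch:

```python
def quorum_overlaps(n: int, w: int, r: int) -> bool:
    """True if every read quorum intersects every write quorum (w + r > n)."""
    return w + r > n

assert quorum_overlaps(n=3, w=2, r=2)       # common Dynamo-style configuration
assert not quorum_overlaps(n=3, w=1, r=1)   # fast, but reads may be stale
```

Note the overlap guarantees you *contact* a fresh replica; distinguishing it from stale ones still needs version numbers and read repair.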
Chapter 6: Partitioning strategies
Critical concepts:
- Partitioning by key range: Efficient range queries, risk of hot spots
- Partitioning by hash: Even distribution, but no efficient range queries
- Secondary indexes: Document-partitioned (local) vs. term-partitioned (global)
- Rebalancing: Fixed partitions, dynamic partitioning, proportional to nodes
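The two basic partitioning functions fit in a few lines each. This sketch uses MD5 purely as a stable hash (Python's built-in `hash()` is randomized per process, so every node would disagree):

```python
import hashlib

def hash_partition(key: str, num_partitions: int) -> int:
    # Stable hash of the key, modulo the partition count.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

def range_partition(key: str, boundaries: list[str]) -> int:
    # boundaries[i] is the exclusive upper bound of partition i;
    # keys sort lexicographically, so range scans stay within partitions.
    for i, boundary in enumerate(boundaries):
        if key < boundary:
            return i
    return len(boundaries)

assert range_partition("cherry", ["g", "p"]) == 0   # "cherry" < "g"
assert range_partition("mango", ["g", "p"]) == 1
assert 0 <= hash_partition("user:42", 8) < 8
```

Hash partitioning spreads hot keys evenly but scatters adjacent keys; range partitioning keeps them together at the cost of potential hot spots.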
Chapter 7: Transaction isolation
Critical concepts:
- ACID properties: Atomicity, consistency, isolation, durability
- Isolation levels: Read committed, repeatable read, serializable
- Concurrency problems: Dirty reads, dirty writes, lost updates, write skew, phantoms
- Implementing serializability: Actual serial execution, 2PL, SSI
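A lost update is easy to reproduce in plain Python by interleaving two read-modify-write cycles by hand; this toy sketch stands in for two concurrent transactions against a shared counter:

```python
# Toy demonstration of a lost update: both "transactions" read the same
# value before either writes, so one increment silently disappears.

db = {"counter": 0}

def interleaved_increments():
    a = db["counter"]        # txn A reads 0
    b = db["counter"]        # txn B reads 0 (before A commits)
    db["counter"] = a + 1    # A writes 1
    db["counter"] = b + 1    # B overwrites with 1 -- A's update is lost

interleaved_increments()
assert db["counter"] == 1    # two increments ran, only one survived
```

Databases prevent this with atomic operations (`UPDATE ... SET x = x + 1`), explicit locking, or automatic detection at higher isolation levels.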
Chapter 8: Distributed system realities
Critical concepts:
- Unreliable networks: Packet loss, delays, partitions
- Unreliable clocks: Clock skew, monotonic vs. time-of-day
- Partial failures: Cannot distinguish crashed vs. slow
- Timeouts and retries: Exponential backoff, idempotency
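Exponential backoff with "full jitter" is one common retry schedule (the parameter names and defaults here are illustrative):

```python
import random

def backoff_delays(base: float = 0.1, cap: float = 10.0, attempts: int = 6) -> list[float]:
    """Exponential backoff with full jitter: each delay is drawn
    uniformly from [0, min(cap, base * 2**attempt)]."""
    return [random.uniform(0, min(cap, base * 2 ** i)) for i in range(attempts)]

delays = backoff_delays()
assert len(delays) == 6
assert all(0 <= d <= 10.0 for d in delays)
```

The jitter spreads retries out so a crowd of clients does not hammer a recovering service in lockstep; and because a timed-out request may still have succeeded, the retried operation must be idempotent.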
Chapter 9: Consistency and consensus
Critical concepts:
- Linearizability: Strongest consistency, appears as single copy
- Causality: Happens-before relationship, causal consistency
- Consensus: Getting nodes to agree, Paxos, Raft, ZAB
- Total order broadcast: Equivalent to consensus
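The happens-before relationship is exactly what a Lamport clock captures; a minimal sketch:

```python
class LamportClock:
    """Logical clock: orders events by causality, not wall time."""

    def __init__(self):
        self.time = 0

    def tick(self) -> int:
        # A local event (including sending a message) advances the clock.
        self.time += 1
        return self.time

    def receive(self, msg_time: int) -> int:
        # On receipt, jump past the sender's timestamp so the receive
        # event is ordered after the send event.
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t = a.tick()              # a sends a message stamped 1
assert b.receive(t) == 2  # b's receive is ordered after a's send
```

Lamport timestamps give a total order consistent with causality, but unlike version vectors they cannot tell concurrent events apart from causally ordered ones.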
Part 3: Derived data
Chapter 10: Batch processing patterns
Critical concepts:
- MapReduce: Map phase, shuffle, reduce phase
- Distributed joins: Broadcast join, partitioned join, map-side join
- Dataflow engines: Beyond MapReduce (Spark, Flink)
- Graph processing: Pregel model, bulk synchronous parallel
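The three MapReduce phases can be sketched in-process for word count; the sort stands in for the shuffle (real frameworks shuffle across machines by key):

```python
from itertools import groupby

def word_count(documents: list[str]) -> dict[str, int]:
    # Map: emit a (word, 1) pair for every word occurrence.
    pairs = [(word, 1) for doc in documents for word in doc.split()]
    # Shuffle: bring all pairs with the same key together (sort-based).
    pairs.sort(key=lambda kv: kv[0])
    # Reduce: sum the counts for each key.
    return {key: sum(count for _, count in group)
            for key, group in groupby(pairs, key=lambda kv: kv[0])}

assert word_count(["a b a", "b c"]) == {"a": 2, "b": 2, "c": 1}
```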
Chapter 11: Stream processing
Critical concepts:
- Event streams: Messages vs. events, Kafka, Kinesis
- Change data capture: Streaming database changes
- Event sourcing: Immutable event log, deriving state
- Stream processing: Windowing, joins, fault tolerance
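Windowing is the most concrete of these ideas; here is a sketch of a tumbling (fixed-size, non-overlapping) window count over event timestamps, ignoring the hard parts a real stream processor handles (late events, watermarks, fault tolerance):

```python
from collections import defaultdict

def tumbling_window_counts(timestamps: list[int], window_seconds: int = 60) -> dict[int, int]:
    """Count events per tumbling window, keyed by the window's start time."""
    counts = defaultdict(int)
    for ts in timestamps:
        window_start = (ts // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)

assert tumbling_window_counts([5, 59, 61, 130]) == {0: 2, 60: 1, 120: 1}
```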
Chapter 12: Integration patterns
Critical concepts:
- Unbundling databases: Separate specialized systems
- Dataflow architectures: Event log as integration backbone
- Derived data: System of record vs. derived views
- Lambda vs. Kappa: Batch + stream vs. stream only
Practical exercises and projects
Beginner level
Build a key-value store
Learning goals: Storage engines, indexing
Implement:
- Hash index with log-structured storage
- Compaction to prevent infinite growth
- Crash recovery
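A starting point for this exercise, sketched in memory in the Bitcask style (append-only log plus a hash index of the latest offset per key); persistence and crash recovery are left as the exercise intends:

```python
class LogKV:
    """Toy log-structured key-value store with an in-memory hash index."""

    def __init__(self):
        self.log = []      # append-only list of (key, value) records
        self.index = {}    # key -> offset of its most recent record

    def set(self, key, value):
        self.index[key] = len(self.log)
        self.log.append((key, value))

    def get(self, key):
        offset = self.index.get(key)
        return None if offset is None else self.log[offset][1]

    def compact(self):
        # Rewrite the log keeping only the latest record for each key.
        live = [(k, self.log[off][1]) for k, off in self.index.items()]
        self.log, self.index = [], {}
        for k, v in live:
            self.set(k, v)

db = LogKV()
db.set("x", 1)
db.set("x", 2)            # old record becomes garbage in the log
assert db.get("x") == 2
db.compact()
assert len(db.log) == 1   # compaction reclaimed the stale record
```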
Set up replication
Learning goals: Replication, failover
Implement:
- PostgreSQL primary with 2 replicas
- Streaming replication
- Test failover manually
Compare data models
Learning goals: Data modeling trade-offs
Model the same domain in:
- Relational (PostgreSQL)
- Document (MongoDB)
- Graph (Neo4j)
Explore consistency
Learning goals: Replication lag, consistency
Experiment with:
- Read from leader vs. follower
- Measure replication lag
- Observe eventual consistency
Intermediate level
Build a distributed cache
Learning goals: Partitioning, consistent hashing
Implement:
- Consistent hashing ring
- Partition assignment
- Handle node additions/removals
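One possible shape for the ring, using virtual nodes (a common variant, not the only one); MD5 is used only as a stable, well-distributed hash:

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class HashRing:
    """Consistent hashing ring with virtual nodes."""

    def __init__(self, nodes: list[str], vnodes: int = 100):
        # Each physical node owns many points on the ring for smoother balance.
        self.ring = sorted((_hash(f"{n}#{i}"), n)
                           for n in nodes for i in range(vnodes))
        self._positions = [h for h, _ in self.ring]

    def node_for(self, key: str) -> str:
        # A key belongs to the first ring point at or after its hash (wrapping).
        i = bisect.bisect(self._positions, _hash(key)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["a", "b", "c"])
assert ring.node_for("user:42") in {"a", "b", "c"}

# Adding a node moves only ~1/N of the keys, unlike naive mod-N rehashing.
bigger = HashRing(["a", "b", "c", "d"])
moved = sum(ring.node_for(f"k{i}") != bigger.node_for(f"k{i}") for i in range(1000))
assert moved < 500
```

Rebuilding the ring on membership change (as above) is fine for an exercise; a production system would update it incrementally and also replicate each key to the next few distinct nodes on the ring.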
Implement MapReduce
Learning goals: Batch processing
Implement:
- Simple MapReduce framework
- Word count, join operations
- Fault tolerance
Build event sourcing system
Learning goals: Event logs, derived state
Implement:
- Event store
- State reconstruction from events
- Multiple projections
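The core of this exercise fits in a few lines: state is a fold over an immutable event log. The event shapes below are invented for illustration:

```python
# Event sourcing sketch: the append-only event list is the system of
# record; any projection can be rebuilt by replaying it from the start.

events = []

def append(event_type: str, **data):
    events.append({"type": event_type, **data})

def balance(log) -> int:
    # One projection; other projections fold the same log differently.
    total = 0
    for e in log:
        if e["type"] == "deposited":
            total += e["amount"]
        elif e["type"] == "withdrawn":
            total -= e["amount"]
    return total

append("deposited", amount=100)
append("withdrawn", amount=30)
assert balance(events) == 70
```

Because the log is immutable, adding a second projection later (say, a count of withdrawals) needs no migration: just replay the existing events.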
Transaction isolation levels
Learning goals: Concurrency, isolation
Demonstrate:
- Lost updates with read committed
- Write skew with repeatable read
- Fix with serializable isolation
Advanced level
Consensus implementation
Learning goals: Distributed consensus
Implement:
- Simplified Raft consensus
- Leader election
- Log replication
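The decision rule at the heart of leader election is majority voting; this deliberately stripped-down sketch omits what real Raft also checks before a node grants its vote (current term, log freshness):

```python
def wins_election(votes_granted: int, cluster_size: int) -> bool:
    """A candidate becomes leader only with votes from a strict majority."""
    return votes_granted > cluster_size // 2

assert wins_election(3, 5)        # 3 of 5 is a majority
assert not wins_election(2, 5)    # split vote: retry after a randomized timeout
```

The majority requirement is what makes a split-brain impossible: two leaders in the same term would each need a majority, and two majorities of the same cluster must share at least one voter.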
Streaming platform
Learning goals: Stream processing
Build:
- CDC from database
- Stream processing pipelines
- Windowed aggregations
Multi-datacenter architecture
Learning goals: Geo-distribution, consistency
Design:
- Multi-region deployment
- Conflict resolution
- Latency optimization
Data integration platform
Learning goals: System composition
Integrate:
- OLTP database
- Search index
- Analytics warehouse
- Cache layer
Discussion questions
Use these questions to deepen your understanding:
- Why do we need so many different databases?
  - Consider: different data models, workload patterns, CAP trade-offs
- When is eventual consistency acceptable?
  - Think about: user expectations, business requirements, error handling
- What makes distributed systems hard?
  - Examine: partial failures, network unreliability, asynchronous execution
- How do you choose between batch and stream processing?
  - Consider: latency requirements, data volumes, complexity tolerance
- Is microservices architecture worth the complexity?
  - Weigh: team independence, deployment flexibility vs. distributed system challenges
- How important is backward compatibility?
  - Think about: rolling deployments, mobile apps, third-party integrations
Further resources
After completing this book, continue learning with:
Academic papers
Read the original research papers referenced throughout the book. Start with:
- Bigtable, Dynamo, Spanner
- Paxos, Raft consensus algorithms
- Dremel (columnar storage)
Open source projects
Study implementations of concepts:
- PostgreSQL (B-trees, MVCC, replication)
- Cassandra (leaderless replication, LSM-trees)
- Kafka (event log, partitioning)
- etcd (Raft consensus)
System design practice
Apply your knowledge:
- Practice system design interviews
- Design real-world systems
- Read architecture blogs (Netflix, Uber, LinkedIn)
Related books
Deepen specific areas:
- “Database Internals” by Alex Petrov
- “Streaming Systems” by Tyler Akidau
- “Designing Distributed Systems” by Brendan Burns
Retention strategies
Active reading
Don’t just read passively. For each chapter:
- Take notes in your own words
- Draw diagrams of concepts
- Explain concepts to a colleague
Hands-on practice
Theory alone isn’t enough:
- Complete practical exercises
- Set up actual systems
- Break things and fix them
Spaced repetition
Review periodically:
- Week 1: Review all chapters
- Month 1: Review key concepts
- Month 3: Review challenging topics
- Month 6: Full review
Quick reference
When to use what
| Use case | Best choice | Why |
|---|---|---|
| Transactional workload | Relational DB | ACID, joins, constraints |
| Hierarchical data | Document DB | Schema flexibility, locality |
| Highly connected data | Graph DB | Relationship traversal |
| High write throughput | LSM-tree storage | Sequential writes |
| Analytics queries | Column-oriented DB | Scan efficiency |
| Strong consistency needed | Single-leader replication | Linearizability |
| Multi-datacenter writes | Multi-leader or leaderless | Availability during partitions |
| Event-driven architecture | Event streaming (Kafka) | Decoupling, scalability |
| Large batch analytics | Hadoop/Spark | High throughput |
| Real-time analytics | Stream processing | Low latency |
Trade-off cheat sheet
| Trade-off | Choose A if… | Choose B if… |
|---|---|---|
| Consistency vs. Availability | Correctness critical (banking) | Uptime critical (social media) |
| Normalization vs. Denormalization | Write-heavy, need consistency | Read-heavy, can tolerate staleness |
| B-tree vs. LSM-tree | Read-heavy workload | Write-heavy workload |
| Batch vs. Stream | Can tolerate hours of latency | Need minute/second latency |
| Vertical vs. Horizontal scaling | Simpler operations | Need to scale beyond a single machine |
| Microservices vs. Monolith | Independent team scaling | Simpler operations |