These papers are considered quintessential reading for anyone working with distributed systems. They establish foundational concepts that appear throughout distributed systems literature.
Essential Papers
The following papers represent the foundational knowledge every distributed systems engineer should understand.
Time, Clocks and Ordering of Events
Lamport’s quintessential distributed systems primer on logical clocks and event ordering
Session Guarantees for Weakly Consistent Replicated Data
A 1994 paper establishing standard vocabulary for eventually consistent systems
CAP Theorem
The theorem describing the trade-off among consistency, availability, and partition tolerance
FLP Impossibility Result
Proof that no deterministic protocol can guarantee consensus in an asynchronous system with even one faulty process
Lamport’s Time and Clocks
Time, Clocks, and the Ordering of Events in a Distributed System
This is Leslie Lamport’s seminal paper that establishes the foundations for understanding time and ordering in distributed systems. It introduces the concept of logical clocks and the “happens-before” relationship, which are fundamental to reasoning about distributed computations.
Why This Paper Matters
This paper is considered the quintessential distributed systems primer. Nearly all of Lamport’s work is influential, but this particular paper is essential reading because it:
- Introduces logical clocks as a way to order events without synchronized physical clocks
- Defines the “happens-before” relationship (→)
- Provides the foundation for understanding causality in distributed systems
- Establishes concepts that appear throughout distributed systems literature
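The logical clock idea can be sketched in a few lines. The following is a minimal illustration, not code from the paper: the `LamportClock` class and its method names are hypothetical, and it assumes each process keeps a single integer counter that it advances on every local event and on every send and receive.

```python
from dataclasses import dataclass


@dataclass
class LamportClock:
    """Logical clock for one process (hypothetical helper for illustration)."""
    time: int = 0

    def tick(self) -> int:
        """Local event: advance the counter by one."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, msg_time: int) -> int:
        """Move past the sender's timestamp, counting the receive as an event."""
        self.time = max(self.time, msg_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()                      # a local event on process a
t_send = a.send()             # a sends a message stamped with its clock
t_recv = b.receive(t_send)    # b receives it and jumps ahead of the sender
print(t_send < t_recv)        # the send is ordered before the receive
```

Because the receiver always moves past the timestamp it was sent, any event that happens-before another gets a smaller timestamp. The converse does not hold: a smaller timestamp alone does not prove a causal relationship.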
Session Guarantees for Weak Consistency
Session Guarantees for Weakly Consistent Replicated Data
This 1994 paper proposes a set of session guarantees for eventually consistent systems. It established much of the standard vocabulary used in distributed systems papers today.
Key Concepts
The paper introduces several important guarantees that are now standard terminology:
- Monotonic Reads: If a process reads a value, subsequent reads will never return earlier values
- Read Your Writes: A process will always see its own writes in subsequent reads
- Writes Follow Reads: A write made after a read is ordered after the writes whose effects that read observed
- Monotonic Writes: Writes from a single process are applied in the order they were made
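As a rough illustration of how a client could enforce two of these guarantees, the sketch below tracks a per-session version floor. It is a simplification under stated assumptions: the `Replica` and `Session` classes are hypothetical, and a single integer version stands in for the write sets and version vectors a real system would use.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """A replica holding one value with a version (hypothetical model)."""
    version: int = 0
    value: str = ""

    def apply(self, version: int, value: str) -> None:
        # Ignore stale or duplicate writes so the version never goes backwards.
        if version > self.version:
            self.version, self.value = version, value


@dataclass
class Session:
    """Tracks what this client has already read and written."""
    read_floor: int = 0    # highest version a read returned (Monotonic Reads)
    write_floor: int = 0   # highest version this session wrote (Read Your Writes)

    def write(self, replica: Replica, value: str) -> int:
        # Pick a version above everything this session has seen or written.
        version = max(replica.version, self.read_floor, self.write_floor) + 1
        replica.apply(version, value)
        self.write_floor = version
        return version

    def read(self, replica: Replica) -> str:
        # Refuse replicas that are behind what the session already observed.
        needed = max(self.read_floor, self.write_floor)
        if replica.version < needed:
            raise LookupError("replica too stale for this session")
        self.read_floor = replica.version
        return replica.value


primary, lagging = Replica(), Replica()
session = Session()
session.write(primary, "v1")
print(session.read(primary))          # the up-to-date replica is acceptable
try:
    session.read(lagging)             # has not received the write yet
except LookupError as exc:
    print("rejected:", exc)
```

Rejecting a stale replica is only one possible policy. A client could instead retry another replica or wait for replication to catch up.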
CAP Theorem
CAP Theorem | Plain English Explanation
The CAP theorem states that in a distributed system, you can only guarantee two of the following three properties:
- Consistency: All nodes see the same data at the same time
- Availability: Every request to a non-failing node receives a response
- Partition Tolerance: The system continues to operate despite network partitions
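The trade-off becomes concrete once a partition actually occurs. The toy model below is a sketch under assumptions, not a real replication protocol: the `Node` class, the `make_pair` helper, and the `"CP"` and `"AP"` mode labels are all hypothetical. A node in CP mode refuses writes it cannot replicate, while a node in AP mode accepts them and lets the replicas diverge.

```python
class Node:
    """One of two replicas (hypothetical toy model)."""

    def __init__(self, mode: str):
        self.mode = mode          # "CP" or "AP", labels assumed for this sketch
        self.value = None
        self.peer = None
        self.partitioned = False  # True when the peer is unreachable

    def write(self, value):
        if self.partitioned:
            if self.mode == "CP":
                # Give up availability to keep the replicas consistent.
                raise ConnectionError("cannot reach peer; refusing write")
            # Give up consistency to stay available.
            self.value = value
            return
        # No partition: replicate synchronously to the peer.
        self.value = value
        self.peer.value = value


def make_pair(mode: str):
    """Build two nodes that replicate to each other."""
    a, b = Node(mode), Node(mode)
    a.peer, b.peer = b, a
    return a, b


a, b = make_pair("AP")
a.write(1)                            # replicated to both nodes
a.partitioned = b.partitioned = True  # the network splits
a.write(2)                            # accepted locally only
print(a.value, b.value)               # the replicas now disagree
```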
FLP Impossibility
Impossibility of Distributed Consensus with One Faulty Process | Easier Blog Post
The FLP Impossibility Result (named after Fischer, Lynch, and Paterson) proves that in an asynchronous distributed system, no deterministic protocol can guarantee consensus if even a single process can fail. This is a fundamental theoretical result that shapes practical distributed system design.
Practical Implications
While FLP proves consensus is theoretically impossible in purely asynchronous systems with failures, practical systems work around this by:
- Adding timeouts (partial synchrony)
- Using randomization
- Accepting that consensus might not always be reached in bounded time
- Using algorithms like Paxos and Raft, which always preserve safety and make progress once the system behaves synchronously for long enough
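Timeouts and randomization are often combined. The sketch below shows randomized election timeouts in the style Raft uses to avoid repeated split votes. The function names and the 150 to 300 ms default range are assumptions chosen for illustration, and the code models only the timeout draw, not an election.

```python
import random


def election_timeouts(node_ids, low_ms=150.0, high_ms=300.0, seed=None):
    """Draw an independent random timeout (in ms) for every node."""
    rng = random.Random(seed)
    return {node: rng.uniform(low_ms, high_ms) for node in node_ids}


def first_candidate(timeouts):
    """The node whose timer fires first starts the election."""
    return min(timeouts, key=timeouts.get)


timeouts = election_timeouts(["n1", "n2", "n3"], seed=42)
print(first_candidate(timeouts), "times out first and requests votes")
```

Because each node draws its own timeout, it is unlikely that two nodes time out at the same instant, so one candidate usually gets a head start. This makes progress likely in practice without contradicting FLP, which rules out only a guarantee of termination.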
Fallacies of Distributed Computing
Fallacies of Distributed Computing
Before diving deep into distributed systems, understand the common false assumptions this list describes, such as assuming that the network is reliable or that latency is zero.
Additional Resources
Distributed Systems Theory for the Distributed Engineer
A BFS (breadth-first search) approach to learning distributed systems. Many papers in this guide overlap with other sections.
An Introduction to Distributed Systems
@aphyr’s excellent introduction to distributed systems