These papers are widely considered essential reading for anyone working on distributed systems. They introduce concepts that recur throughout the distributed systems literature.

Essential Papers

The following papers represent the foundational knowledge every distributed systems engineer should understand.

Time, Clocks, and the Ordering of Events

Lamport’s classic primer on logical clocks and event ordering

Session Guarantees for Weakly Consistent Replicated Data

A 1994 paper establishing standard vocabulary for eventually consistent systems

CAP Theorem

The theorem on the trade-off between consistency and availability during network partitions

FLP Impossibility Result

Proof that no deterministic algorithm can guarantee consensus in an asynchronous system if even one process may crash

Lamport’s Time and Clocks

Time, Clocks, and the Ordering of Events in a Distributed System
This is Leslie Lamport’s seminal 1978 paper on time and ordering in distributed systems. It introduces logical clocks and the “happens-before” relation, both fundamental to reasoning about distributed computations.
Nearly all of Lamport’s work is influential, but this paper is essential reading because it:
  • Introduces logical clocks as a way to order events without synchronized physical clocks
  • Defines the “happens-before” relationship (→)
  • Provides the foundation for understanding causality in distributed systems
  • Establishes concepts that appear throughout the distributed systems literature
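To make the clock rules concrete, here is a minimal Python sketch of a Lamport clock for a single process. The class and method names are illustrative, not from the paper. The update logic follows the paper’s implementation rules IR1 (increment between successive events) and IR2 (on receipt, move past the message’s timestamp), and ties are broken by process ID to obtain a total order.

```python
from typing import Tuple


class LamportClock:
    """Logical clock for one process (illustrative sketch)."""

    def __init__(self, pid: int) -> None:
        self.pid = pid  # process ID, used only to break timestamp ties
        self.time = 0

    def tick(self) -> int:
        """Local event: increment the clock (rule IR1)."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending is an event; the message carries the new timestamp."""
        return self.tick()

    def receive(self, msg_time: int) -> int:
        """Receiving: move past both our own time and the message's (rule IR2)."""
        self.time = max(self.time, msg_time) + 1
        return self.time

    def stamp(self) -> Tuple[int, int]:
        """(time, pid) pairs compare lexicographically, giving a total order."""
        return (self.time, self.pid)


# Usage: p1 sends a message to p2.
p1, p2 = LamportClock(pid=1), LamportClock(pid=2)
p1.tick()          # p1.time == 1
t = p1.send()      # p1.time == 2; the message carries 2
p2.tick()          # p2.time == 1
p2.receive(t)      # p2.time == max(1, 2) + 1 == 3
```

The guarantee runs one way only: if a → b then C(a) < C(b), but a smaller timestamp alone does not show that one event happened before another.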

Session Guarantees for Weak Consistency

Session Guarantees for Weakly Consistent Replicated Data
This 1994 paper proposes session guarantees that give each client of an eventually consistent system a consistent view of its own sequence of reads and writes (its session). It established much of the vocabulary used in distributed systems papers today.
The paper introduces four guarantees that are now standard terminology:
  • Monotonic Reads: Successive reads in a session reflect a non-decreasing set of writes, so a later read never returns older data than an earlier one
  • Read Your Writes: Reads in a session always reflect the writes previously made in that session
  • Writes Follow Reads: A write in a session is ordered after any writes whose effects were seen by earlier reads in that session
  • Monotonic Writes: Writes in a session are applied in the order they were issued
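A rough sketch of how a client might enforce the two read guarantees, in the spirit of the paper’s approach of tracking per-session sets of write IDs. The Replica and Session classes are hypothetical, and adding every write a replica has applied to the read-set is a simplifying assumption (the paper tracks only the writes relevant to each read and discusses version vectors to keep this state compact). Writes Follow Reads and Monotonic Writes are not modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Replica:
    """Toy replica: remembers the IDs of the writes it has applied."""
    name: str
    applied: Set[str] = field(default_factory=set)
    data: Dict[str, str] = field(default_factory=dict)

    def apply(self, write_id: str, key: str, value: str) -> None:
        self.applied.add(write_id)
        self.data[key] = value


@dataclass
class Session:
    """Client-side session state: writes issued and writes observed."""
    write_set: Set[str] = field(default_factory=set)
    read_set: Set[str] = field(default_factory=set)

    def can_read_from(self, replica: Replica) -> bool:
        # Read Your Writes: the replica holds every write this session made.
        # Monotonic Reads: it also holds every write this session has seen.
        return self.write_set <= replica.applied and self.read_set <= replica.applied

    def read(self, replica: Replica, key: str) -> Optional[str]:
        if not self.can_read_from(replica):
            raise RuntimeError(f"replica {replica.name} is too stale for this session")
        # Simplification: treat everything the replica has applied as observed.
        self.read_set |= replica.applied
        return replica.data.get(key)

    def write(self, replica: Replica, write_id: str, key: str, value: str) -> None:
        replica.apply(write_id, key, value)
        self.write_set.add(write_id)
```

A session that writes to replica A will refuse to read from replica B until B has applied that write, rather than silently returning stale data.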

CAP Theorem

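CAP Theorem | Plain English Explanation
The CAP theorem states that a distributed system cannot simultaneously guarantee all three of the following properties; it can provide at most two:
  • Consistency: Every read sees the most recent write or returns an error, so all nodes appear to hold the same data at the same time
  • Availability: Every request received by a non-failing node receives a non-error response
  • Partition Tolerance: The system continues to operate even when the network drops or delays messages between nodes
Because network partitions cannot be ruled out in practice, the real choice is between consistency and availability while a partition is in progress. Understanding CAP is essential before starting work on distributed systems, since this trade-off shapes many architectural decisions.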
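The trade-off can be seen in a toy model of one value replicated on two nodes. This Python sketch is illustrative only: the ReplicatedRegister class, the node names n1/n2, and the explicit partitioned flag are hypothetical (real systems can only infer a partition, typically through timeouts). During a partition, the CP mode rejects requests it cannot coordinate, while the AP mode keeps answering and lets the replicas diverge.

```python
from enum import Enum
from typing import Dict, Optional


class Mode(Enum):
    CP = "consistent"  # refuse requests that cannot be coordinated with the peer
    AP = "available"   # always answer, possibly with stale data


class Unavailable(Exception):
    """Raised by a CP node that cannot safely serve a request."""


class ReplicatedRegister:
    """One value replicated on two nodes; `partitioned` simulates a network split."""

    def __init__(self, mode: Mode) -> None:
        self.mode = mode
        self.values: Dict[str, Optional[str]] = {"n1": None, "n2": None}
        self.partitioned = False

    def write(self, node: str, value: str) -> None:
        if not self.partitioned:
            # Healthy network: replicate synchronously to both nodes.
            for n in self.values:
                self.values[n] = value
        elif self.mode is Mode.CP:
            # Cannot reach the peer, so reject rather than let replicas diverge.
            raise Unavailable(f"{node} cannot reach its peer")
        else:
            # AP: accept locally; the other replica is now stale.
            self.values[node] = value

    def read(self, node: str) -> Optional[str]:
        if self.partitioned and self.mode is Mode.CP:
            raise Unavailable(f"{node} cannot confirm it has the latest value")
        return self.values[node]
```

In AP mode, a read from n2 during the partition returns the old value; in CP mode, the same request fails until the partition heals.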

FLP Impossibility

Impossibility of Distributed Consensus with One Faulty Process | Easier Blog Post
The FLP impossibility result (named after Fischer, Lynch, and Paterson) proves that in an asynchronous system, where message delays are unbounded and processes have no synchronized clocks, no deterministic algorithm can guarantee that consensus is reached if even a single process may crash. This theoretical result shapes practical distributed system design.
While FLP rules out guaranteed deterministic consensus in purely asynchronous systems with failures, practical systems work around it by:
  • Adding timeouts, which amounts to assuming partial synchrony (the system eventually behaves synchronously)
  • Using randomization, so that consensus terminates with probability 1
  • Giving up guaranteed termination: the protocol never reaches a wrong decision but may stall
  • Using algorithms such as Paxos and Raft, which are always safe and make progress whenever the network is well-behaved enough
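The first two workarounds often appear together as a randomized timeout on leader heartbeats, similar in spirit to Raft’s randomized election timeouts. In this minimal sketch, HeartbeatDetector and its parameters are hypothetical. A timeout only raises suspicion that the leader has failed, and under asynchrony that suspicion can be wrong, so the consensus protocol itself must stay safe when it is.

```python
import random
import time
from typing import Optional


class HeartbeatDetector:
    """Timeout-based failure detector; only useful under partial synchrony."""

    def __init__(self, base_timeout: float, jitter: float,
                 rng: Optional[random.Random] = None) -> None:
        self.base_timeout = base_timeout  # seconds
        self.jitter = jitter              # extra random slack, in seconds
        self.rng = rng or random.Random()
        self.last_heartbeat = time.monotonic()
        self.timeout = self._new_timeout()

    def _new_timeout(self) -> float:
        # Randomizing the timeout makes it unlikely that every node suspects
        # the leader at the same instant and starts a competing election.
        return self.base_timeout + self.rng.uniform(0, self.jitter)

    def heartbeat(self, now: Optional[float] = None) -> None:
        """Record a heartbeat from the leader and draw a fresh timeout."""
        self.last_heartbeat = time.monotonic() if now is None else now
        self.timeout = self._new_timeout()

    def suspects_leader(self, now: Optional[float] = None) -> bool:
        """True if no heartbeat arrived within the current timeout."""
        now = time.monotonic() if now is None else now
        return now - self.last_heartbeat > self.timeout
```

The optional `now` arguments let the detector be driven by a simulated clock, which makes its behavior easy to test deterministically.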

Fallacies of Distributed Computing

Fallacies of Distributed Computing
Before diving deep into distributed systems, learn these common false assumptions. Expect everything to break. The fallacies remind us that:
  1. The network is NOT reliable
  2. Latency is NOT zero
  3. Bandwidth is NOT infinite
  4. The network is NOT secure
  5. Topology does NOT stay constant
  6. There is NOT one administrator
  7. Transport cost is NOT zero
  8. The network is NOT homogeneous
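A common defensive pattern against the first two fallacies is retrying failed remote calls with capped exponential backoff and random jitter. The sketch below assumes the remote operation is a zero-argument Python callable that raises ConnectionError or TimeoutError on failure; call_with_retries and its parameters are hypothetical names, and `sleep` and `rng` are injectable so the logic can be tested without real delays.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    op: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Retry `op` with capped exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    rng = rng or random.Random()
    for attempt in range(attempts):
        try:
            return op()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last failure
            # The network is not reliable, so failures are expected.
            # Back off instead of hammering a slow or overloaded peer.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng.uniform(0, cap))
    raise AssertionError("unreachable")
```

Retrying is only safe when the operation is idempotent: if a request succeeded but its response was lost, a retry applies it a second time.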

Additional Resources

Distributed Systems Theory for the Distributed Systems Engineer

A breadth-first (BFS) approach to learning distributed systems. Many of the papers it recommends also appear in other sections of this guide.

An Introduction to Distributed Systems

@aphyr’s excellent introduction to distributed systems
