Messaging systems are fundamental to distributed systems, enabling asynchronous communication and data flow between components. At the heart of most distributed systems lies the concept of a log.

Essential Readings

These resources cover the foundational concepts of distributed messaging systems and the log abstraction that unifies many distributed system architectures.

The Log

What every software engineer should know about real-time data’s unifying abstraction

Kafka

A distributed messaging system for log processing

The Log: A Unifying Abstraction

This is a somewhat long read, but it brilliantly covers the concept of logs, which are at the heart of most distributed systems.
The log is perhaps the most fundamental data structure in distributed systems. Understanding logs is essential for understanding databases, replication, consensus, and distributed data processing.

What is a Log?

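A log is an append-only, totally ordered sequence of records. Each record is appended to the end and assigned a unique sequential entry number, so position in the log doubles as a notion of time. This simple concept underlies many distributed systems and is the foundation for: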

Data Integration

Making all data from all systems available in all places

Real-time Processing

Computing derived data streams and views

Distributed Systems Internals

Replication, consensus, and coordination

Event Sourcing

Recording state changes as a sequence of events
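
As a minimal sketch of the definition above, the following in-memory log only ever appends and hands back a sequential offset for each record. The class name `AppendOnlyLog` and its methods are hypothetical, chosen for illustration, and not taken from any particular system.

```python
from typing import Any, List


class AppendOnlyLog:
    """In-memory log: records are appended at the end and never modified."""

    def __init__(self) -> None:
        self._records: List[Any] = []

    def append(self, record: Any) -> int:
        """Append a record and return its offset (its position in the log)."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[Any]:
        """Return up to max_records records, starting at the given offset."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        return self._records[offset:offset + max_records]

    def __len__(self) -> int:
        return len(self._records)


log = AppendOnlyLog()
first = log.append({"event": "user_created", "id": 1})
second = log.append({"event": "user_renamed", "id": 1})
latest = log.read(second)
```

Because offsets are assigned in append order, reading from any offset always yields the same records in the same order, which is what makes the log usable as a shared source of truth.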

Key Concepts from “The Log”

Database systems use logs extensively:
  • Write-ahead log (WAL): Records all changes before applying them
  • Transaction log: Maintains atomicity and durability
  • Replication log: Keeps replicas in sync
The log is the principal mechanism for data consistency and replication in databases.
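
To make the write-ahead idea concrete, here is a rough sketch of a key-value store that appends each change to a file before applying it in memory, then rebuilds its state by replaying that file. The `WalStore` class and the JSON-lines file format are assumptions made for this example; real databases use binary formats, checksums, and handling for partially written entries, all of which are omitted here.

```python
import json
import os
import tempfile


class WalStore:
    """Toy key-value store that logs every change before applying it."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.data = {}
        self._recover()  # rebuild in-memory state from the existing log, if any
        self._wal = open(path, "a", encoding="utf-8")

    def _recover(self) -> None:
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                entry = json.loads(line)
                self.data[entry["key"]] = entry["value"]

    def put(self, key: str, value) -> None:
        # 1. Record the change durably in the log.
        self._wal.write(json.dumps({"key": key, "value": value}) + "\n")
        self._wal.flush()
        os.fsync(self._wal.fileno())
        # 2. Only then apply it to the in-memory state.
        self.data[key] = value

    def close(self) -> None:
        self._wal.close()


wal_path = os.path.join(tempfile.mkdtemp(), "store.wal")
store = WalStore(wal_path)
store.put("a", 1)
store.put("a", 2)
store.put("b", 3)
store.close()

# Simulate a restart: a new instance recovers state purely from the log.
recovered = WalStore(wal_path)
recovered_data = dict(recovered.data)
recovered.close()
```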
In distributed systems, logs serve as:
  • A mechanism for ordering events across different machines
  • The basis for state machine replication
  • A durable queue for asynchronous communication
  • A source of truth that can be replayed
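
The state machine replication point can be illustrated with a small sketch: if every replica applies the same commands in the same log order, and the apply step is deterministic, all replicas end in the same state. The bank-account commands and the `apply_command` and `replay` functions below are hypothetical and exist only for this example.

```python
from typing import Dict, List, Tuple

Command = Tuple[str, str, int]  # (operation, account, amount)


def apply_command(state: Dict[str, int], command: Command) -> None:
    """Deterministically apply one command to a replica's state."""
    op, account, amount = command
    balance = state.get(account, 0)
    if op == "deposit":
        state[account] = balance + amount
    elif op == "withdraw" and balance >= amount:
        state[account] = balance - amount
    # A rejected withdrawal leaves the state unchanged on every replica.


def replay(log: List[Command]) -> Dict[str, int]:
    """Rebuild state from scratch by applying the log in order."""
    state: Dict[str, int] = {}
    for command in log:
        apply_command(state, command)
    return state


shared_log: List[Command] = [
    ("deposit", "alice", 100),
    ("withdraw", "alice", 30),
    ("withdraw", "alice", 500),  # rejected: insufficient funds
    ("deposit", "bob", 5),
]

replica_a = replay(shared_log)
replica_b = replay(shared_log)
lagging_replica = replay(shared_log[:2])  # has only seen the first two entries
```

A lagging replica is simply one that has applied a shorter prefix of the log; it catches up by applying the remaining entries in order.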
Traditional message queues and the log abstraction serve similar purposes, but logs offer different guarantees:
  • Logs provide total ordering (within each partition, once a log is partitioned)
  • Logs are persistent and replayable
  • Multiple consumers can read from the same log independently
  • Logs scale horizontally through partitioning
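
The following sketch shows what "multiple consumers can read from the same log independently" might look like in code. Each consumer keeps its own offset, so reading never removes a record and any consumer can rewind and replay. The `Consumer` class and its `poll` and `seek` methods are hypothetical names for illustration, not the API of a real client library.

```python
from typing import List


class Consumer:
    """Reads from a shared log while tracking its own position."""

    def __init__(self, name: str, log: List[str], offset: int = 0) -> None:
        self.name = name
        self.log = log
        self.offset = offset

    def poll(self, max_records: int = 2) -> List[str]:
        """Return the next records and advance only this consumer's offset."""
        batch = self.log[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        """Move this consumer to an earlier or later position in the log."""
        self.offset = offset


event_log = ["e0", "e1", "e2", "e3"]
billing = Consumer("billing", event_log)
audit = Consumer("audit", event_log)

billing_first = billing.poll(3)  # billing reads ahead
audit_first = audit.poll(1)      # audit is unaffected by billing's progress

billing.seek(0)                  # replay from the beginning
billing_replay = billing.poll(2)
```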
Read: The Log - What every software engineer should know about real-time data’s unifying abstraction

Apache Kafka

Kafka is a distributed messaging system originally developed at LinkedIn that implements the log abstraction at scale.
Design Goals:
  • Persistent messaging with O(1) disk structures, so performance stays roughly constant as the volume of stored data grows
  • High throughput for both publishing and subscribing
  • Explicit support for partitioning messages over Kafka servers
  • Support for distributed consumption of messages
Architecture Highlights:
  • Topics are partitioned and replicated across brokers
  • Producers append to the log
  • Consumers maintain their own offset
  • Designed for high-throughput, low-latency message delivery
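
As a rough illustration of the architecture points above, the sketch below models a topic as a list of partitions, each of which is its own append-only log. Records with the same key are routed to the same partition, so they stay in order relative to each other. The `Topic` class and the use of CRC32 for key hashing are assumptions made for this example; Kafka's own partitioner uses a different hash function, and replication across brokers is left out entirely.

```python
import zlib
from typing import List, Tuple

NUM_PARTITIONS = 3


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a hash that is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class Topic:
    """A topic split into independent append-only partitions."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Tuple[str, str]]] = [
            [] for _ in range(num_partitions)
        ]

    def send(self, key: str, value: str) -> Tuple[int, int]:
        """Append to the key's partition; return (partition, offset)."""
        p = partition_for(key, len(self.partitions))
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1


topic = Topic(NUM_PARTITIONS)
for i in range(5):
    topic.send("user-1", f"click-{i}")
    topic.send("user-2", f"view-{i}")

# All of user-1's records live in one partition, in the order they were sent.
user1_partition = topic.partitions[partition_for("user-1")]
user1_values = [value for key, value in user1_partition if key == "user-1"]
```

Ordering is only guaranteed within a partition here; records for different keys that land in different partitions have no defined order relative to each other.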

Why Kafka Matters

Kafka revolutionized how we think about messaging systems by:
1. Treating messages as a log

Rather than deleting messages after consumption, Kafka retains them for a configurable period
2. Making consumers responsible for their position

Each consumer tracks its own offset in the log, enabling replay and multiple consumption patterns
3. Optimizing for throughput

Batch compression, efficient disk usage, and zero-copy transfer enable extremely high throughput
4. Supporting stream processing

The log abstraction naturally supports both messaging and stream processing workloads
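
To give a feel for why batch compression helps throughput, the sketch below compresses 100 similar messages one at a time and then as a single batch, using Python's standard `zlib` module. The message shape is invented for illustration, and this only demonstrates the general effect; it does not reproduce Kafka's actual batch format or codecs.

```python
import json
import zlib

# 100 small, structurally similar messages, as is typical for event data.
messages = [
    json.dumps({"user_id": i, "event": "page_view", "path": "/home"}).encode("utf-8")
    for i in range(100)
]

# Compressing each message on its own cannot exploit redundancy across messages.
individual_bytes = sum(len(zlib.compress(m)) for m in messages)

# Compressing the whole batch lets the compressor reuse repeated field names and values.
batch_payload = b"\n".join(messages)
compressed_batch = zlib.compress(batch_payload)
batched_bytes = len(compressed_batch)

# The batch round-trips back to the original messages.
restored = zlib.decompress(compressed_batch).split(b"\n")
```

Fewer bytes per message means less disk and network I/O for the same number of records, at the cost of waiting to fill a batch.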
Read Kafka Paper

Use Cases

Event Streaming

Capture and process events in real-time as they occur

Log Aggregation

Collect logs from multiple services into a centralized system

Stream Processing

Transform and analyze data streams in real-time

Commit Log

Use as an external commit log for distributed systems

Data Integration

Synchronize data between different systems and databases

Metrics & Monitoring

Collect and process operational metrics at scale
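
The stream processing use case can be sketched as a function that consumes one stream of events and emits a derived stream. The example below turns page-view events into running per-page counts; the event shape and the `running_counts` function are hypothetical and kept deliberately small.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def running_counts(events: Iterable[Dict[str, str]]) -> Iterator[Tuple[str, int]]:
    """For each incoming event, emit (page, views seen so far for that page)."""
    counts: Counter = Counter()
    for event in events:
        counts[event["page"]] += 1
        yield event["page"], counts[event["page"]]


page_views = [{"page": "/home"}, {"page": "/docs"}, {"page": "/home"}]
derived = list(running_counts(page_views))
```

Because the input is an ordered log, rerunning the function over the same events reproduces the same derived stream.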

Learning Path

1. Understand the Log Abstraction

Start with “The Log” article to grasp the fundamental concepts that underpin distributed messaging
2. Study Kafka's Architecture

Read the Kafka paper to see how the log abstraction is implemented at scale
3. Explore Related Systems

Investigate other log-based systems like Apache Pulsar, Amazon Kinesis, and NATS JetStream (the successor to the now-deprecated NATS Streaming)
4. Apply the Concepts

Implement event-driven architectures using these principles in your own systems
The log abstraction is one of the most powerful unifying concepts in distributed systems. Understanding it deeply will illuminate many other distributed system patterns and architectures.
