Messaging Systems

Messaging systems are fundamental to distributed systems, enabling asynchronous communication and data flow between components. At the heart of most distributed systems lies the concept of a log.

Essential Readings

These resources cover the foundational concepts of distributed messaging systems and the log abstraction that unifies many distributed system architectures.

The Log

What every software engineer should know about real-time data’s unifying abstraction

Kafka

A distributed messaging system for log processing

The Log: A Unifying Abstraction

This is a somewhat long read, but it brilliantly covers the concept of logs, which are at the heart of most distributed systems.

The log is perhaps the most fundamental data structure in distributed systems. Understanding logs is essential for understanding databases, replication, consensus, and distributed data processing.

What is a Log?

A log is an append-only, totally ordered sequence of records, ordered by time: new records are added only at the end, and each one gets a sequential position. This simple concept underlies many distributed systems and is the foundation for:

Data Integration

Making all data from all systems available in all places

Real-time Processing

Computing derived data streams and views

Distributed Systems Internals

Replication, consensus, and coordination

Event Sourcing

Recording state changes as a sequence of events
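The definition above can be made concrete with a small sketch. This is a minimal in-memory illustration, not how any particular system stores its log; the class name AppendOnlyLog and its append and read methods are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class AppendOnlyLog:
    """Hypothetical in-memory log: records are only ever added at the end."""

    _records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Append a record and return its offset (its position in the log)."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[Any]:
        """Return up to max_records records starting at offset, in log order."""
        return self._records[offset:offset + max_records]


log = AppendOnlyLog()
first = log.append({"event": "user_created", "id": 1})
second = log.append({"event": "user_renamed", "id": 1})
```

Because there is no update or delete operation, the offset returned by append identifies a record permanently, and any reader that starts from the same offset sees the same records in the same order.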

Key Concepts from “The Log”

Logs in Databases

Database systems use logs extensively:

Write-ahead log (WAL): Records all changes before applying them
Transaction log: Maintains atomicity and durability
Replication log: Keeps replicas in sync

The log is the principal mechanism for data consistency and replication in databases.
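As a rough illustration of the write-ahead idea, the sketch below logs each change to disk before applying it, then rebuilds its state after a simulated crash by replaying the log. TinyKV, its JSON-lines file format, and the file name wal.log are assumptions made for this example; real databases use far more elaborate log formats and checkpointing.

```python
import json
import os
import tempfile


class TinyKV:
    """Hypothetical key-value store that logs every change before applying it."""

    def __init__(self, wal_path: str) -> None:
        self.wal_path = wal_path
        self.data = {}
        self._recover()

    def _recover(self) -> None:
        # Rebuild in-memory state by replaying the log from the beginning.
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, "r", encoding="utf-8") as f:
            for line in f:
                entry = json.loads(line)
                self.data[entry["key"]] = entry["value"]

    def put(self, key: str, value: str) -> None:
        # 1. Append the change to the log and force it to disk.
        with open(self.wal_path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        # 2. Only then apply it to the in-memory state.
        self.data[key] = value


wal_file = os.path.join(tempfile.mkdtemp(), "wal.log")

store = TinyKV(wal_file)
store.put("a", "1")
store.put("a", "2")

# Simulate a crash: ignore the old object and recover from the log alone.
recovered = TinyKV(wal_file)
```

After recovery, recovered.data holds the latest value for each key, because replaying the log in order reapplies every change in the sequence it originally happened.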

Logs in Distributed Systems

In distributed systems, logs serve as:

A mechanism for ordering events across different machines
The basis for state machine replication
A durable queue for asynchronous communication
A source of truth that can be replayed
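State machine replication rests on one observation: if every replica applies the same log in the same order using deterministic logic, every replica ends up in the same state. The sketch below assumes a toy account-balance state machine; the command format and the functions apply_command and replay are hypothetical.

```python
from typing import Dict, List, Tuple

Command = Tuple[str, str, int]  # (operation, account, amount)


def apply_command(state: Dict[str, int], command: Command) -> None:
    """Deterministically apply one command to the state."""
    op, account, amount = command
    balance = state.get(account, 0)
    if op == "deposit":
        state[account] = balance + amount
    elif op == "withdraw" and balance >= amount:
        # Overdrafts are rejected; every replica makes the same decision.
        state[account] = balance - amount


def replay(log: List[Command]) -> Dict[str, int]:
    """Build a replica's state by applying the log from the start."""
    state: Dict[str, int] = {}
    for command in log:
        apply_command(state, command)
    return state


shared_log: List[Command] = [
    ("deposit", "alice", 100),
    ("withdraw", "alice", 30),
    ("withdraw", "alice", 500),  # rejected: insufficient funds
    ("deposit", "bob", 5),
]

replica_a = replay(shared_log)
replica_b = replay(shared_log)
lagging_replica = replay(shared_log[:2])  # has only seen a prefix of the log
```

The two full replicas agree without ever talking to each other, and the lagging replica is simply at an earlier point in the same history; it catches up by applying the remaining entries.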

The Log as a Message Queue

Traditional message queues and the log abstraction serve similar purposes but with different guarantees:

Logs provide total ordering of records (per partition, once a log is partitioned)
Logs are persistent and replayable
Multiple consumers can read from the same log independently
Logs scale horizontally through partitioning
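A minimal sketch of partitioning, assuming records carry a key and that the partition is chosen by hashing that key. The CRC32 hash, the partition count, and the helper names partition_for and send are choices made for this example and are not Kafka's actual partitioner.

```python
import zlib
from typing import List, Tuple

NUM_PARTITIONS = 3  # assumed value for the example

# Each partition is its own independent append-only log.
partitions: List[List[Tuple[str, str]]] = [[] for _ in range(NUM_PARTITIONS)]


def partition_for(key: str) -> int:
    """Map a key to a partition with a stable hash (CRC32 here)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def send(key: str, value: str) -> Tuple[int, int]:
    """Append to the key's partition; return (partition, offset)."""
    p = partition_for(key)
    partitions[p].append((key, value))
    return p, len(partitions[p]) - 1


for key, value in [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "click"),
    ("user-1", "logout"),
]:
    send(key, value)

# All records for one key live in one partition, in the order they were sent.
user1_events = [
    v for k, v in partitions[partition_for("user-1")] if k == "user-1"
]
```

The trade-off is visible in the data structure: ordering is guaranteed within each partition, so records for the same key stay in order, but there is no ordering guarantee between records in different partitions.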

Read: The Log - What every software engineer should know about real-time data’s unifying abstraction

Apache Kafka

Kafka: Distributed Messaging for Log Processing

Kafka is a distributed messaging system originally developed at LinkedIn that implements the log abstraction at scale.

Design Goals:

Persistent messaging with O(1) disk structures, so performance stays constant as the volume of stored messages grows
High throughput for both publishing and subscribing
Explicit support for partitioning messages over Kafka servers
Support for distributed consumption of messages

Architecture Highlights:

Topics are partitioned and replicated across brokers
Producers append to the log
Consumers maintain their own offset
Designed for high-throughput, low-latency message delivery
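The consumer side of this architecture can be sketched without any Kafka client library. In the toy model below, each consumer holds its own offset into a shared list standing in for one partition; the Consumer class and its poll and seek methods are hypothetical names for this illustration, not the API of a real Kafka client.

```python
from typing import List


class Consumer:
    """Hypothetical consumer that tracks its own position in a shared log."""

    def __init__(self, name: str, log: List[str]) -> None:
        self.name = name
        self.log = log
        self.offset = 0  # next position to read

    def poll(self, max_records: int = 2) -> List[str]:
        """Read the next batch and advance this consumer's offset only."""
        batch = self.log[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        """Move to an earlier or later position, for example to replay."""
        self.offset = offset


topic_log = ["m0", "m1", "m2", "m3"]  # stands in for one partition

billing = Consumer("billing", topic_log)
analytics = Consumer("analytics", topic_log)

billing_first = billing.poll()                    # reads m0, m1
billing_second = billing.poll()                   # reads m2, m3
analytics_first = analytics.poll(max_records=1)   # still starts at m0

billing.seek(0)                                   # rewind and replay
billing_replay = billing.poll(max_records=4)
```

Reading never modifies the log, so one consumer's progress has no effect on another's, and rewinding is just a matter of resetting an integer.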

Why Kafka Matters

Kafka revolutionized how we think about messaging systems by:

Treating messages as a log

Rather than deleting messages after consumption, Kafka retains them for a configurable period

Making consumers responsible for their position

Each consumer tracks its own offset in the log, enabling replay and multiple consumption patterns

Optimizing for throughput

Batch compression, efficient disk usage, and zero-copy transfer enable extremely high throughput

Supporting stream processing

The log abstraction naturally supports both messaging and stream processing workloads
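The first point above, retaining messages for a configurable period regardless of whether they have been consumed, could be sketched as follows. This is a simplification that filters individual records by timestamp; the record layout and the function enforce_retention are assumptions for the example, and a production system would typically discard data in larger units than single records.

```python
from typing import List, Tuple

Record = Tuple[int, float, str]  # (offset, timestamp in seconds, payload)


def enforce_retention(
    log: List[Record], now: float, retention_seconds: float
) -> List[Record]:
    """Drop records older than the retention window.

    Whether a record was consumed plays no part in the decision.
    """
    cutoff = now - retention_seconds
    return [record for record in log if record[1] >= cutoff]


log: List[Record] = [
    (0, 0.0, "a"),
    (1, 50.0, "b"),
    (2, 120.0, "c"),
]

# At time 130 with a 100-second window, anything older than t=30 is dropped.
retained = enforce_retention(log, now=130.0, retention_seconds=100.0)
retained_offsets = [offset for offset, _, _ in retained]
```

Surviving records keep their original offsets, so a consumer's stored position remains meaningful after old data has been removed.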

Read Kafka Paper

Use Cases

Event Streaming

Capture and process events in real-time as they occur

Log Aggregation

Collect logs from multiple services into a centralized system

Stream Processing

Transform and analyze data streams in real-time

Commit Log

Use as an external commit log for distributed systems

Data Integration

Synchronize data between different systems and databases

Metrics & Monitoring

Collect and process operational metrics at scale

Learning Path

Understand the Log Abstraction

Start with “The Log” article to grasp the fundamental concepts that underpin distributed messaging

Study Kafka's Architecture

Read the Kafka paper to see how the log abstraction is implemented at scale

Explore Related Systems

Investigate other log-based systems like Apache Pulsar, Amazon Kinesis, and NATS Streaming (since superseded by NATS JetStream)

Apply the Concepts

Implement event-driven architectures using these principles in your own systems

The log abstraction is one of the most powerful unifying concepts in distributed systems. Understanding it deeply will illuminate many other distributed system patterns and architectures.
