
Overview

Chat applications like WhatsApp, Facebook Messenger, and Discord serve billions of messages daily. This case study explores two approaches to building a chat application: a simplified 1-to-1 chat using Redis pub/sub, and a more comprehensive production-grade architecture.
Understanding both simple and complex chat architectures helps you choose the right approach based on scale requirements.

Approach 1: Simple Chat with Redis

[Figure: Redis-based Chat Application Architecture]

A simple chat application can leverage Redis pub/sub functionality for real-time messaging.

Stage 1: Connection Initialization

Let’s walk through how Bob connects to the chat application:
1. Client Connection

Steps 1-2: Bob opens the chat application. A WebSocket connection is established between the client and the server for bidirectional, real-time communication.
2. Redis Setup

Steps 3-4: The pub/sub server establishes multiple connections to Redis:
  • One connection to update Redis data models and publish messages to topics
  • Multiple connections to subscribe and listen for updates on different topics
3. Initial Data Load

Steps 5-6: Bob’s client requests:
  • Chat member list (who’s available)
  • Historical message list (previous conversations)
This information is retrieved from Redis and sent to the client.
4. Presence Update

Steps 7-8: Since Bob is a new member joining the chat, a message is published to the member_add topic. Other participants’ clients receive this update and can see Bob is now online.
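
The presence update in steps 7-8 can be sketched without a Redis server. The `InMemoryPubSub` class below is a hypothetical stand-in for Redis pub/sub, not part of any library; only the `member_add` topic name comes from the text above.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Set

Handler = Callable[[str], None]


class InMemoryPubSub:
    """Hypothetical stand-in for Redis pub/sub: topics map to subscriber callbacks."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: str) -> int:
        """Deliver to current subscribers only; return how many received it."""
        handlers = list(self._subscribers[topic])
        for handler in handlers:
            handler(payload)
        return len(handlers)


broker = InMemoryPubSub()
members: Set[str] = set()
alice_seen: List[str] = []

# Alice is already connected and listening for membership changes.
members.add("Alice")
broker.subscribe("member_add", alice_seen.append)

# Bob joins: record him as a member, then announce it on member_add.
members.add("Bob")
receivers = broker.publish("member_add", "Bob")
```

Like Redis `PUBLISH`, the sketch reports how many subscribers received the message, which makes it easy to see that a topic with no listeners silently drops the update.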

Stage 2: Message Handling

When Bob sends a message to Alice:
1. Send Message

Step 1: Bob sends a message to Alice through the WebSocket connection.
2. Persist & Publish

Step 2: The server performs two operations:
  1. Adds the message to a Redis SortedSet using ZADD (sorted by timestamp)
  2. Publishes the message to the messages topic for subscribers
3. Receive Message

Step 3: Alice’s client, subscribed to the messages topic, receives the chat message in real-time.
Key Redis Data Structures:
# Store messages in a sorted set (sorted by timestamp)
ZADD chat:room:123 1678901234 '{"user":"Bob","msg":"Hello Alice!"}'

# Publish message to subscribers
PUBLISH messages '{"room":"123","user":"Bob","msg":"Hello Alice!"}'

# Store online members
SADD chat:room:123:members "Bob" "Alice"
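
To make the persist-then-publish step concrete, here is a minimal in-memory sketch. The `history` list and `subscribers` list are assumptions standing in for the sorted set and the `messages` topic; a real deployment would issue the Redis commands shown above through a client library.

```python
import bisect
import json
from typing import Callable, List, Tuple

history: List[Tuple[int, str]] = []            # stand-in for the sorted set chat:room:123
subscribers: List[Callable[[str], None]] = []  # stand-in for the messages topic


def send_message(timestamp: int, user: str, msg: str) -> None:
    payload = json.dumps({"user": user, "msg": msg})
    bisect.insort(history, (timestamp, payload))  # like ZADD: kept ordered by score
    for deliver in subscribers:                   # like PUBLISH: live subscribers only
        deliver(payload)


def messages_between(start: int, end: int) -> List[str]:
    """Like a score-range query: payloads whose timestamp is in [start, end]."""
    return [payload for ts, payload in history if start <= ts <= end]


alice_inbox: List[str] = []
subscribers.append(alice_inbox.append)

send_message(1678901240, "Bob", "Are you there?")
send_message(1678901234, "Bob", "Hello Alice!")  # arrives late, still stored in timestamp order

recent = [json.loads(p)["msg"] for p in messages_between(1678901230, 1678901250)]
```

The live inbox sees messages in arrival order, while the stored history is ordered by timestamp. That difference is why clients load history from the sorted set rather than replaying what they happened to receive.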

Limitations of Redis Approach

While simple, this Redis-based approach has limitations:
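  • No delivery guarantee: Redis pub/sub is fire-and-forget, so a client that is offline or disconnected when a message is published never receives it over pub/sub
  • Single point of failure: if the one Redis instance goes down, live messaging and stored history are both unavailable, and that instance is also a throughput bottleneck
  • History is only as durable as Redis: the sorted set lives in memory, so message history depends on how Redis persistence is configured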
  • Limited scalability: One Redis instance can only handle so many connections

Approach 2: Production-Grade Architecture

[Figure: Production Chat Application Architecture]

A scalable chat application for millions of users requires a more sophisticated architecture.

User Login Flow

1. Establish Connection

Step 1: Alice logs in and establishes a WebSocket connection with the server. The connection is stateful and persistent.
2. Update Presence

Steps 2-4:
  • The presence service receives Alice’s connection notification
  • Updates Alice’s status to “online” in the presence database
  • Notifies Alice’s friends about her online status

Messaging Flow

When Alice sends a message to Bob:
1. Send Message

Steps 1-2: Alice sends a chat message to Bob. The message is routed to Chat Service A (the service instance handling Alice’s connection).
2. Generate ID & Persist

Steps 3-4:
  • Message is sent to the Sequencing Service, which generates a globally unique, ordered message ID
  • Message is persisted in the Message Store (database) for durability and history
3. Queue for Delivery

Step 5: The message is sent to the Message Sync Queue to be synchronized to Bob’s chat service.
4. Check Recipient Status

Step 6: The Message Sync Service checks Bob’s presence:
If Bob is online:
  • Message is sent to Chat Service B (handling Bob’s connection)
  • Delivered via WebSocket in real-time
If Bob is offline:
  • Message is sent to the Push Notification Server
  • Push notification sent to Bob’s device
5. Deliver to Recipient

Steps 7-8: If Bob is online, the message is pushed to Bob’s client through the WebSocket connection.
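
The online/offline branch in steps 6-8 might look roughly like the sketch below. `MessageSyncService`, its fields, and the instance name `chat-service-b` are hypothetical names chosen for illustration; the lists stand in for the real WebSocket and push notification calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MessageSyncService:
    """Hypothetical router: picks WebSocket delivery or push based on presence."""

    presence: Dict[str, str]      # user -> "online" / "offline"
    connections: Dict[str, str]   # user -> chat service instance holding the socket
    delivered: List[tuple] = field(default_factory=list)  # stand-in for WebSocket sends
    pushed: List[tuple] = field(default_factory=list)     # stand-in for push notifications

    def route(self, recipient: str, message_id: int, body: str) -> str:
        if self.presence.get(recipient) == "online" and recipient in self.connections:
            instance = self.connections[recipient]
            self.delivered.append((instance, recipient, message_id, body))
            return "websocket"
        # Offline, or no known connection: fall back to a push notification.
        self.pushed.append((recipient, message_id, body))
        return "push"


sync = MessageSyncService(
    presence={"bob": "online", "carol": "offline"},
    connections={"bob": "chat-service-b"},
)
first = sync.route("bob", 1, "Hi Bob")
second = sync.route("carol", 2, "Hi Carol")
```

Treating "online but no known connection" as offline is a design choice in this sketch; it avoids dropping a message when presence data is stale.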

Key Components Explained

Chat Service

Purpose: Maintain WebSocket connections with clients
Characteristics:
  • Each instance handles thousands of concurrent WebSocket connections
  • Routes incoming messages to appropriate services
  • Delivers outgoing messages to connected clients
  • Must be horizontally scalable
Challenges:
  • Session affinity: Same client should reconnect to same instance
  • Connection state management
  • Graceful handling of disconnections

Presence Service

Purpose: Track online/offline status of users
Features:
  • Real-time status updates
  • Last seen timestamps
  • User availability (online, away, busy, offline)
  • Heartbeat mechanism to detect disconnections
Implementation:
# Redis example
HSET user:alice:presence status "online" last_seen "1678901234"
# Auto-expire if no heartbeat arrives within 300 seconds
EXPIRE user:alice:presence 300
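
The heartbeat-plus-expiry idea can be sketched in plain Python. `PresenceTracker` is a hypothetical class, and the clock is passed in explicitly so the behaviour is easy to follow; the 300-second window mirrors the `EXPIRE` value above.

```python
from typing import Dict, Optional

HEARTBEAT_TTL_SECONDS = 300  # mirrors the EXPIRE value in the Redis example


class PresenceTracker:
    """Hypothetical in-memory presence store driven by client heartbeats."""

    def __init__(self) -> None:
        self._last_heartbeat: Dict[str, float] = {}

    def heartbeat(self, user: str, now: float) -> None:
        # Each heartbeat refreshes the user's expiry window.
        self._last_heartbeat[user] = now

    def last_seen(self, user: str) -> Optional[float]:
        return self._last_heartbeat.get(user)

    def status(self, user: str, now: float) -> str:
        last = self._last_heartbeat.get(user)
        if last is None or now - last >= HEARTBEAT_TTL_SECONDS:
            return "offline"  # never seen, or the heartbeat window has lapsed
        return "online"


tracker = PresenceTracker()
tracker.heartbeat("alice", now=1000.0)
status_soon = tracker.status("alice", now=1100.0)   # within the window
status_late = tracker.status("alice", now=1300.0)   # window has lapsed
```

With Redis, the expiry is handled by the server, so the service only needs to repeat the `HSET` and `EXPIRE` pair on every heartbeat.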

Sequencing Service

Purpose: Generate globally unique, ordered message IDs
Why needed:
  • Ensure messages appear in correct order across all clients
  • Provide unique identifier for each message
  • Enable efficient message synchronization
Approaches:
  • Snowflake ID: Timestamp + machine ID + sequence number (IDs are unique and sort roughly by creation time across machines)
  • Database sequence: Use database auto-increment (limited scalability)
  • Distributed ID generator: Services like Twitter Snowflake or Instagram’s ID generator
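
A Snowflake-style generator can be sketched as follows. The bit widths (10 machine bits, 12 sequence bits) follow the commonly described layout, while the epoch and the way sequence overflow is handled are assumptions of this sketch rather than a reference implementation.

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000  # arbitrary custom epoch (2020-01-01 UTC), an assumption
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    def __init__(self, machine_id: int) -> None:
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id out of range")
        self._machine_id = machine_id
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now_ms = int(time.time() * 1000)
            if now_ms < self._last_ms:
                now_ms = self._last_ms  # clock moved backwards: reuse last timestamp
            if now_ms == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond. This sketch borrows
                    # the next millisecond instead of busy-waiting for it.
                    now_ms = self._last_ms + 1
            else:
                self._sequence = 0
            self._last_ms = now_ms
            timestamp_part = (now_ms - EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS)
            machine_part = self._machine_id << SEQUENCE_BITS
            return timestamp_part | machine_part | self._sequence


generator = SnowflakeGenerator(machine_id=7)
first_id = generator.next_id()
second_id = generator.next_id()
```

IDs from one generator are strictly increasing. Across machines they are only ordered to within clock skew, which is usually acceptable for ordering messages inside a conversation but is worth stating as a limitation.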

Message Store

Purpose: Persist all messages for history and recovery
Requirements:
  • High write throughput (millions of messages per second)
  • Fast retrieval of recent messages
  • Long-term storage of message history
  • Support for pagination
Database choices:
  • Cassandra: High write throughput, good for time-series data
  • MongoDB: Flexible schema, good query capabilities
  • HBase: Scalable, column-oriented storage
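
The pagination requirement is often met with a cursor based on message ID rather than a page offset. The sketch below works on an in-memory list and assumes IDs increase over time; `fetch_page` is a hypothetical helper, and in a real store the filter and sort would be a range query on the ID column.

```python
from typing import List, Optional, Tuple

Message = Tuple[int, str]  # (message_id, body); IDs are assumed to increase over time


def fetch_page(
    messages: List[Message], before_id: Optional[int], limit: int
) -> Tuple[List[Message], Optional[int]]:
    """Return up to `limit` messages older than `before_id`, newest first,
    plus the cursor for the next page (None when history is exhausted)."""
    older = [m for m in messages if before_id is None or m[0] < before_id]
    older.sort(key=lambda m: m[0], reverse=True)
    page = older[:limit]
    next_cursor = page[-1][0] if len(older) > limit else None
    return page, next_cursor


conversation = [(i, f"message {i}") for i in range(1, 8)]  # IDs 1..7

page1, cursor1 = fetch_page(conversation, None, 3)     # newest three messages
page2, cursor2 = fetch_page(conversation, cursor1, 3)  # next three, older than cursor1
page3, cursor3 = fetch_page(conversation, cursor2, 3)  # whatever remains
```

A cursor stays stable when new messages arrive while the user scrolls back, whereas offset-based pages would shift.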

Message Sync Queue

Purpose: Decouple message sending from delivery
Benefits:
  • Handle bursts of messages
  • Retry failed deliveries
  • Support offline delivery
  • Enable message ordering guarantees
Technologies: Kafka, RabbitMQ, AWS SQS
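
As an illustration of retrying failed deliveries, the sketch below requeues a failed message until a retry budget is used up. `MAX_ATTEMPTS`, `drain`, and `flaky_deliver` are hypothetical names, and a `deque` stands in for the real queue; production brokers provide their own redelivery and dead-letter mechanisms.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

MAX_ATTEMPTS = 3  # hypothetical retry budget per message


def drain(queue: Deque[Tuple[str, int]], deliver: Callable[[str], bool]) -> List[str]:
    """Process queued (message, attempts_so_far) pairs; requeue failures until
    MAX_ATTEMPTS is reached. Returns the messages that were given up on."""
    dead: List[str] = []
    while queue:
        message, attempts = queue.popleft()
        if deliver(message):
            continue
        if attempts + 1 < MAX_ATTEMPTS:
            queue.append((message, attempts + 1))  # retry later, at the tail
        else:
            dead.append(message)
    return dead


attempt_log: List[str] = []
failures_left: Dict[str, int] = {"m2": 1, "m3": 99}  # m2 fails once, m3 always fails


def flaky_deliver(message: str) -> bool:
    attempt_log.append(message)
    if failures_left.get(message, 0) > 0:
        failures_left[message] -= 1
        return False
    return True


pending = deque([("m1", 0), ("m2", 0), ("m3", 0)])
dead_letters = drain(pending, flaky_deliver)
```

Requeueing at the tail changes delivery order, so a system that also needs per-conversation ordering would have to retry in place or reorder on the client using message IDs.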

Design Tradeoffs

WebSocket (Chosen):
  • ✅ True real-time, bidirectional communication
  • ✅ Lower latency
  • ✅ Less bandwidth overhead
  • ❌ More complex to scale (stateful connections)
  • ❌ Requires connection state management
HTTP Polling:
  • ✅ Simple to implement
  • ✅ Stateless, easier to scale
  • ❌ Higher latency
  • ❌ Wasteful (polling empty results)
Push (Chosen):
  • ✅ Instant delivery when recipient is online
  • ✅ Better user experience
  • ❌ Requires maintaining connections
Pull:
  • ✅ Simpler architecture
  • ✅ Client controls polling frequency
  • ❌ Higher latency
  • ❌ Increased server load from constant polling
Asynchronous (Chosen):
  • ✅ Better scalability
  • ✅ Handles traffic spikes
  • ✅ Decouples components
  • ❌ More complex architecture
  • ❌ Eventual consistency
Synchronous:
  • ✅ Simpler to reason about
  • ✅ Immediate consistency
  • ❌ Tight coupling
  • ❌ Harder to scale

Scalability Considerations

Horizontal Scaling

Chat Services

Run multiple instances behind a load balancer. Use consistent hashing for session affinity.
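
A consistent hash ring for mapping users to chat service instances might be sketched as below. `HashRing`, the instance names, and the choice of 100 virtual nodes per instance are assumptions for illustration.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    # Stable across processes, unlike Python's built-in hash() for strings.
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self._ring: Dict[int, str] = {}
        for node in nodes:
            for i in range(vnodes):
                # Each node owns many points on the ring to even out the load.
                self._ring[_hash(f"{node}#{i}")] = node
        self._points = sorted(self._ring)

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._ring[self._points[idx]]


users = [f"user-{i}" for i in range(1000)]
ring = HashRing(["chat-a", "chat-b", "chat-c"])
before = {user: ring.node_for(user) for user in users}

# Adding an instance should only move the users that the new instance takes over.
bigger_ring = HashRing(["chat-a", "chat-b", "chat-c", "chat-d"])
moved = [user for user in users if bigger_ring.node_for(user) != before[user]]
```

Every user in `moved` now maps to `chat-d`, and the rest keep their original instance, so scaling out forces only a fraction of clients to reconnect.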

Message Store

Shard by user ID or conversation ID. Use replication for high availability.

Message Queue

Partition by conversation ID. Scale consumers independently of producers.
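
Partitioning by conversation ID keeps all messages of one conversation on the same partition, which preserves their relative order. The sketch below uses CRC32 as a stable hash; that choice and the helper name `partition_for` are assumptions, and real queue clients ship their own partitioners.

```python
import zlib
from collections import defaultdict
from typing import DefaultDict, List, Tuple


def partition_for(conversation_id: str, num_partitions: int) -> int:
    """Stable partition index: one conversation always maps to one partition."""
    return zlib.crc32(conversation_id.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 8
partitions: DefaultDict[int, List[Tuple[str, str]]] = defaultdict(list)

incoming = [("room:123", "a"), ("room:456", "b"), ("room:123", "c")]
for conversation_id, body in incoming:
    partitions[partition_for(conversation_id, NUM_PARTITIONS)].append((conversation_id, body))

room_123 = [
    body
    for conv, body in partitions[partition_for("room:123", NUM_PARTITIONS)]
    if conv == "room:123"
]
```

Changing the partition count remaps conversations, so it is usually picked with headroom up front rather than adjusted often.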

Presence Service

Use distributed cache (Redis Cluster) for high-speed reads/writes.

Handling Group Chats

Group chats introduce additional complexity.
Challenges:
  • Message must be delivered to N recipients (fan-out)
  • Large groups (thousands of members)
  • Read receipts and typing indicators
Solutions:
1. Message Fan-out

When a message is sent to a group:
  1. Persist once in message store
  2. Create N queue entries (one per recipient)
  3. Each recipient’s chat service pulls their messages
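
The three fan-out steps above can be sketched with in-memory structures. `message_store`, `recipient_queues`, `fan_out`, and `pull` are hypothetical names, and excluding the sender from the recipients is an assumption of this sketch.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque, Dict, List

message_store: Dict[int, dict] = {}  # message_id -> message, stored once
recipient_queues: DefaultDict[str, Deque[int]] = defaultdict(deque)  # user -> pending IDs


def fan_out(message_id: int, sender: str, group_members: List[str], text: str) -> int:
    message_store[message_id] = {"sender": sender, "text": text}  # 1. persist once
    recipients = [member for member in group_members if member != sender]
    for user in recipients:                                       # 2. one queue entry each
        recipient_queues[user].append(message_id)
    return len(recipients)


def pull(user: str) -> List[dict]:
    """3. A recipient's chat service drains that user's queue and resolves the IDs."""
    queue = recipient_queues[user]
    pulled: List[dict] = []
    while queue:
        pulled.append(message_store[queue.popleft()])
    return pulled


queued = fan_out(1, "alice", ["alice", "bob", "carol"], "hi all")
bob_messages = pull("bob")
bob_again = pull("bob")  # queue already drained
```

Queue entries carry only the message ID, so the body is written once no matter how large the group is; the cost that grows with group size is the number of queue entries.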
2. Optimize Large Groups

For large groups (>100 members):
  • Disable read receipts
  • Batch presence updates
  • Use message pagination aggressively

Key Technologies

WebSocket

Real-time, bidirectional communication between client and server

Redis

Presence service, caching, pub/sub for simple implementations

Cassandra/HBase

Message storage with high write throughput

Kafka

Message queue for async processing and delivery

Snowflake IDs

Globally unique, time-ordered message identifiers

Push Notification

Deliver messages to offline users (APNs, FCM)

Summary

Building a chat application requires careful consideration of:
1. Real-time Communication

Use WebSockets for persistent, bidirectional connections
2. Message Persistence

Store messages durably with globally unique, ordered IDs
3. Presence Management

Track user online/offline status in real-time
4. Async Processing

Use message queues to decouple and scale message delivery
5. Offline Support

Integrate push notifications for offline users
Start simple with Redis pub/sub for prototypes or low-scale applications. As you grow, migrate to a distributed architecture with dedicated services for chat, presence, and message delivery.
