Design Data Structures for a Social Network

Overview

Design the data structures for a social network that enables users to search for connections and find the shortest path to another person. This problem explores graph algorithms (BFS), distributed systems, sharding strategies, and optimizing friend searches at massive scale.

Step 1: Use Cases and Constraints

Use Cases

In Scope

User searches for someone and sees the shortest path to the searched person
Service has high availability

Constraints and Assumptions

Assumptions:

Traffic is not evenly distributed
- Some searches are very popular, others executed once
Graph data won’t fit on a single machine
Graph edges are unweighted
100 million users
50 friends per user average
1 billion friend searches per month

Exercise traditional systems - don’t use graph-specific solutions like GraphQL or Neo4j for this exercise.

Usage Calculations

Back-of-the-envelope calculations

Data:

5 billion friend relationships
- 100 million users × 50 friends per user average

Throughput:

400 search requests per second

Conversion guide:

2.5 million seconds per month
1 request per second = 2.5 million requests per month
40 requests per second = 100 million requests per month
400 requests per second = 1 billion requests per month

Step 2: High Level Design

Step 3: Core Components

Use Case: User Searches for Connection Path

Without the constraint of millions of users, we could solve this unweighted shortest path problem with standard BFS:

Basic BFS implementation (single machine)

class Graph(Graph):

    def shortest_path(self, source, dest):
        if source is None or dest is None:
            return None
        if source is dest:
            return [source.key]
        prev_node_keys = self._shortest_path(source, dest)
        if prev_node_keys is None:
            return None
        else:
            path_ids = [dest.key]
            prev_node_key = prev_node_keys[dest.key]
            while prev_node_key is not None:
                path_ids.append(prev_node_key)
                prev_node_key = prev_node_keys[prev_node_key]
            return path_ids[::-1]

    def _shortest_path(self, source, dest):
        queue = deque()
        queue.append(source)
        prev_node_keys = {source.key: None}
        source.visit_state = State.visited
        while queue:
            node = queue.popleft()
            if node is dest:
                return prev_node_keys
            prev_node = node
            for adj_node in node.adj_nodes.values():
                if adj_node.visit_state == State.unvisited:
                    queue.append(adj_node)
                    prev_node_keys[adj_node.key] = prev_node.key
                    adj_node.visit_state = State.visited
        return None

Distributed Architecture

Since users won’t fit on one machine, we need to shard across Person Servers with a Lookup Service.

Client sends request

Client sends request to Web Server (reverse proxy)

Web Server routes to Search API

Web Server forwards to Search API server

Search API uses User Graph Service

User Graph Service:

Uses Lookup Service to find Person Server with current user’s info
Retrieves user’s friend_ids list
Runs BFS using current user as source
For each adjacent node, queries Lookup Service to find the Person Server

Implementation Components

LookupService class

class LookupService(object):

    def __init__(self):
        self.lookup = self._init_lookup()  # key: person_id, value: person_server

    def _init_lookup(self):
        ...

    def lookup_person_server(self, person_id):
        return self.lookup[person_id]

PersonServer class

class PersonServer(object):

    def __init__(self):
        self.people = {}  # key: person_id, value: person

    def add_person(self, person):
        ...

    def people(self, ids):
        results = []
        for id in ids:
            if id in self.people:
                results.append(self.people[id])
        return results

Person class

class Person(object):

    def __init__(self, id, name, friend_ids):
        self.id = id
        self.name = name
        self.friend_ids = friend_ids

UserGraphService class

class UserGraphService(object):

    def __init__(self, lookup_service):
        self.lookup_service = lookup_service

    def person(self, person_id):
        person_server = self.lookup_service.lookup_person_server(person_id)
        return person_server.people([person_id])

    def shortest_path(self, source_key, dest_key):
        if source_key is None or dest_key is None:
            return None
        if source_key is dest_key:
            return [source_key]
        prev_node_keys = self._shortest_path(source_key, dest_key)
        if prev_node_keys is None:
            return None
        else:
            # Iterate backwards from dest_key
            path_ids = [dest_key]
            prev_node_key = prev_node_keys[dest_key]
            while prev_node_key is not None:
                path_ids.append(prev_node_key)
                prev_node_key = prev_node_keys[prev_node_key]
            return path_ids[::-1]

    def _shortest_path(self, source_key, dest_key):
        source = self.person(source_key)
        queue = deque()
        queue.append(source)
        prev_node_keys = {source_key: None}
        visited_ids = set()
        visited_ids.add(source.id)
        while queue:
            node = queue.popleft()
            if node.key is dest_key:
                return prev_node_keys
            prev_node = node
            for friend_id in node.friend_ids:
                if friend_id not in visited_ids:
                    friend_node = self.person(friend_id)
                    queue.append(friend_node)
                    prev_node_keys[friend_id] = prev_node.key
                    visited_ids.add(friend_id)
        return None

REST API

curl https://social.com/api/v1/friend_search?person_id=1234

Response:

[
  {
    "person_id": "100",
    "name": "foo",
    "link": "https://social.com/foo"
  },
  {
    "person_id": "53",
    "name": "bar",
    "link": "https://social.com/bar"
  },
  {
    "person_id": "1234",
    "name": "baz",
    "link": "https://social.com/baz"
  }
]

Step 4: Scale the Design

Important: Take an iterative approach:

Benchmark/Load Test
Profile for bottlenecks
Address bottlenecks
Repeat

Scaling Components

DNS

Route users to nearest data center

Load Balancer

Distribute traffic across web servers

Web Servers

Horizontal scaling as reverse proxies

API Servers

Application layer for search logic

Memory Cache

Cache person data (Redis/Memcached) for 400 average reads/second

Reduce response times
Reduce traffic to downstream services
Especially useful for:
- Users doing multiple searches
- Well-connected people

BFS Optimizations

Cache BFS Traversals

Strategy: Store complete or partial BFS traversals in Memory Cache.Benefit: Speed up subsequent lookups.

Batch Compute Offline

Strategy: Pre-compute BFS traversals offline using batch processing.Storage: Store in NoSQL Database for fast retrieval.

Batch Friend Lookups

Strategy: Reduce machine jumps by batching friend lookups on same Person Server.Enhancement: Shard Person Servers by location (friends often live closer).

Bidirectional BFS

Strategy: Run two BFS searches simultaneously:

One from source
One from destination
Merge paths when they meet

Benefit: Potentially reduces search space.

Start from High-Degree Nodes

Strategy: Start BFS from people with large numbers of friends.Benefit: More likely to reduce degrees of separation.

Set Search Limits

Strategy: Limit based on time or number of hops.Implementation: Ask user if they want to continue searching after limit.Reason: Some searches could take considerable time.

Graph Database (if constraints allow)

Options:

Neo4j
GraphQL

Note: Only if there were no constraint preventing graph databases.

Implementation Reference

Python Implementation

View the complete Python implementation including distributed BFS logic.

SQL Scaling Patterns

Read replicas
Federation
Sharding
Denormalization
SQL Tuning

NoSQL Options

Key-value store
Document store
Wide column store
Graph database

Caching Strategies

Cache-aside
Write-through
Write-behind
Refresh ahead

Asynchronous Processing

Message queues
Task queues
Back pressure
Microservices

Key Takeaways

Sharding Person Servers required for 100M users
Lookup Service maps person IDs to servers
BFS algorithm finds shortest path between users
Memory Cache stores person data for fast access
Bidirectional BFS can optimize search performance
Batching friend lookups reduces network hops
Location-based sharding leverages geographic proximity
Pre-computed traversals speed up common searches
Consider Graph Database for native graph operations

Interview Guide

System Design Problems

Object-Oriented Design

Design Data Structures for a Social Network

Overview

Step 1: Use Cases and Constraints

Use Cases

In Scope

Constraints and Assumptions

Usage Calculations

Step 2: High Level Design

Step 3: Core Components

Use Case: User Searches for Connection Path

Distributed Architecture

Implementation Components

REST API

Step 4: Scale the Design

Scaling Components

DNS

Load Balancer

Web Servers

API Servers

Memory Cache

BFS Optimizations

Implementation Reference

Python Implementation

SQL Scaling Patterns

NoSQL Options

Caching Strategies

Asynchronous Processing

Key Takeaways

Build docs developers (and LLMs) love

Interview Guide

System Design Problems

Object-Oriented Design

​Overview

​Step 1: Use Cases and Constraints

​Use Cases

​In Scope

​Constraints and Assumptions

​Usage Calculations

​Step 2: High Level Design

​Step 3: Core Components

​Use Case: User Searches for Connection Path

​Distributed Architecture

​Implementation Components

​REST API

​Step 4: Scale the Design

​Scaling Components

DNS

Load Balancer

Web Servers

API Servers

Memory Cache

​BFS Optimizations

​Implementation Reference

Python Implementation

​Related Topics

SQL Scaling Patterns

NoSQL Options

Caching Strategies

Asynchronous Processing

​Key Takeaways

Build docs developers (and LLMs) love

Overview

Step 1: Use Cases and Constraints

Use Cases

In Scope

Constraints and Assumptions

Usage Calculations

Step 2: High Level Design

Step 3: Core Components

Use Case: User Searches for Connection Path

Distributed Architecture

Implementation Components

REST API

Step 4: Scale the Design

Scaling Components

BFS Optimizations

Implementation Reference

Related Topics

Key Takeaways