Skip to main content

Overview

Design the data structures for a social network that enables users to search for connections and find the shortest path to another person. This problem explores graph algorithms (BFS), distributed systems, sharding strategies, and optimizing friend searches at massive scale.

Step 1: Use Cases and Constraints

Use Cases

In Scope

  • User searches for someone and sees the shortest path to the searched person
  • Service has high availability

Constraints and Assumptions

Assumptions:
  • Traffic is not evenly distributed
    • Some searches are very popular, others executed once
  • Graph data won’t fit on a single machine
  • Graph edges are unweighted
  • 100 million users
  • 50 friends per user average
  • 1 billion friend searches per month
Exercise traditional systems - don’t use graph-specific solutions like GraphQL or Neo4j for this exercise.

Usage Calculations

Data:
  • 5 billion friend relationships
    • 100 million users × 50 friends per user average
Throughput:
  • 400 search requests per second
Conversion guide:
  • 2.5 million seconds per month
  • 1 request per second = 2.5 million requests per month
  • 40 requests per second = 100 million requests per month
  • 400 requests per second = 1 billion requests per month

Step 2: High Level Design

Social Network High Level Design

Step 3: Core Components

Use Case: User Searches for Connection Path

Without the constraint of millions of users, we could solve this unweighted shortest path problem with standard BFS:
class Graph(Graph):

    def shortest_path(self, source, dest):
        if source is None or dest is None:
            return None
        if source is dest:
            return [source.key]
        prev_node_keys = self._shortest_path(source, dest)
        if prev_node_keys is None:
            return None
        else:
            path_ids = [dest.key]
            prev_node_key = prev_node_keys[dest.key]
            while prev_node_key is not None:
                path_ids.append(prev_node_key)
                prev_node_key = prev_node_keys[prev_node_key]
            return path_ids[::-1]

    def _shortest_path(self, source, dest):
        queue = deque()
        queue.append(source)
        prev_node_keys = {source.key: None}
        source.visit_state = State.visited
        while queue:
            node = queue.popleft()
            if node is dest:
                return prev_node_keys
            prev_node = node
            for adj_node in node.adj_nodes.values():
                if adj_node.visit_state == State.unvisited:
                    queue.append(adj_node)
                    prev_node_keys[adj_node.key] = prev_node.key
                    adj_node.visit_state = State.visited
        return None

Distributed Architecture

Since users won’t fit on one machine, we need to shard across Person Servers with a Lookup Service.
1

Client sends request

Client sends request to Web Server (reverse proxy)
2

Web Server routes to Search API

Web Server forwards to Search API server
3

Search API uses User Graph Service

User Graph Service:
  • Uses Lookup Service to find Person Server with current user’s info
  • Retrieves user’s friend_ids list
  • Runs BFS using current user as source
  • For each adjacent node, queries Lookup Service to find the Person Server

Implementation Components

class LookupService(object):

    def __init__(self):
        self.lookup = self._init_lookup()  # key: person_id, value: person_server

    def _init_lookup(self):
        ...

    def lookup_person_server(self, person_id):
        return self.lookup[person_id]
class PersonServer(object):

    def __init__(self):
        self.people = {}  # key: person_id, value: person

    def add_person(self, person):
        ...

    def people(self, ids):
        results = []
        for id in ids:
            if id in self.people:
                results.append(self.people[id])
        return results
class Person(object):

    def __init__(self, id, name, friend_ids):
        self.id = id
        self.name = name
        self.friend_ids = friend_ids
class UserGraphService(object):

    def __init__(self, lookup_service):
        self.lookup_service = lookup_service

    def person(self, person_id):
        person_server = self.lookup_service.lookup_person_server(person_id)
        return person_server.people([person_id])

    def shortest_path(self, source_key, dest_key):
        if source_key is None or dest_key is None:
            return None
        if source_key is dest_key:
            return [source_key]
        prev_node_keys = self._shortest_path(source_key, dest_key)
        if prev_node_keys is None:
            return None
        else:
            # Iterate backwards from dest_key
            path_ids = [dest_key]
            prev_node_key = prev_node_keys[dest_key]
            while prev_node_key is not None:
                path_ids.append(prev_node_key)
                prev_node_key = prev_node_keys[prev_node_key]
            return path_ids[::-1]

    def _shortest_path(self, source_key, dest_key):
        source = self.person(source_key)
        queue = deque()
        queue.append(source)
        prev_node_keys = {source_key: None}
        visited_ids = set()
        visited_ids.add(source.id)
        while queue:
            node = queue.popleft()
            if node.key is dest_key:
                return prev_node_keys
            prev_node = node
            for friend_id in node.friend_ids:
                if friend_id not in visited_ids:
                    friend_node = self.person(friend_id)
                    queue.append(friend_node)
                    prev_node_keys[friend_id] = prev_node.key
                    visited_ids.add(friend_id)
        return None

REST API

curl https://social.com/api/v1/friend_search?person_id=1234
Response:
[
  {
    "person_id": "100",
    "name": "foo",
    "link": "https://social.com/foo"
  },
  {
    "person_id": "53",
    "name": "bar",
    "link": "https://social.com/bar"
  },
  {
    "person_id": "1234",
    "name": "baz",
    "link": "https://social.com/baz"
  }
]

Step 4: Scale the Design

Social Network Scaled Design
Important: Take an iterative approach:
  1. Benchmark/Load Test
  2. Profile for bottlenecks
  3. Address bottlenecks
  4. Repeat

Scaling Components

DNS

Route users to nearest data center

Load Balancer

Distribute traffic across web servers

Web Servers

Horizontal scaling as reverse proxies

API Servers

Application layer for search logic

Memory Cache

Cache person data (Redis/Memcached) for 400 average reads/second
  • Reduce response times
  • Reduce traffic to downstream services
  • Especially useful for:
    • Users doing multiple searches
    • Well-connected people

BFS Optimizations

Strategy: Store complete or partial BFS traversals in Memory Cache.Benefit: Speed up subsequent lookups.
Strategy: Pre-compute BFS traversals offline using batch processing.Storage: Store in NoSQL Database for fast retrieval.
Strategy: Reduce machine jumps by batching friend lookups on same Person Server.Enhancement: Shard Person Servers by location (friends often live closer).
Strategy: Run two BFS searches simultaneously:
  • One from source
  • One from destination
  • Merge paths when they meet
Benefit: Potentially reduces search space.
Strategy: Start BFS from people with large numbers of friends.Benefit: More likely to reduce degrees of separation.
Strategy: Limit based on time or number of hops.Implementation: Ask user if they want to continue searching after limit.Reason: Some searches could take considerable time.
Options:
  • Neo4j
  • GraphQL
Note: Only if there were no constraint preventing graph databases.

Implementation Reference

Python Implementation

View the complete Python implementation including distributed BFS logic.

SQL Scaling Patterns

  • Read replicas
  • Federation
  • Sharding
  • Denormalization
  • SQL Tuning

NoSQL Options

  • Key-value store
  • Document store
  • Wide column store
  • Graph database

Caching Strategies

  • Cache-aside
  • Write-through
  • Write-behind
  • Refresh ahead

Asynchronous Processing

  • Message queues
  • Task queues
  • Back pressure
  • Microservices

Key Takeaways

  • Sharding Person Servers required for 100M users
  • Lookup Service maps person IDs to servers
  • BFS algorithm finds shortest path between users
  • Memory Cache stores person data for fast access
  • Bidirectional BFS can optimize search performance
  • Batching friend lookups reduces network hops
  • Location-based sharding leverages geographic proximity
  • Pre-computed traversals speed up common searches
  • Consider Graph Database for native graph operations

Build docs developers (and LLMs) love