Overview

Design a key-value cache that saves the results of the most recent web server queries for a search engine. This problem explores cache design, LRU (Least Recently Used) eviction policies, distributed caching, and handling billions of queries per month.

Step 1: Use Cases and Constraints

Use Cases

In Scope

  • User sends a search request resulting in a cache hit
  • User sends a search request resulting in a cache miss
  • Service has high availability

Constraints and Assumptions

Assumptions:
  • Traffic is not evenly distributed
    • Popular queries should almost always be in cache
    • Need to determine how to expire/refresh
  • Serving from cache requires fast lookups
  • Low latency between machines
  • Limited memory in cache
    • Need to determine what to keep/remove
    • Need to cache millions of queries
  • 10 million users
  • 10 billion queries per month

Usage Calculations

The cache stores an ordered list of key: query, value: results. Size per entry:
  • query - 50 bytes
  • title - 20 bytes
  • snippet - 200 bytes
  • Total: 270 bytes
Storage:
  • 2.7 TB of cache data per month if all 10 billion queries are unique
    • 270 bytes × 10 billion queries per month
  • Limited memory requires expiration strategy
Throughput:
  • 4,000 requests per second
Conversion guide:
  • 2.5 million seconds per month
  • 1 request per second = 2.5 million requests per month
  • 40 requests per second = 100 million requests per month
  • 400 requests per second = 1 billion requests per month
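The back-of-the-envelope numbers above can be checked with a few lines of Python:

```python
SECONDS_PER_MONTH = 2.5 * 10**6   # ~30 days, rounded for estimation
QUERIES_PER_MONTH = 10 * 10**9    # 10 billion queries per month
BYTES_PER_ENTRY = 270             # query + title + snippet

# Throughput: queries per month spread across the seconds in a month
requests_per_second = QUERIES_PER_MONTH / SECONDS_PER_MONTH
print(requests_per_second)        # 4000.0

# Worst-case storage if every query is unique
storage_tb = QUERIES_PER_MONTH * BYTES_PER_ENTRY / 10**12
print(storage_tb)                 # 2.7 (TB per month)
```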

Step 2: High Level Design

Query Cache High Level Design

Step 3: Core Components

Use Case: User Search Results in Cache Hit

1. The Client sends a request to the Web Server, running as a reverse proxy
2. The Web Server forwards the request to the Query API server
3. The Query API server:
  • Parses query (remove markup, tokenize, fix typos, normalize, convert to boolean)
  • Checks Memory Cache for matching content
  • If cache hit:
    • Updates entry position to front of LRU list
    • Returns cached contents
  • If cache miss:
    • Uses Reverse Index Service to find matching documents
    • Uses Document Service to return titles and snippets
    • Updates Memory Cache with results at front of LRU list

Cache Implementation

The cache uses a doubly-linked list backed by a hash table for O(1) operations:
  • New items are added to the head
  • Expiring items are removed from the tail
  • The hash table provides fast lookups to the linked list nodes
class QueryApi(object):

    def __init__(self, memory_cache, reverse_index_service):
        self.memory_cache = memory_cache
        self.reverse_index_service = reverse_index_service

    def parse_query(self, query):
        """Remove markup, break text into terms, deal with typos,
        normalize capitalization, convert to use boolean operations.
        """
        ...

    def process_query(self, query):
        query = self.parse_query(query)
        results = self.memory_cache.get(query)
        if results is None:
            results = self.reverse_index_service.process_search(query)
            self.memory_cache.set(query, results)
        return results
class Node(object):

    def __init__(self, query, results):
        self.query = query
        self.results = results
        self.prev = None  # doubly-linked list pointers for LRU ordering
        self.next = None
class LinkedList(object):

    def __init__(self):
        self.head = None
        self.tail = None

    def _unlink(self, node):
        # Detach node, repairing its neighbors' (or head/tail) links
        if node.prev:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next:
            node.next.prev = node.prev
        else:
            self.tail = node.prev

    def move_to_front(self, node):
        self._unlink(node)
        self.append_to_front(node)

    def append_to_front(self, node):
        node.prev, node.next = None, self.head
        if self.head:
            self.head.prev = node
        self.head = node
        if self.tail is None:
            self.tail = node

    def remove_from_tail(self):
        if self.tail is not None:
            self._unlink(self.tail)
class Cache(object):

    def __init__(self, MAX_SIZE):
        self.MAX_SIZE = MAX_SIZE
        self.size = 0
        self.lookup = {}  # key: query, value: node
        self.linked_list = LinkedList()

    def get(self, query):
        """Get the stored query result from the cache.

        Accessing a node updates its position to the front of the LRU list.
        """
        node = self.lookup.get(query)  # .get avoids a KeyError on a miss
        if node is None:
            return None
        self.linked_list.move_to_front(node)
        return node.results

    def set(self, query, results):
        """Set the result for the given query key in the cache.

        When updating an entry, updates its position to the front of the LRU list.
        If the entry is new and the cache is at capacity, removes the oldest entry
        before the new entry is added.
        """
        node = self.lookup.get(query)  # .get avoids a KeyError on a new key
        if node is not None:
            # Key exists in cache, update the value
            node.results = results
            self.linked_list.move_to_front(node)
        else:
            # Key does not exist in cache
            if self.size == self.MAX_SIZE:
                # Remove the oldest entry from the linked list and lookup
                self.lookup.pop(self.linked_list.tail.query, None)
                self.linked_list.remove_from_tail()
            else:
                self.size += 1
            # Add the new key and value
            new_node = Node(query, results)
            self.linked_list.append_to_front(new_node)
            self.lookup[query] = new_node
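For reference, the same LRU behavior can be sketched more compactly with Python's standard-library OrderedDict. This is a shortcut for experimentation, not a replacement for the explicit linked-list design above:

```python
from collections import OrderedDict

class LRUCache(object):
    """Compact LRU cache: front of the OrderedDict is most recent."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.cache = OrderedDict()

    def get(self, query):
        if query not in self.cache:
            return None
        self.cache.move_to_end(query, last=False)  # mark as most recent
        return self.cache[query]

    def set(self, query, results):
        if query not in self.cache and len(self.cache) == self.max_size:
            self.cache.popitem(last=True)  # evict the least recently used
        self.cache[query] = results
        self.cache.move_to_end(query, last=False)

cache = LRUCache(max_size=2)
cache.set('foo', ['result1'])
cache.set('bar', ['result2'])
cache.get('foo')               # 'foo' is now the most recent entry
cache.set('baz', ['result3'])  # evicts 'bar', the least recently used
print(cache.get('bar'))        # None
print(cache.get('foo'))        # ['result1']
```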

When to Update the Cache

Update cache when:
  • Page contents change
  • Page is removed or new page added
  • Page rank changes
TTL (Time To Live): The most straightforward approach is to set a maximum time that a cached entry can live before it must be refreshed.

Pattern: This describes the cache-aside pattern, where the application checks the cache first and falls back to the backing services on a miss.

Step 4: Scale the Design

Query Cache Scaled Design
Important: Take an iterative approach:
  1. Benchmark/Load Test
  2. Profile for bottlenecks
  3. Address bottlenecks
  4. Repeat

Scaling Components

DNS

Route users to nearest data center

Load Balancer

Distribute traffic across web servers

Web Servers

Horizontal scaling as reverse proxies

API Servers

Application layer for query processing

Memory Cache

Distributed caching with sharding strategies

Expanding Memory Cache to Multiple Machines

To handle heavy load and large memory requirements, scale the cache horizontally. There are three main options:

Option 1: Each machine in the cache cluster has its own independent cache
Pros:
  • Simple implementation
Cons:
  • Low cache hit rate
  • Same query might be cached on multiple machines

Option 2: Each machine in the cache cluster has a full copy of the cache
Pros:
  • Simple implementation
  • Higher hit rate than Option 1
Cons:
  • Inefficient use of memory
  • Memory constraints limit scalability

Option 3: Shard the cache across the machines in the cluster
Consistent Hashing: Distributes keys across cache servers while minimizing remapping when servers are added or removed.
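A consistent-hash ring can be sketched in a few lines; the server names and replica count below are illustrative. Each server is hashed at many points ("virtual nodes") on a ring, and a key is assigned to the first server clockwise from the key's hash:

```python
import bisect
import hashlib

class HashRing(object):
    """Minimal consistent-hash ring sketch (illustrative, not production)."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas  # virtual nodes smooth the distribution
        self.ring = []            # sorted list of (hash, server)
        for server in servers:
            self.add(server)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, server):
        for i in range(self.replicas):
            point = self._hash('%s:%d' % (server, i))
            bisect.insort(self.ring, (point, server))

    def get_server(self, key):
        idx = bisect.bisect(self.ring, (self._hash(key), ''))
        if idx == len(self.ring):
            idx = 0  # wrap around the ring
        return self.ring[idx][1]

ring = HashRing(['cache-1', 'cache-2', 'cache-3'])
server = ring.get_server('some search query')
```

Adding a server only remaps the keys that land on the new server's virtual nodes; every other key keeps its existing assignment, which is the property that makes this attractive for a distributed cache.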

Implementation Reference

Python Implementation

View the complete Python implementation including LRU cache logic.

SQL Scaling Patterns

  • Read replicas
  • Federation
  • Sharding
  • Denormalization
  • SQL Tuning

NoSQL Options

  • Key-value store
  • Document store
  • Wide column store
  • Graph database

Caching Strategies

  • Cache-aside
  • Write-through
  • Write-behind
  • Refresh ahead

Asynchronous Processing

  • Message queues
  • Task queues
  • Back pressure
  • Microservices

Key Takeaways

  • LRU eviction policy keeps most popular queries cached
  • Doubly-linked list + hash table provides O(1) operations
  • Cache-aside pattern updates cache on misses
  • TTL (Time To Live) handles cache freshness
  • Consistent hashing enables distributed caching
  • Sharding across machines most efficient for scale
  • Handle 4,000 requests/second with distributed architecture
  • Memory Cache (Redis/Memcached) handles unevenly distributed traffic
