Why Algorithms Matter in System Design
While system design interviews focus on architecture rather than coding, understanding key algorithms is crucial. These algorithms form the foundation of the systems you’ll design.In system design interviews, understanding how algorithms are used in real-world systems is more important than implementation details. Focus on when and why to use each algorithm.
Algorithm Importance Ratings
Algorithms are rated by importance for system design interviews:- ⭐⭐⭐⭐⭐ Five Stars: Critical - Must understand how it works and why
- ⭐⭐⭐ Three Stars: Important - Should know use cases and tradeoffs
- ⭐ One Star: Advanced - Good to know for senior positions
Data Structure Algorithms
Hash Tables ⭐⭐⭐⭐⭐
What it is: Key-value data structure with O(1) average lookup time. Real-world uses:- Database indexing: Fast record lookups
- Caching: Redis, Memcached use hash tables
- Session storage: Quick user session retrieval
- Deduplication: Check for duplicate entries
- Rate limiting: Track request counts per user
- Fast lookups but no ordering
- Potential hash collisions
- Memory overhead for hash function
Bloom Filters ⭐⭐⭐
What it is: Probabilistic data structure for membership testing. Real-world uses:- Web crawlers: Check if URL already visited
- Databases: Avoid expensive disk lookups
- CDN: Quick check if content is cached
- Email spam filters: Fast spam detection
- Can have false positives (might say “yes” when answer is “no”)
- Never false negatives (if it says “no”, definitely “no”)
- Space efficient compared to hash sets
- Cannot delete elements (use counting Bloom filter variant)
Trees (B-Trees, B+ Trees, LSM Trees) ⭐⭐⭐⭐⭐
What they are: Hierarchical data structures optimized for disk-based storage. B-Trees & B+ Trees:- Used in databases and file systems
- Keep data sorted for efficient range queries
- Minimize disk reads through high branching factor
- Optimized for write-heavy workloads
- Used in Cassandra, RocksDB, LevelDB
- Trade read performance for write throughput
- Choose B+ trees for read-heavy systems
- Choose LSM trees for write-heavy systems
- Consider for time-series data storage
Geospatial Algorithms
Geohash ⭐⭐⭐⭐⭐
What it is: Geocoding system that encodes geographic coordinates into short strings. How it works:- Divide world into grid
- Each subdivision adds a character to the hash
- Nearby locations share common prefixes
- Longer hash = more precise location
- Uber/Lyft: Match riders with nearby drivers
- Yelp: Find restaurants near you
- Google Maps: Spatial indexing
- Pokemon Go: Track player locations
- Simple string comparison for proximity
- Easy to store in databases
- Hierarchical (prefix matching)
Quadtree ⭐⭐⭐⭐
What it is: Tree structure that recursively divides 2D space into quadrants. Real-world uses:- Image processing: Compression and collision detection
- Map rendering: Decide which detail level to show
- Spatial databases: Efficiently query nearby points
- Start with bounding box of entire area
- If too many points, divide into 4 quadrants
- Recursively subdivide until threshold met
- Query by traversing relevant quadrants only
R-Tree ⭐⭐⭐
What it is: Tree structure for indexing multi-dimensional spatial data. Real-world uses:- GIS systems: Store polygons, routes, boundaries
- CAD software: Spatial queries on designs
- Databases: PostGIS spatial extension
String & Search Algorithms
Trie (Prefix Tree) ⭐⭐⭐⭐⭐
What it is: Tree structure for storing strings with common prefixes. Real-world uses:- Autocomplete: Search suggestions as you type
- Spell checkers: Dictionary lookup
- IP routing: Longest prefix matching
- Search engines: Query suggestions
- O(k) lookup time where k = string length
- Space-intensive but very fast
- Efficient for prefix queries
Suffix Tree/Array ⭐⭐
What it is: Data structure for string pattern matching. Real-world uses:- DNA sequencing: Pattern matching in genomics
- Text editors: Find/replace operations
- Plagiarism detection: Compare documents
- Full-text search in documents
- Code search in large codebases
- Log analysis and pattern detection
Graph Algorithms
Dijkstra’s Algorithm ⭐⭐⭐⭐⭐
What it is: Finds shortest path between nodes in weighted graph. Real-world uses:- Google Maps: Route navigation
- Network routing: BGP, OSPF protocols
- Social networks: Connection suggestions
- Cannot handle negative edge weights
- For negative weights, use Bellman-Ford
A* (A-Star) Algorithm ⭐⭐⭐⭐
What it is: Optimized pathfinding using heuristics. Real-world uses:- Google Maps: Faster routing than Dijkstra’s
- Game development: Character movement
- Robotics: Navigation planning
- Uses heuristic to guide search
- Explores fewer nodes
- Faster in practice for specific goal
PageRank ⭐⭐⭐
What it is: Algorithm to rank nodes in a graph based on connections. Real-world uses:- Google Search: Rank web pages
- Social networks: Identify influencers
- Recommendation systems: Find important items
- Number of incoming links
- Importance of linking pages
Distributed Systems Algorithms
Consistent Hashing ⭐⭐⭐⭐⭐
What it is: Distribute data across nodes with minimal reorganization on changes. Real-world uses:- Load balancers: Distribute requests to servers
- Distributed caches: Partition cache across nodes
- Distributed databases: Data sharding
- Hash servers and keys to points on a circle (ring)
- Key belongs to first server clockwise from it
- Adding/removing server only affects immediate neighbors
- Minimal data movement when scaling
- Handles node failures gracefully
- Even load distribution with virtual nodes
Raft/Paxos Consensus ⭐⭐⭐
What they are: Algorithms for achieving consensus in distributed systems. Real-world uses:- etcd: Uses Raft for distributed configuration
- Consul: Service discovery with Raft
- Google Chubby: Uses Paxos for lock service
- Ensures all nodes agree on a value
- Tolerates node failures (f < n/2)
- Maintains consistency in distributed systems
Merkle Tree ⭐⭐⭐
What it is: Hash tree for efficient verification and synchronization. Real-world uses:- Git: Verify repository integrity
- Bitcoin: Verify transactions in blocks
- Cassandra/DynamoDB: Anti-entropy for data sync
- IPFS: Content addressing
- Leaf nodes contain data hashes
- Parent nodes contain hash of children
- Root hash represents entire dataset
- Compare roots to detect differences
- Traverse tree to find exact differences
- Efficient inconsistency detection
- Logarithmic verification time
- Minimal data transfer for sync
Ranking & Recommendation Algorithms
Collaborative Filtering ⭐⭐⭐⭐
What it is: Predict preferences based on similar users or items. Real-world uses:- Netflix: Movie recommendations
- Amazon: Product suggestions
- Spotify: Music recommendations
- YouTube: Video suggestions
- User-based: Find similar users, recommend what they liked
- Item-based: Find similar items to what user liked
- Cold start problem (new users/items)
- Scalability with millions of users
- Sparsity of rating matrix
TF-IDF ⭐⭐⭐
What it is: Statistical measure for document relevance in search. Real-world uses:- Search engines: Rank documents
- Document similarity: Find related articles
- Keyword extraction: Identify important terms
- TF (Term Frequency): How often term appears in document
- IDF (Inverse Document Frequency): How rare term is across all documents
- TF-IDF = TF × IDF (important if frequent in doc but rare overall)
Caching Algorithms
LRU (Least Recently Used) ⭐⭐⭐⭐⭐
What it is: Eviction policy that removes least recently accessed items. Real-world uses:- Redis: Cache eviction policy
- Browser cache: Memory management
- CPU cache: L1/L2/L3 cache management
- Database buffer pool: Page replacement
- Hash map for O(1) lookup
- Doubly linked list for O(1) eviction
- Move accessed item to front
- Remove from tail when full
LFU (Least Frequently Used) ⭐⭐⭐
What it is: Eviction policy based on access frequency. Real-world uses:- CDN: Keep popular content
- Content delivery: Prioritize viral content
- LRU: Good for temporal locality (recently used likely to be used again)
- LFU: Good for frequency-based popularity (frequently used likely to be used again)
Other Eviction Policies ⭐⭐
- FIFO: First In First Out - Simple but not optimal
- Random: Random eviction - Surprisingly effective sometimes
- LRU-K: Track last K accesses - Better than simple LRU
- 2Q: Two queues for hot and cold data
- ARC: Adaptive replacement cache - Self-tuning
Rate Limiting Algorithms
Token Bucket ⭐⭐⭐⭐⭐
What it is: Allow requests based on available tokens. Real-world uses:- API rate limiting: Stripe, Twitter, GitHub APIs
- Traffic shaping: Network bandwidth control
- DDoS protection: Limit request rates
- Bucket holds N tokens (capacity)
- Tokens added at rate R per second
- Each request consumes one token
- Request rejected if bucket empty
- Allows bursts up to bucket capacity
- Handles burst traffic
- Memory efficient
- Easy to implement
Leaky Bucket ⭐⭐⭐
What it is: Process requests at constant rate. Difference from Token Bucket:- Token bucket allows bursts
- Leaky bucket enforces smooth rate
- Network traffic shaping: Smooth packet flow
- Video streaming: Constant bitrate
Fixed Window Counter ⭐⭐⭐
What it is: Count requests per fixed time window. How it works:- Reset counter every window (e.g., per minute)
- Increment counter for each request
- Reject if counter exceeds limit
Sliding Window Log/Counter ⭐⭐⭐
What it is: Solve fixed window problem with sliding window. Sliding Window Log:- Store timestamp of each request
- Remove old timestamps outside window
- Count remaining timestamps
- Combine current and previous window
- Weighted average based on overlap
String Matching Algorithms
KMP (Knuth-Morris-Pratt) ⭐⭐
What it is: Efficient string pattern matching. Real-world uses:- Text editors: Find functionality
- Log analysis: Pattern detection
- Bioinformatics: DNA sequence matching
- Search in large log files
- Pattern detection in streams
- Plagiarism detection
Rabin-Karp ⭐⭐
What it is: Pattern matching using rolling hash. Real-world uses:- Plagiarism detection: Multiple pattern search
- Document comparison: Find duplicates
Compression Algorithms
Huffman Coding ⭐⭐⭐
What it is: Lossless data compression using variable-length codes. Real-world uses:- ZIP/GZIP: File compression
- JPEG: Part of compression pipeline
- Network protocols: Reduce bandwidth
LZ77/LZ78 ⭐⭐
What they are: Dictionary-based compression algorithms. Real-world uses:- GZIP: Based on LZ77
- PNG: Uses DEFLATE (LZ77 + Huffman)
Machine Learning Algorithms (High Level)
Recommendation Systems ⭐⭐⭐⭐
Algorithms:- Collaborative filtering
- Content-based filtering
- Matrix factorization
- Deep learning (embeddings)
- Netflix, YouTube, Amazon recommendations
- Spotify playlists
- Twitter/Facebook feeds
Classification/Ranking ⭐⭐⭐
Algorithms:- Logistic regression
- Decision trees/Random forests
- Gradient boosting (XGBoost)
- Neural networks
- Spam detection
- Fraud detection
- Search result ranking
- Content moderation
For system design interviews, you don’t need to implement ML algorithms. Focus on:
- When to use ML vs. rule-based systems
- Training pipeline architecture
- Online vs. offline learning
- Model serving at scale
- A/B testing for model evaluation
How to Study Algorithms for System Design
Understand the Use Case
Don’t memorize implementations. Instead, learn:
- What problem does it solve?
- When should you use it?
- What are the tradeoffs?
Know Real-World Examples
For each algorithm, remember 2-3 real systems that use it:
- “Google Maps uses Dijkstra’s for routing”
- “Redis uses LRU for cache eviction”
- “Cassandra uses consistent hashing for data distribution”
Understand Tradeoffs
Be able to compare alternatives:
- LRU vs LFU for caching
- B-trees vs LSM trees for databases
- Token bucket vs leaky bucket for rate limiting
Quick Reference: Algorithms by System Type
Search/Discovery Systems
- Trie (autocomplete)
- TF-IDF (ranking)
- PageRank (web search)
- Inverted index
Location-Based Systems
- Geohash
- Quadtree
- R-tree
- Dijkstra’s/A*
Social Networks
- Graph algorithms (BFS for friend suggestions)
- Collaborative filtering (recommendations)
- PageRank (influence)
Distributed Systems
- Consistent hashing (load distribution)
- Raft/Paxos (consensus)
- Merkle trees (sync verification)
High-Traffic Systems
- Token bucket (rate limiting)
- LRU cache
- Bloom filters (reduce lookups)
Storage Systems
- B+ trees (relational databases)
- LSM trees (write-heavy databases)
- Consistent hashing (distributed storage)
Common Interview Questions
“How would you implement autocomplete?” → Trie for prefix matching, cache popular queries “How do you find nearby drivers in Uber?” → Geohash for spatial indexing, quadtree for dynamic updates “How does consistent hashing work?” → Explain ring, virtual nodes, minimal data movement on scaling “What caching strategy would you use?” → Depends on access pattern - LRU for general, LFU for hot content “How do you prevent API abuse?” → Token bucket for rate limiting, allows bursts while enforcing limitsNext Steps
You now understand the essential algorithms for system design. Practice applying them:- Common System Design Questions - See algorithms in action
- How to Ace Interviews - Put it all together