Infrastructure & Cloud
Load Balancing & Traffic Distribution
Load Balancing & Traffic Distribution
Problem: Two Sum, Subarray Sum Equals K
Real Systems: AWS ELB, Google Cloud Load Balancer, NginxHow it’s used:- Hash maps track server capacity and request counts in O(1) time
- AWS ELB uses consistent hashing (hash table) to route requests to same backend servers
- Netflix’s Ribbon load balancer maintains hash maps of available instances
- Prefix sum techniques calculate rolling window metrics for health checks
- AWS ELB processes millions of requests/second using hash-based routing
- Google’s Maglev load balancer uses consistent hashing to distribute 1M+ QPS
- Hash maps: O(1) server lookup
- Consistent hashing: minimize reassignments during scaling
- Prefix sums: rolling window statistics
Resource Scheduling & Allocation
Resource Scheduling & Allocation
Problem: Merge Intervals, Meeting Rooms
Real Systems: Kubernetes, AWS EC2 Auto Scaling, Google BorgHow it’s used:- Kubernetes scheduler merges overlapping time windows to find optimal pod placement
- AWS Auto Scaling groups use interval merging to prevent conflicting scale operations
- Google’s Borg scheduler allocates CPU/memory using interval intersection algorithms
- Docker Swarm uses interval trees for resource reservation
- Google Borg schedules billions of containers per week
- AWS EC2 manages millions of scaling events daily
Cache Invalidation & Eviction
Cache Invalidation & Eviction
Problem: LRU Cache, LFU Cache
Real Systems: Redis, Memcached, Varnish, CDN edge serversHow it’s used:- Redis implements LRU eviction using hash map + doubly linked list
- Cloudflare CDN uses LRU cache for edge servers (200+ cities)
- Nginx proxy cache uses LRU for hot content
- Chrome browser cache implements LRU for visited pages
- Hash Map: O(1) cache key lookup
- Doubly Linked List: O(1) move to front on access
- Combined: O(1) get/put operations at scale
- Redis serves 100K+ ops/second with LRU eviction
- Cloudflare CDN caches petabytes using LRU variants
Databases & Storage
Query Optimization & Indexing
Query Optimization & Indexing
Problem: Binary Search, Range Sum Query
Real Systems: PostgreSQL, MySQL, MongoDB, ElasticsearchHow it’s used:- B-Tree indexes: PostgreSQL uses binary search on sorted keys (O(log n) lookup)
- Range queries: Prefix sum techniques for aggregations (COUNT, SUM, AVG)
- MongoDB: Binary search in sorted index for queries
- Elasticsearch: Binary search in term dictionaries for full-text search
- PostgreSQL B-tree indexes reduce query time from O(n) to O(log n)
- Elasticsearch serves billions of queries/day using binary search
Database Transactions & Concurrency
Database Transactions & Concurrency
Problem: Topological Sort, Graph Cycle Detection
Real Systems: PostgreSQL, MySQL InnoDB, Oracle DatabaseHow it’s used:- Deadlock detection: Cycle detection in wait-for graphs
- Transaction ordering: Topological sort for serializable isolation
- PostgreSQL uses DFS cycle detection to identify deadlocked transactions
- MySQL InnoDB implements wait-for graph traversal for deadlock resolution
- PostgreSQL checks for deadlocks every second in high-concurrency workloads
- MySQL InnoDB resolves 100K+ potential deadlocks/day in busy systems
Log Compaction & Merge
Log Compaction & Merge
Problem: Merge K Sorted Lists
Real Systems: Apache Kafka, Apache Cassandra, LevelDB, RocksDBHow it’s used:- Kafka log compaction: Merge multiple sorted segments by key
- LSM trees: LevelDB/RocksDB merge sorted runs during compaction
- Cassandra: Merge SSTables (sorted string tables) in background
- ClickHouse: Merge sorted parts for efficient analytics
- K-way merge using min heap: O(N log K) time
- Each segment is sorted; heap tracks minimum across all segments
- Kafka compacts trillions of messages
- RocksDB at Facebook merges petabytes daily
Search & Information Retrieval
Autocomplete & Type-ahead Search
Autocomplete & Type-ahead Search
Problem: Trie (Prefix Tree), Top K Frequent Elements
Real Systems: Google Search, Amazon product search, IDE autocompleteHow it’s used:- Trie data structure: Store all possible completions
- Google Search: Prefix tree + frequency ranking for suggestions
- Amazon: Product names stored in trie, ranked by popularity
- VS Code: Trie for symbol completion (variables, functions)
- Trie nodes store top-K at each prefix (precomputed)
- Google serves suggestions in <10ms using distributed tries
- Google processes 8.5 billion searches/day
- Each keystroke = 1 trie lookup = O(m) where m = prefix length
Document Ranking & Relevance
Document Ranking & Relevance
Problem: Top K Elements, Priority Queue
Real Systems: Elasticsearch, Apache Solr, Google SearchHow it’s used:- TF-IDF scoring: Min heap tracks top K documents by relevance score
- Elasticsearch: Uses priority queue for result aggregation
- Google PageRank: Heap-based sorting for top results
- Amazon search: Product ranking by relevance + popularity
- Don’t need to fully sort 1M results
- Only need top 100 → min heap of size 100
- O(n log 100) vs O(n log n) for full sort
Fuzzy Matching & Spell Check
Fuzzy Matching & Spell Check
Problem: Edit Distance (Levenshtein), Longest Common Subsequence
Real Systems: Google “Did you mean?”, Grammarly, IDEsHow it’s used:- Google Search: Edit distance ≤ 2 for spell corrections
- Grammarly: DP algorithm for word suggestions
- Git: Levenshtein distance for “did you mean this branch?”
- IDEs: Fuzzy symbol search using edit distance
- Pre-filter candidates by length difference
- Use BK-tree for fast edit-distance queries
- Google checks millions of dictionary words in milliseconds
Networking & Communication
IP Routing & Packet Forwarding
IP Routing & Packet Forwarding
Problem: Trie (for IP prefixes), Binary Search
Real Systems: Cisco routers, Linux kernel routing table, BGPHow it’s used:- IP routing: Trie (radix tree) for longest prefix matching
- Linux kernel: Trie-based routing table (150K+ routes)
- BGP routers: Prefix tree for path selection
- SDN controllers: Trie for flow rule lookup
- Internet routers handle 500K+ routes
- Linux forwards millions packets/second using trie lookup
Rate Limiting & Throttling
Rate Limiting & Throttling
Problem: Sliding Window, Queue
Real Systems: API Gateways (Kong, AWS API Gateway), Nginx, CDNsHow it’s used:- Token bucket: Fixed-size queue for request tokens
- Sliding window: Track requests in time windows using queue
- AWS API Gateway: Rate limit per API key using sliding window
- Stripe API: 100 requests/second enforced via sliding window
- Redis sorted sets for distributed rate limiting
- Stripe tracks 50K+ API keys with individual rate limits
DNS Resolution & Caching
DNS Resolution & Caching
Problem: Hash Map, Trie, LRU Cache
Real Systems: BIND, Cloudflare DNS, Google Public DNSHow it’s used:- Domain lookup: Hash map for O(1) domain → IP resolution
- Subdomain matching: Trie for wildcard DNS (*.example.com)
- DNS cache: LRU cache for frequently queried domains
- TTL expiration: Min heap for cache entry expiration
- Hash map: O(1) exact domain match
- Trie: Efficient wildcard/prefix matching
- LRU: Keep hot domains in memory
Media & Content Delivery
Video Streaming & ABR
Video Streaming & ABR
Problem: Binary Search, Sliding Window Maximum
Real Systems: Netflix, YouTube, Twitch adaptive bitrateHow it’s used:- Adaptive Bitrate: Binary search to find optimal quality based on bandwidth
- Buffer management: Sliding window tracks download rate
- YouTube: Binary search through available resolutions (144p-8K)
- Netflix: ABR algorithm adjusts quality every 2-10 seconds
- Netflix serves 250M+ subscribers
- YouTube: 1 billion hours watched daily
- Bitrate changes every few seconds per stream
Image Processing & CDN
Image Processing & CDN
Problem: Matrix Traversal, BFS/DFS
Real Systems: Instagram filters, Cloudinary, ImgixHow it’s used:- Image resizing: Matrix operations for pixel manipulation
- Flood fill: BFS/DFS for background removal, filters
- Instagram filters: Matrix convolution using 2D arrays
- CDN optimization: Graph algorithms for closest edge server
- Thumbnail generation: Matrix resize algorithm
- Smart crop: BFS to find regions of interest
- Cloudinary: Processes billions of images using matrix algorithms
Music Recommendation & Playlists
Music Recommendation & Playlists
Problem: Graph Algorithms, Top K Elements
Real Systems: Spotify, Apple Music, YouTube MusicHow it’s used:- Collaborative filtering: Graph of users → songs
- Similar songs: BFS/DFS on song similarity graph
- Discover Weekly: Graph traversal + heap for top recommendations
- Radio feature: DFS from seed song to build playlist
- Spotify: 500M users, 100M songs
- Graph: Billions of user-song edges
- Daily: Generate millions of personalized playlists
E-commerce & Transactions
Inventory Management
Inventory Management
Problem: Range Updates, Segment Tree
Real Systems: Amazon, Shopify, Walmart e-commerceHow it’s used:- Stock updates: Range update queries for warehouse inventory
- Multi-location inventory: Segment tree for fast aggregation
- Amazon: Track millions of SKUs across hundreds of warehouses
- Real-time availability: O(log n) query for stock levels
Price Optimization & Dynamic Pricing
Price Optimization & Dynamic Pricing
Problem: Sliding Window, DP (Stock prices)
Real Systems: Uber surge pricing, Amazon pricing, airline ticketsHow it’s used:- Surge pricing: Sliding window tracks demand over time
- Best time to buy: DP algorithm finds optimal purchase windows
- Uber: Adjusts prices every few minutes based on supply/demand
- Amazon: Dynamic pricing changes 2.5M times per day
Fraud Detection & Anomaly Detection
Fraud Detection & Anomaly Detection
Problem: Sliding Window, Hash Map
Real Systems: Stripe, PayPal, credit card processorsHow it’s used:- Transaction velocity: Sliding window counts transactions per hour
- Duplicate detection: Hash map tracks recent transactions
- Stripe Radar: ML + sliding window for fraud patterns
- Credit cards: Flag if >5 transactions in 10 minutes (sliding window)
Developer Tools & Infrastructure
Git & Version Control
Git & Version Control
Problem: Graph Algorithms, Topological Sort, LCA
Real Systems: Git, GitHub, GitLab, MercurialHow it’s used:- Commit history: DAG (Directed Acyclic Graph)
- Merge base: Lowest Common Ancestor of two branches
- git log: Topological sort of commit graph
- git rebase: Graph traversal and rewriting
- LCA: Find common ancestor for merge (O(log n) with preprocessing)
- Topological sort: Display commits in dependency order
- Binary search: Find bug-introducing commit (git bisect)
- Linux kernel: 1M+ commits
- Chromium: 1.2M+ commits
- Git handles massive repos efficiently
Code Compilation & Dependency Resolution
Code Compilation & Dependency Resolution
Problem: Topological Sort, DFS
Real Systems: Make, Gradle, npm, MavenHow it’s used:- Build order: Topological sort of dependency graph
- Makefile: Determine compilation order for C/C++ files
- npm install: Resolve package dependencies using DFS
- Maven: Build multi-module projects in correct order
- Circular dependencies → DFS detects cycles
- npm/yarn error: “Circular dependency detected”
IDE Autocomplete & Code Intelligence
IDE Autocomplete & Code Intelligence
Problem: Trie, DFS/BFS, Hash Map
Real Systems: VS Code IntelliSense, JetBrains IDEs, Sublime TextHow it’s used:- Symbol completion: Trie stores all symbols (functions, variables)
- Go to definition: Hash map for O(1) symbol → location lookup
- Find references: DFS through call graph
- Import suggestions: Trie of all available modules
- Large codebases: Millions of symbols
- Trie enables instant completion (<100ms)
- Microsoft uses these techniques in TypeScript compiler
Key Takeaways
Production vs Interview
- Two Sum → Load balancing: Hash maps for O(1) routing
- LRU Cache → Redis, CDNs: Cache eviction at scale
- Merge K Lists → Databases: Log compaction, LSM trees
- Binary Search → Everything: Indexes, routing, ABR
- Graph algorithms → Version control: Git, dependency resolution
- Interview: Clean input, clear constraints
- Production: Distributed systems, failure handling, monitoring
Why These Algorithms?
- Hash Maps (Two Sum): Used billions of times per second in production
- Heaps (Top K): Core of search, recommendation, monitoring
- Binary Search: Foundation of databases, networking, media
- Graph algorithms: Version control, social networks, maps
- DP: Resource optimization, pricing, ML
- System Design Primer
- Engineering blogs: Netflix, Uber, Stripe, Cloudflare
- Open source: Redis, PostgreSQL, Kubernetes (see the algorithms in code)