Data Structures - System Design 101

Data structures are fundamental to building efficient systems. This guide covers the key data structures you need to understand for system design interviews.

Common Data Structures in Daily Use

Here are 10 key data structures we use every day:

List

Real-World Use: Keep your Twitter feeds

Sequential access
Dynamic sizing
Insertion/deletion at ends
Used in social media feeds, playlists

Stack

Real-World Use: Support undo/redo of the word editor

LIFO (Last In, First Out)
Push and pop operations
Function call stack
Browser history, text editors

Queue

Real-World Use: Keep printer jobs, or send user actions in-game

FIFO (First In, First Out)
Enqueue and dequeue
Task scheduling
Message queues, BFS algorithms

Hash Table

Real-World Use: Caching systems

Key-value pairs
O(1) average lookup
Hash function
Databases, caches, dictionaries

Array

Real-World Use: Math operations

Fixed size (in most languages)
Contiguous memory
Random access O(1)
Matrix operations, images

Heap

Real-World Use: Task scheduling

Priority queue implementation
Min-heap or max-heap
O(log n) insertion/deletion
Priority scheduling, top K problems

Tree

Real-World Use: Keep the HTML document, or for AI decision

Hierarchical structure
Parent-child relationships
Various traversals
DOM, file systems, decision trees

Suffix Tree

Real-World Use: For searching string in a document

String matching
Pattern search O(m)
Space-intensive
Full-text search, DNA sequencing

Graph

Real-World Use: For tracking friendship, or path finding

Vertices and edges
Directed or undirected
Weighted or unweighted
Social networks, maps, dependencies

R-Tree

Real-World Use: For finding the nearest neighbor

Spatial indexing
Multi-dimensional data
Bounding boxes
Geospatial queries, mapping

Each data structure is optimized for specific operations. Understanding their trade-offs is crucial for system design.

Database Data Structures

The choice of data structure will vary depending on your use case. Data can be indexed in memory or on disk. Data formats vary, such as numbers, strings, geographic coordinates, etc. The system might be write-heavy or read-heavy. All of these factors affect your choice of database index format.

Popular Database Index Structures

Skiplist

A common in-memory index type.Characteristics:

Probabilistic data structure
Multiple levels of linked lists
O(log n) search, insert, delete
Simpler than balanced trees

Used in:

Redis (sorted sets)
In-memory databases
Concurrent data structures

Trade-offs:

✅ Simple implementation
✅ Good concurrency
❌ Extra memory for pointers
❌ Probabilistic guarantees

Hash Index

A very common implementation of the “Map” data structure (or “Collection”).Characteristics:

Direct addressing
O(1) average lookup
Hash function + buckets
Collision handling

Used in:

Key-value stores
Database hash indexes
Cache implementations

Trade-offs:

✅ Fast lookups
✅ Fast inserts
❌ No range queries
❌ No ordered traversal

SSTable

Immutable on-disk “Map” implementation.Characteristics:

Sorted Strings Table
Immutable on disk
Sorted by key
Efficient merging

Used in:

Cassandra
LevelDB
RocksDB

Trade-offs:

✅ Sequential writes
✅ Efficient merging
❌ Write amplification
❌ Read overhead

LSM Tree

Skiplist + SSTable. High write throughput.Characteristics:

Log-Structured Merge Tree
In-memory + on-disk components
Background compaction
Write-optimized

Used in:

Cassandra
LevelDB
RocksDB
HBase

Trade-offs:

✅ High write throughput
✅ Efficient updates
❌ Read amplification
❌ Compaction overhead

B-tree

Disk-based solution. Consistent read/write performance.Characteristics:

Self-balancing tree
Multiple keys per node
Optimized for disk I/O
Range queries supported

Used in:

MySQL (InnoDB)
PostgreSQL
Most relational databases
File systems

Trade-offs:

✅ Balanced read/write
✅ Range queries
✅ Sorted order
❌ Write overhead (rebalancing)

Inverted Index

Used for document indexing.Characteristics:

Term → Document mapping
Full-text search
Posting lists
Token-based

Used in:

Elasticsearch
Apache Lucene
Search engines

Trade-offs:

✅ Fast text search
✅ Ranking support
❌ Storage overhead
❌ Update complexity

Suffix Tree

For string pattern search.Characteristics:

All suffixes indexed
Pattern matching O(m)
Space-intensive
Build time O(n)

Used in:

Bioinformatics
String search algorithms
Text processing

Trade-offs:

✅ Fast pattern matching
✅ Multiple patterns
❌ High memory usage
❌ Complex implementation

R-tree

Multi-dimension search, such as finding the nearest neighbor.Characteristics:

Spatial indexing
Bounding boxes (MBR)
Multi-dimensional data
Range queries

Used in:

PostGIS
Geospatial databases
Map applications

Trade-offs:

✅ Spatial queries
✅ Nearest neighbor
❌ Complex updates
❌ Overlap in nodes

This is not an exhaustive list of all database index types. The choice depends on your specific use case, access patterns, and performance requirements.

B-Tree vs LSM-Tree

Two of the most important data structures in database design:

B-Tree
LSM-Tree
Comparison

Most Widely Used in Relational Databases

B-Tree is the most widely used indexing data structure in almost all relational databases.How it Works:The basic unit of information storage in B-Tree is usually called a “page”. Looking up a key traces down the range of keys until the actual value is found.Characteristics:

Self-balancing tree structure
Multiple keys per node
All leaves at same level
Sorted key order
Range queries efficient

Performance:

Read: O(log n) - Optimized for reads
Write: O(log n)
Range query: Efficient
Point query: Efficient

Best For:

Read-heavy workloads
Range queries
Transactional databases
Traditional RDBMS

Used In:

MySQL InnoDB
PostgreSQL
SQLite
Most relational databases

The Key Difference

B-Tree: Read-Optimized

Faster Reads

Fewer disk seeks
Predictable performance
Better for point queries
Efficient range scans

LSM-Tree: Write-Optimized

Faster Writes

Sequential writes
Less write amplification initially
Better write throughput
Batch operations efficient

Trade-offs

Aspect	B-Tree	LSM-Tree
Read Performance	⭐⭐⭐⭐⭐	⭐⭐⭐
Write Performance	⭐⭐⭐	⭐⭐⭐⭐⭐
Space Efficiency	⭐⭐⭐⭐	⭐⭐⭐
Range Queries	⭐⭐⭐⭐⭐	⭐⭐⭐⭐
Point Queries	⭐⭐⭐⭐⭐	⭐⭐⭐
Write Amplification	Higher	Lower (initially)
Read Amplification	Lower	Higher
Compaction Overhead	Lower	Higher

When to Choose

Choose B-Tree when:

Read-heavy workload
Need consistent read latency
Range queries are common
Disk space is limited

Choose LSM-Tree when:

Write-heavy workload
Can tolerate read latency variance
Sequential writes common
Compaction overhead acceptable

Key Algorithms for System Design

What are some of the algorithms you should know before taking system design interviews?

Understanding “how those algorithms are used in real-world systems” is generally more important than the implementation details in a system design interview.

Algorithm Importance Levels

⭐⭐⭐⭐⭐ Very Important
⭐⭐⭐ Important
⭐ Advanced

Try to understand how it works and why.

Consistent Hashing

Use Case: Distributed caching, load balancing

Minimizes reorganization when nodes added/removed
Used in: Cassandra, DynamoDB, CDNs
Key concept: Hash ring

Bloom Filter

Use Case: Membership testing, cache optimization

Space-efficient probabilistic data structure
False positives possible, no false negatives
Used in: BigTable, Cassandra, Bitcoin

Merkle Tree

Use Case: Data synchronization, verification

Efficient comparison of datasets
Detect inconsistencies
Used in: Git, Bitcoin, Cassandra

Geohash

Use Case: Location-based services

Encode lat/long to string
Proximity search
Used in: Uber, Google Maps

Data Structure Selection Guide

Quick Reference

Need Fast Lookups?

Hash Table / Hash Map

O(1) average case
Key-value pairs
No ordering

When to use:

Caching
Deduplication
Fast membership testing
Counting occurrences

Need Ordering?

Tree-based structures

B-Tree: Disk-based, balanced
BST: In-memory, may be unbalanced
Red-Black Tree: Self-balancing

When to use:

Range queries
Sorted iteration
Database indexes
Priority queues (Heap)

Need Fast Writes?

LSM-Tree or Log-based

Append-only writes
Background compaction
Write-optimized

When to use:

Time-series data
Logging systems
Write-heavy workloads
Event sourcing

Need Spatial Queries?

R-Tree or QuadTree

Multi-dimensional indexing
Bounding boxes
Nearest neighbor

When to use:

Geospatial applications
Location-based services
Map applications
Collision detection

Need Text Search?

Inverted Index or Trie

Term-based indexing
Full-text search
Prefix matching

When to use:

Search engines
Autocomplete
Document search
Log analysis

Need Graph Relationships?

Graph structures

Adjacency list
Adjacency matrix
Graph databases

When to use:

Social networks
Recommendation engines
Path finding
Dependency resolution

Time Complexity Cheat Sheet

Common Data Structure Operations

Data Structure	Access	Search	Insert	Delete	Space
Array	O(1)	O(n)	O(n)	O(n)	O(n)
Linked List	O(n)	O(n)	O(1)	O(1)	O(n)
Hash Table	-	O(1)*	O(1)*	O(1)*	O(n)
Binary Search Tree	O(log n)*	O(log n)*	O(log n)*	O(log n)*	O(n)
B-Tree	O(log n)	O(log n)	O(log n)	O(log n)	O(n)
Skip List	O(log n)	O(log n)	O(log n)	O(log n)	O(n log n)
Heap	-	O(n)	O(log n)	O(log n)	O(n)

Average case. Worst case may differ.

Interview Tips

Common Mistakes

Choosing hash table when ordering is needed
Using array when frequent insertions/deletions required
Not considering space complexity
Ignoring the read/write ratio of the workload
Forgetting about the cost of data structure maintenance (rebalancing, compaction)

Interview StrategyWhen discussing data structures in system design:

Understand the access patterns: Read-heavy? Write-heavy? Mixed?
Consider the data characteristics: Size? Growth rate? Data types?
Think about trade-offs: Speed vs space? Consistency vs performance?
Discuss real-world examples: How do major companies solve similar problems?
Consider maintenance: What’s the operational overhead?

Key Questions to Ask

Access Patterns

What’s the read/write ratio?
Are range queries needed?
Is ordering important?
What’s the query latency requirement?

Scale Considerations

What’s the data volume?
How fast is it growing?
How many concurrent users?
What’s the throughput requirement?

Performance Requirements

What’s acceptable latency?
Is consistency critical?
Can we tolerate eventual consistency?
What about durability?

Operational Concerns

What’s the maintenance overhead?
How do we handle compaction?
What about memory usage?
Is the structure disk-friendly?

Practice Problems

Design a Cache System

Consider:

Hash table for O(1) lookups
Doubly linked list for LRU eviction
Combined: LRU Cache

Trade-offs:

Memory overhead for linked list
Fast reads and updates
Eviction policy implementation

Design a Real-time Leaderboard

Consider:

Sorted Set (Skip List) for ranking
Hash table for user scores
Redis sorted sets

Trade-offs:

Memory for maintaining order
Fast score updates
Efficient range queries for top N

Design a Geospatial Index

Consider:

QuadTree for 2D partitioning
Geohash for encoding
R-Tree for bounding boxes

Trade-offs:

Balance between precision and performance
Update complexity
Query efficiency

Design a Search Engine

Consider:

Inverted index for terms
Trie for autocomplete
Ranking algorithms

Trade-offs:

Storage overhead
Index update latency
Query performance

Summary

Know the Basics

Master fundamental data structures: arrays, lists, hash tables, trees, graphs.

Understand Trade-offs

Every data structure has pros and cons. Know when to use which.

Think Real-World

Connect data structures to actual systems: how does Redis use skip lists?

Consider Scale

What works for 1000 items may not work for 1 billion.

Analyze Complexity

Always discuss time and space complexity of your choices.

Practice

Work through real system design problems using different data structures.

System Design Interviews

Fundamentals

​Common Data Structures in Daily Use

List

Stack

Queue

Hash Table

Array

Heap

Tree

Suffix Tree

Graph

R-Tree

​Database Data Structures

​Popular Database Index Structures

​B-Tree vs LSM-Tree

​Most Widely Used in Relational Databases

​Popular in NoSQL Databases

​The Key Difference

B-Tree: Read-Optimized

LSM-Tree: Write-Optimized

​Trade-offs

​When to Choose

​Key Algorithms for System Design

​Algorithm Importance Levels

Consistent Hashing

Bloom Filter

Merkle Tree

Geohash

QuadTree

Leaky Bucket

Token Bucket

Trie

HyperLogLog

Count-Min Sketch

Operational Transform

Raft/Paxos

​Data Structure Selection Guide

​Quick Reference

​Time Complexity Cheat Sheet

​Common Data Structure Operations

​Interview Tips

​Key Questions to Ask

Access Patterns

Scale Considerations

Performance Requirements

Operational Concerns

​Practice Problems

​Summary

Know the Basics

Understand Trade-offs

Think Real-World

Consider Scale

Analyze Complexity

Practice

Build docs developers (and LLMs) love

Common Data Structures in Daily Use

Database Data Structures

Popular Database Index Structures

B-Tree vs LSM-Tree

Most Widely Used in Relational Databases

Popular in NoSQL Databases

The Key Difference

Trade-offs

When to Choose

Key Algorithms for System Design

Algorithm Importance Levels

Data Structure Selection Guide

Quick Reference

Time Complexity Cheat Sheet

Common Data Structure Operations

Interview Tips

Key Questions to Ask

Practice Problems

Summary