What is System Design?
System design defines the architecture, components, data flows, and interfaces of a system to satisfy requirements at scale — balancing performance, reliability, and cost trade-offs.

System design bridges requirements and implementation at the architectural level. It demands reasoning simultaneously about performance (latency, throughput), scalability (handling growth), reliability (fault tolerance), and cost. There is rarely one correct answer — every design involves explicit trade-offs with stated justifications.
Interview Framework
When approaching system design problems, follow this structured process:

Key Principles
- Clarify requirements first: functional (features) and non-functional (SLAs, scale)
- Estimate scale: users, RPS, data volume, storage, bandwidth — numbers reveal bottlenecks
- Back-of-envelope: 100M DAU × 10 req/day ≈ 12k average RPS, roughly 35k at peak with a 3× buffer
- Identify the primary bottleneck: CPU, memory, disk I/O, network, or DB connections
- Iterate architecture: start simple (monolith + single DB), add complexity only where proven necessary
- CAP theorem, consistency models, and availability patterns constrain every design decision
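The back-of-envelope bullet above can be sketched as a quick calculation. The traffic figures (100M DAU, 10 requests/user/day, 3× peak factor) are the illustrative numbers from the list, not real measurements:

```python
# Back-of-envelope capacity estimate for the example workload above:
# 100M daily active users, 10 requests per user per day, 3x peak factor.

SECONDS_PER_DAY = 86_400

def estimate_rps(dau: int, req_per_user_per_day: int, peak_factor: float = 3.0):
    """Return (average RPS, peak RPS) for a given daily workload."""
    daily_requests = dau * req_per_user_per_day
    avg_rps = daily_requests / SECONDS_PER_DAY
    return avg_rps, avg_rps * peak_factor

avg, peak = estimate_rps(dau=100_000_000, req_per_user_per_day=10)
print(f"avg ~ {avg:,.0f} RPS, peak ~ {peak:,.0f} RPS")
# 1B requests/day works out to about 11.6k average RPS, ~35k at peak.
```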
Performance vs Scalability
Performance is how fast a single request is served. Scalability is maintaining performance as concurrent load increases. The two require different strategies and diagnostic tools.

Understanding the Difference
You cannot always scale away a performance problem, and fast single-request performance does not guarantee scale.

- Performance metric: P50/P99 latency for a single request
- Scalability metric: RPS at which latency SLA breaks
- Horizontal scaling (add nodes) works for stateless services — state is the barrier
- Vertical scaling (bigger machine) has a hardware ceiling and a cost cliff
- Connection pool exhaustion is the most common scalability wall in web backends
Diagnostic Example
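As a minimal sketch of the distinction, the toy model below separates the two failure modes: a single slow request points at performance, while healthy per-request latency at high utilization points at scalability. The worker/service-time numbers and the 500 ms threshold are illustrative assumptions, not a real diagnostic tool:

```python
# Toy diagnostic: is the system slow (performance) or saturated (scalability)?
# A fixed pool of workers, each serving one request in `service_time` seconds,
# has capacity = workers / service_time requests per second. Below capacity,
# latency stays near the single-request service time; above it, queueing
# delay dominates and the latency SLA breaks regardless of per-request speed.

def diagnose(arrival_rps: float, workers: int, service_time: float) -> str:
    capacity = workers / service_time
    utilization = arrival_rps / capacity
    if service_time > 0.5:   # illustrative SLA threshold of 500 ms
        return "performance problem: single request is slow even when idle"
    if utilization >= 1.0:
        return "scalability problem: saturated; add workers or shed load"
    return f"healthy: utilization {utilization:.0%}"

print(diagnose(arrival_rps=900, workers=100, service_time=0.05))   # capacity 2000
print(diagnose(arrival_rps=3000, workers=100, service_time=0.05))  # over capacity
```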
Latency vs Throughput
Latency is time from request to response. Throughput is the number of requests handled per unit time. Optimizing one often degrades the other; Little’s Law links them.

Little’s Law
N = λ × W: concurrent requests in flight equal arrival rate times response time. When latency spikes, concurrent requests balloon, exhausting thread pools and memory.

Key Concepts
- P99 latency matters: the 1% of slow requests drive user churn and support tickets
- Throughput = sustained RPS the system can serve within its latency SLA
- Batching: commit to disk every 100ms (throughput) vs every write (latency)
- Streaming vs batch: real-time pipelines optimize latency; batch jobs optimize throughput
- Async I/O decouples thread count from I/O concurrency — Node.js, Go goroutines, Java virtual threads
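Little’s Law from above makes the thread-pool exhaustion scenario concrete. The 2,000 RPS and latency figures here are illustrative:

```python
# Little's Law: N = lambda * W.
# At 2,000 RPS and 50 ms latency, about 100 requests are in flight.
# If a slow dependency pushes latency to 2 s, in-flight requests jump to
# 4,000 -- which is how a latency spike exhausts a thread pool sized for
# normal load, even though the arrival rate never changed.

def in_flight(arrival_rps: float, latency_s: float) -> float:
    """N = lambda * W: average number of concurrent requests in flight."""
    return arrival_rps * latency_s

print(round(in_flight(2_000, 0.05)))  # normal operation: ~100 concurrent
print(round(in_flight(2_000, 2.0)))   # latency spike: ~4,000 concurrent
```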
CAP Theorem
The CAP theorem states that a distributed system guarantees at most two of: Consistency, Availability, and Partition tolerance. Since partitions are unavoidable, systems must choose between C and A when a partition occurs.

Network partitions are inevitable — packets drop, links fail. Every distributed system must choose: refuse to serve stale data (CP) or always serve data that may be stale (AP).
System Classifications
Categories
CP Systems
Return an error or block during a partition to avoid stale reads.

Examples: PostgreSQL, Zookeeper, etcd, HBase

Use for: bank balances, stock reservations, distributed locks
AP Systems
Return possibly stale data — always available, eventually consistent.

Examples: Cassandra, CouchDB, DynamoDB

Use for: shopping carts, view counters, DNS records
PACELC Extension
PACELC extends CAP: even without a partition, there is a latency-consistency trade-off on every operation.

- Cassandra tunable consistency: QUORUM reads/writes = CP-like; ONE = AP-like per operation
- Google Spanner: achieves external consistency via TrueTime atomic clocks — practical CP at global scale
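The per-operation tunability in the Cassandra bullet follows from the quorum-overlap rule used by Dynamo-style stores: with N replicas, a write acknowledged by W nodes and a read contacting R nodes are guaranteed to share at least one up-to-date replica exactly when R + W > N. A minimal sketch of the rule:

```python
# Quorum-overlap rule for Dynamo-style replication: with N replicas,
# every read quorum of size R intersects every write quorum of size W
# if and only if R + W > N. When they intersect, the read is guaranteed
# to see the latest acknowledged write (CP-like); otherwise a read may
# miss it entirely (AP-like).

def read_sees_latest_write(n: int, r: int, w: int) -> bool:
    """True if every read quorum must intersect every write quorum."""
    return r + w > n

N = 3
print(read_sees_latest_write(N, r=2, w=2))  # QUORUM/QUORUM -> True (CP-like)
print(read_sees_latest_write(N, r=1, w=1))  # ONE/ONE -> False (AP-like)
```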
Consistency Patterns
Consistency patterns define when and how data changes become visible across distributed nodes.

Consistency Levels
| Level | Guarantee | Examples |
|---|---|---|
| Weak | No guarantee — best effort | Video streams, VoIP, real-time gaming |
| Eventual | Converges given no new writes | DNS, Cassandra, DynamoDB, S3 |
| Strong (Linearizable) | Reads always reflect latest write | PostgreSQL, Zookeeper, etcd |
| Read-your-own-writes | User sees their own writes | Profile updates, settings |
| Monotonic reads | Once seen, older values never returned | Session consistency |
| Bounded staleness | Eventual with max lag guarantee | Azure Cosmos DB |
DNS Example: Classic Eventual Consistency
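DNS resolvers cache records for their TTL, so after an authoritative record changes, some clients keep seeing the old value until their cached copy expires — the system converges once no new writes occur. A minimal sketch of that behavior; the IP addresses and 300-second TTL are illustrative:

```python
# Toy model of DNS eventual consistency: a resolver caches a record for
# its TTL. After the authoritative record changes, the resolver keeps
# serving the stale cached value until the TTL expires, then converges.

class Resolver:
    def __init__(self, ttl: float):
        self.ttl = ttl
        self.cached = None                   # (value, fetched_at) or None

    def lookup(self, authoritative: str, now: float) -> str:
        if self.cached and now - self.cached[1] < self.ttl:
            return self.cached[0]            # serve (possibly stale) cache
        self.cached = (authoritative, now)   # miss or expiry: refetch
        return authoritative

r = Resolver(ttl=300)                             # 5-minute TTL
print(r.lookup("1.2.3.4", now=0))                 # initial fetch: 1.2.3.4
# Authoritative record changes to 5.6.7.8 shortly after:
print(r.lookup("5.6.7.8", now=60))                # stale: still 1.2.3.4
print(r.lookup("5.6.7.8", now=400))               # TTL expired: 5.6.7.8
```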
Best Practices
Do: Design per-entity consistency requirements
- Use strong consistency for money, inventory levels, and auth tokens
- Use eventual consistency with idempotency keys for user-generated content
- Document CAP choices per service in Architecture Decision Records
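The idempotency-key pattern mentioned above can be sketched as follows. This is a minimal in-memory illustration — the dict stands in for a real datastore, and the key and ID scheme are hypothetical:

```python
# Minimal sketch of the idempotency-key pattern for writes in an
# eventually consistent system: a client that times out can safely retry
# with the same key, because the server deduplicates by key and returns
# the original result instead of creating a duplicate record.

processed: dict[str, str] = {}               # idempotency_key -> result

def create_post(idempotency_key: str, content: str) -> str:
    if idempotency_key in processed:
        return processed[idempotency_key]    # retry: return original result
    post_id = f"post-{len(processed) + 1}"   # illustrative ID scheme
    processed[idempotency_key] = post_id
    return post_id

first = create_post("key-123", "hello")
retry = create_post("key-123", "hello")      # client retried after a timeout
assert first == retry                        # no duplicate post was created
```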
Don't: Apply one consistency model globally
- Don’t use AP for financial transactions or inventory reservations
- Don’t use CP for social media likes, view counts, or feed caching
- Don’t ignore replication lag monitoring in eventually consistent systems
Next Steps
Scalability
Learn about horizontal scaling, sharding, and load distribution
Databases
Deep dive into SQL vs NoSQL, sharding, and optimization
Caching
Explore caching strategies and multi-layer cache architecture
Load Balancing
Understand traffic distribution and load balancing algorithms