
Overview

Design a system that evolves from serving a single user to millions of users on AWS. This problem demonstrates the iterative approach to scaling, starting with a simple single-server setup and progressively adding components as bottlenecks emerge.
While this uses AWS-specific services, the principles apply generally to any cloud provider or on-premise infrastructure.

Iterative Scaling Approach

Scaling requires a methodical approach:

1. Benchmark/Load Test - Measure current system performance under load
2. Profile for Bottlenecks - Identify the specific components causing performance issues
3. Address Bottlenecks - Evaluate alternatives and trade-offs, then implement solutions
4. Repeat - Continuously iterate as the system grows

Important: This iterative pattern is good practice for evolving a basic design into a scalable one. Never jump directly to the final architecture!

Step 1: Use Cases and Constraints

Use Cases

In Scope

  • User makes a read or write request
    • Service processes request, stores user data, returns results
  • Service needs to evolve from small user base to millions of users
    • Discuss general scaling patterns for a large number of users and requests
  • Service has high availability

Constraints and Assumptions

Assumptions:
  • Traffic is not evenly distributed
  • Need for relational data
  • Scale from 1 user to tens of millions of users
    • Denote increase as: Users+, Users++, Users+++, etc.
  • 10 million users
  • 1 billion writes per month
  • 100 billion reads per month
  • 100:1 read to write ratio
  • 1 KB content per write

Usage Calculations

Storage:
  • 1 TB of new content per month
    • 1 KB per write × 1 billion writes per month
  • 36 TB of new content in 3 years
Throughput:
  • 400 writes per second on average
  • 40,000 reads per second on average
Conversion guide:
  • 2.5 million seconds per month
  • 1 request per second = 2.5 million requests per month
  • 40 requests per second = 100 million requests per month
  • 400 requests per second = 1 billion requests per month
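The usage calculations above can be verified with quick back-of-envelope math, sketched here (constants taken from the stated constraints):

```python
# Back-of-envelope scaling math from the constraints above.
# Assumption: ~2.5 million seconds per month (rounded for quick math).

SECONDS_PER_MONTH = 2_500_000

writes_per_month = 1_000_000_000      # 1 billion writes per month
reads_per_month = 100_000_000_000     # 100 billion reads per month
bytes_per_write = 1_000               # 1 KB content per write

writes_per_second = writes_per_month // SECONDS_PER_MONTH   # 400
reads_per_second = reads_per_month // SECONDS_PER_MONTH     # 40,000

new_tb_per_month = writes_per_month * bytes_per_write / 1e12  # 1 TB
tb_in_three_years = new_tb_per_month * 36                     # 36 TB

print(writes_per_second, reads_per_second, tb_in_three_years)
```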

Step 2: Initial Design - Single Server

Single Server Design

Goals (1-2 Users)

  • Start simple with single box
  • Vertical scaling when needed
  • Monitor to determine bottlenecks

Components

Web Server

EC2 instance handling all requests

MySQL Database

Stored on same EC2 instance

Elastic IP

Public static IP that doesn’t change on reboot

DNS

Route 53 maps domain to instance public IP

Vertical Scaling

Pros:
  • Simple to implement
  • Choose a bigger instance size as needed

Use CloudWatch, top, Nagios, StatsD, and Graphite to monitor:
  • CPU usage
  • Memory usage
  • I/O
  • Network

Cons:
  • Can get very expensive
  • No redundancy/failover
  • Limited by single machine capacity

Security

Open only the necessary ports:
  • 80 for HTTP
  • 443 for HTTPS
  • 22 for SSH (whitelisted IPs only)

SQL vs NoSQL

Start with a MySQL database, given the relational data requirement.
Discuss SQL vs NoSQL trade-offs based on the specific use case.

Step 3: Users+ - Separate Components

Separate Components

Bottleneck Analysis

Problem: Single box becoming overwhelmed
  • MySQL taking more memory and CPU
  • User content filling disk space
  • Vertical scaling expensive
  • Cannot scale MySQL and Web Server independently

Goals

1. Lighten load on the single box - Separate concerns for independent scaling
2. Store static content separately - Move it to an Object Store (S3)
3. Move the database to a separate box - Use RDS for managed MySQL

New Components

Object Store (S3) stores static content:
  • User files
  • JavaScript
  • CSS
  • Images
  • Videos
Benefits:
  • Highly scalable and reliable
  • Server-side encryption
RDS provides a managed database service:
  • Simple to administer and scale
  • Multiple availability zones
  • Encryption at rest
  • Automatic backups
Network isolation (VPC):
  • Public subnet for Web Server (internet-facing)
  • Private subnet for database and other internal services
  • Security groups control access between components

Trade-offs

  • Increased complexity - Need to update Web Server to point to S3 and RDS
  • Additional security - Must secure new components
  • Higher AWS costs - Weigh against managing similar systems yourself

Step 4: Users++ - Horizontal Scaling

Horizontal Scaling

Bottleneck Analysis

Problem: Single Web Server bottlenecks during peak hours
  • Slow responses
  • Occasional downtime
  • Need higher availability and redundancy

Goals

Load Balancer

ELB distributes traffic across multiple Web Servers
  • Highly available
  • Terminate SSL to reduce backend load
  • Simplify certificate administration

Multiple Web Servers

Spread across multiple availability zones
  • Horizontal scaling
  • Remove single points of failure

MySQL Failover

Master-Slave replication across AZs
  • Improve redundancy
  • Enable read scaling

Application Servers

Separate from Web Servers
  • Independent scaling
  • Web Servers act as reverse proxy
  • Separate Read APIs from Write APIs

Content Delivery Network (CDN)

  • Serve static content globally
  • Reduce latency
  • Reduce load on origin servers

Step 5: Users+++ - Caching Layer

Caching Layer
Note: Internal Load Balancers not shown to reduce clutter

Bottleneck Analysis

Problem: Read-heavy system (100:1 ratio)
  • Database suffering from high read requests
  • Poor performance from cache misses

Goals

Add a Memory Cache (Elasticache) to reduce load and latency.

Cache results from MySQL:
  • Try configuring the MySQL Database cache first
  • If that is insufficient, implement the Memory Cache

Store session data from the Web Servers:
  • Makes Web Servers stateless
  • Enables Autoscaling

Why memory? Reading 1 MB sequentially from memory takes ~250 microseconds:
  • 4x faster than SSD
  • 80x faster than disk
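The read path through the Memory Cache follows the cache-aside pattern. A minimal sketch, with a plain dict standing in for Elasticache and a placeholder function standing in for a MySQL query (both names are illustrative, not real APIs):

```python
# Cache-aside read path: check the cache first, fall back to the
# database on a miss, then populate the cache for the next read.

cache = {}  # in-memory stand-in for the Memory Cache (Elasticache)

def db_query(key):
    # placeholder for a MySQL read; returns a fake row
    return f"row-for-{key}"

def get(key):
    if key in cache:        # cache hit: served from memory
        return cache[key]
    value = db_query(key)   # cache miss: fall through to the database
    cache[key] = value      # populate so the next read is a hit
    return value

print(get("user:42"))  # miss, then cached
print(get("user:42"))  # hit
```

Cache invalidation on writes is the hard part of this pattern and is worth discussing separately.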

MySQL Read Replicas

1. Add Read Replicas - Relieve load on the MySQL Write Master
2. Separate read/write logic - Update the Web Server to route reads and writes appropriately
3. Add Load Balancers - Distribute reads across the replicas

Most services are read-heavy rather than write-heavy, making this pattern very effective.
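The read/write split can be sketched as a simple router: writes always go to the master, while reads rotate across the replicas. Endpoint names here are illustrative placeholders:

```python
# Read/write splitting: SELECTs rotate round-robin across replicas,
# everything else goes to the Write Master.

import itertools

MASTER = "mysql-master:3306"
REPLICAS = ["replica-1:3306", "replica-2:3306", "replica-3:3306"]
_replica_cycle = itertools.cycle(REPLICAS)

def route(sql: str) -> str:
    """Pick a database endpoint based on the statement type."""
    if sql.lstrip().upper().startswith("SELECT"):
        return next(_replica_cycle)   # reads: rotate across replicas
    return MASTER                     # writes: always the master

print(route("SELECT * FROM users"))    # one of the replicas
print(route("INSERT INTO users ..."))  # the master
```

In practice this routing often lives in a connection proxy or the database driver rather than application code.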

Additional Scaling

  • Add more Web Servers to improve responsiveness
  • Add more Application Servers for business logic

Step 6: Users++++ - Autoscaling

Autoscaling

Bottleneck Analysis

Problem: Traffic spikes during business hours
  • Want to automatically scale up/down based on load
  • Reduce costs by powering down unused instances
  • Automate DevOps as much as possible

AWS Autoscaling

Setup:
  • Create one group for each Web Server type
  • Create one group for each Application Server type
  • Place each group in multiple availability zones
  • Set the min and max number of instances

Scaling triggers (via CloudWatch):
  • Simple time-of-day schedules for predictable loads
  • OR metrics over a time period:
    • CPU load
    • Latency
    • Network traffic
    • Custom metrics

Cons:
  • Introduces complexity
  • Takes time to scale up to meet increased demand
  • Takes time to scale down when demand drops
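The decision an autoscaling policy makes can be sketched as a simple threshold rule. Real AWS Autoscaling reacts to CloudWatch alarms with cooldowns; the thresholds and step size here are illustrative assumptions:

```python
# Threshold-based scale decision: add an instance above the high-CPU
# threshold, remove one below the low threshold, and clamp the result
# to the group's configured min/max size.

def desired_capacity(current, avg_cpu, min_size=2, max_size=10,
                     scale_up_at=70.0, scale_down_at=30.0):
    """Return the new instance count for the autoscaling group."""
    if avg_cpu > scale_up_at:
        current += 1          # add an instance under load
    elif avg_cpu < scale_down_at:
        current -= 1          # shed an instance when idle
    return max(min_size, min(max_size, current))  # clamp to bounds

print(desired_capacity(4, 85.0))   # → 5
print(desired_capacity(4, 20.0))   # → 3
print(desired_capacity(2, 10.0))   # → 2 (respects min_size)
```

The min/max clamp is what prevents a flapping metric from scaling the group to zero or runaway cost.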

DevOps Automation

Automate configuration management with tools such as:
  • Chef
  • Puppet
  • Ansible

Step 7: Users+++++ - Advanced Scaling

Advanced Scaling
Note: Autoscaling groups not shown to reduce clutter

Continued Scaling Challenges

As service grows towards constraints, continue iterative scaling:

Database Scaling

Challenges:
  • Database growing too large
  • 40,000 average read requests/second overwhelming replicas
  • 400 average writes/second may overwhelm single Write Master
Solutions:
  • Data warehousing - Store only a limited time period of data in MySQL and move the rest to a warehouse such as Redshift
    • Redshift comfortably handles the 1 TB/month of new content
  • Federation - Split databases by function
  • Sharding - Distribute data across databases
  • Denormalization - Optimize read performance
  • SQL Tuning - Optimize queries and indexes

Consider DynamoDB for:
  • High read/write throughput requirements
  • Flexible schema needs
  • Key-value or document data models
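Of the solutions above, sharding is often the least intuitive. A minimal sketch of hash-based sharding, routing each user to one of N database shards by hashing the user id (shard names are illustrative):

```python
# Hash-based sharding: a stable hash of the user id picks the shard,
# so the same user always maps to the same database.

import hashlib

SHARDS = ["users-db-0", "users-db-1", "users-db-2", "users-db-3"]

def shard_for(user_id: str) -> str:
    # md5 gives a stable, well-distributed hash across shards
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for("user-123"))
```

Note that modulo-based sharding makes resharding expensive: changing the shard count remaps most keys, which is one reason consistent hashing exists.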

Asynchronous Processing

Separate out Application Servers for batch processes:

1. Client uploads data - Example: a photo upload in a photo service
2. Application Server queues the job - Places the job in a Queue (SQS)
3. Worker Service processes the job - EC2 instances or Lambda functions pull from the Queue:
   • Perform the computation (e.g., create a thumbnail)
   • Update the Database
   • Store the result in the Object Store
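The three steps above can be sketched with the standard-library queue standing in for SQS and a dict standing in for the Object Store; the thumbnail job is illustrative:

```python
# Queue-based async processing: the Application Server enqueues a job,
# a Worker Service pulls it, computes, and stores the result.

import queue

jobs = queue.Queue()   # stand-in for SQS
object_store = {}      # stand-in for the Object Store (S3)

def enqueue_upload(photo_id):
    """Application Server: queue the job instead of doing it inline."""
    jobs.put({"photo_id": photo_id})

def worker():
    """Worker Service: pull a job, compute, store the result."""
    job = jobs.get()
    thumbnail = f"thumb-of-{job['photo_id']}"   # placeholder computation
    object_store[job["photo_id"]] = thumbnail   # store in Object Store
    jobs.task_done()

enqueue_upload("photo-1")   # step 1-2: client upload is queued
worker()                    # step 3: worker processes it
print(object_store["photo-1"])
```

The client gets an immediate response at enqueue time; the expensive work happens off the request path.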

Memory Cache Scaling
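A common way to scale the Memory Cache across multiple nodes is consistent hashing: keys map onto a ring of cache nodes, so adding or removing a node remaps only a fraction of keys. A minimal sketch (node names are illustrative; production rings also use virtual nodes for balance):

```python
# Consistent hashing: each node owns an arc of the hash ring; a key is
# served by the first node clockwise from the key's hash position.

import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        self._ring = sorted((_hash(n), n) for n in nodes)
        self._keys = [h for h, _ in self._ring]

    def node_for(self, key: str) -> str:
        # first node at or after the key's hash, wrapping around the ring
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["cache-1", "cache-2", "cache-3"])
print(ring.node_for("user:42"))  # same key always hits the same node
```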

SQL Scaling Patterns

  • Read replicas
  • Federation
  • Sharding
  • Denormalization
  • SQL Tuning

NoSQL Options

  • Key-value store
  • Document store
  • Wide column store
  • Graph database

Caching Strategies

  • Client caching
  • CDN caching
  • Web server caching
  • Database caching
  • Application caching

Asynchronous Processing

  • Message queues
  • Task queues
  • Back pressure
  • Microservices

Key Takeaways

  • Iterative approach is essential for scaling
    • Benchmark → Profile → Address → Repeat
  • Start simple with single server
  • Separate concerns for independent scaling
  • Horizontal scaling with Load Balancer and multiple servers
  • Caching layer critical for read-heavy workloads
  • Autoscaling handles traffic variability
  • Database scaling requires multiple strategies
  • Asynchronous processing separates real-time from batch work
  • Monitoring at every stage identifies bottlenecks
  • Security must evolve with architecture
Scaling is an iterative process. Continue benchmarking and monitoring to address bottlenecks as they arise.
