
Overview

Design a system that evolves from serving a single user to millions of users on AWS. This problem demonstrates the iterative approach to scaling, starting with a simple single-server setup and progressively adding components as bottlenecks emerge.
While this uses AWS-specific services, the principles apply generally to any cloud provider or on-premise infrastructure.

Iterative Scaling Approach

Scaling requires a methodical approach:

1. Benchmark/Load Test - Measure current system performance under load
2. Profile for Bottlenecks - Identify the specific components causing performance issues
3. Address Bottlenecks - Evaluate alternatives and trade-offs, then implement solutions
4. Repeat - Continuously iterate as the system grows

Important: This iterative pattern is good practice for evolving a basic design into a scalable one. Never jump directly to the final architecture!

Step 1: Use Cases and Constraints

Use Cases

In Scope

  • User makes a read or write request
    • Service processes request, stores user data, returns results
  • Service needs to evolve from small user base to millions of users
    • Discuss general scaling patterns for a large number of users and requests
  • Service has high availability

Constraints and Assumptions

Assumptions:
  • Traffic is not evenly distributed
  • Need for relational data
  • Scale from 1 user to tens of millions of users
    • Denote increase as: Users+, Users++, Users+++, etc.
  • 10 million users
  • 1 billion writes per month
  • 100 billion reads per month
  • 100:1 read to write ratio
  • 1 KB content per write

Usage Calculations

Storage:
  • 1 TB of new content per month
    • 1 KB per write × 1 billion writes per month
  • 36 TB of new content in 3 years
Throughput:
  • 400 writes per second on average
  • 40,000 reads per second on average
Conversion guide:
  • 2.5 million seconds per month
  • 1 request per second = 2.5 million requests per month
  • 40 requests per second = 100 million requests per month
  • 400 requests per second = 1 billion requests per month
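The usage calculations above can be verified with quick back-of-envelope math, sketched here (constants taken from the stated constraints):

```python
# Back-of-envelope scaling math from the constraints above.
# Assumption: ~2.5 million seconds per month (rounded for quick math).

SECONDS_PER_MONTH = 2_500_000

writes_per_month = 1_000_000_000      # 1 billion writes per month
reads_per_month = 100_000_000_000     # 100 billion reads per month
bytes_per_write = 1_000               # 1 KB content per write

writes_per_second = writes_per_month // SECONDS_PER_MONTH   # 400
reads_per_second = reads_per_month // SECONDS_PER_MONTH     # 40,000

new_tb_per_month = writes_per_month * bytes_per_write / 1e12  # 1 TB
tb_in_three_years = new_tb_per_month * 36                     # 36 TB

print(writes_per_second, reads_per_second, tb_in_three_years)
```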

Step 2: Initial Design - Single Server

Single Server Design

Goals (1-2 Users)

  • Start simple with single box
  • Vertical scaling when needed
  • Monitor to determine bottlenecks

Components

Web Server

EC2 instance handling all requests

MySQL Database

Stored on same EC2 instance

Elastic IP

Public static IP that doesn’t change on reboot

DNS

Route 53 maps domain to instance public IP

Vertical Scaling

Pros:
  • Simple to implement
  • Choose a bigger instance size as needed

Use CloudWatch, top, Nagios, StatsD, and Graphite to monitor:
  • CPU usage
  • Memory usage
  • I/O
  • Network

Cons:
  • Can get very expensive
  • No redundancy/failover
  • Limited by single machine capacity

Security

Open only the necessary ports:
  • 80 for HTTP
  • 443 for HTTPS
  • 22 for SSH (whitelisted IPs only)

SQL vs NoSQL

Start with a MySQL database, given the relational data requirement.
Discuss SQL vs NoSQL trade-offs based on the specific use case.

Step 3: Users+ - Separate Components

Separate Components

Bottleneck Analysis

Problem: Single box becoming overwhelmed
  • MySQL taking more memory and CPU
  • User content filling disk space
  • Vertical scaling expensive
  • Cannot scale MySQL and Web Server independently

Goals

1. Lighten load on the single box - Separate concerns for independent scaling
2. Store static content separately - Move it to an Object Store (S3)
3. Move the database to a separate box - Use RDS for managed MySQL

New Components

Object Store (S3) stores static content:
  • User files
  • JavaScript
  • CSS
  • Images
  • Videos
Benefits:
  • Highly scalable and reliable
  • Server-side encryption
RDS provides a managed database service:
  • Simple to administer and scale
  • Multiple availability zones
  • Encryption at rest
  • Automatic backups
Network isolation (VPC):
  • Public subnet for Web Server (internet-facing)
  • Private subnet for database and other internal services
  • Security groups control access between components

Trade-offs

  • Increased complexity - Need to update Web Server to point to S3 and RDS
  • Additional security - Must secure new components
  • Higher AWS costs - Weigh against managing similar systems yourself

Step 4: Users++ - Horizontal Scaling

Horizontal Scaling

Bottleneck Analysis

Problem: Single Web Server bottlenecks during peak hours
  • Slow responses
  • Occasional downtime
  • Need higher availability and redundancy

Goals

Load Balancer

ELB distributes traffic across multiple Web Servers
  • Highly available
  • Terminate SSL to reduce backend load
  • Simplify certificate administration

Multiple Web Servers

Spread across multiple availability zones
  • Horizontal scaling
  • Remove single points of failure

MySQL Failover

Master-Slave replication across AZs
  • Improve redundancy
  • Enable read scaling

Application Servers

Separate from Web Servers
  • Independent scaling
  • Web Servers act as reverse proxy
  • Separate Read APIs from Write APIs

Content Delivery Network (CDN)

  • Serve static content globally
  • Reduce latency
  • Reduce load on origin servers

Step 5: Users+++ - Caching Layer

Caching Layer
Note: Internal Load Balancers not shown to reduce clutter

Bottleneck Analysis

Problem: Read-heavy system (100:1 ratio)
  • Database suffering from high read requests
  • Poor performance from cache misses

Goals

Add a Memory Cache (Elasticache) to reduce load and latency.

Cache results from MySQL:
  • Try configuring the MySQL Database cache first
  • If that is insufficient, implement the Memory Cache

Store session data from the Web Servers:
  • Makes Web Servers stateless
  • Enables Autoscaling

Why memory? Reading 1 MB sequentially from memory takes ~250 microseconds:
  • 4x faster than SSD
  • 80x faster than disk
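The read path through the Memory Cache follows the cache-aside pattern. A minimal sketch, with a plain dict standing in for Elasticache and a placeholder function standing in for a MySQL query (both names are illustrative, not real APIs):

```python
# Cache-aside read path: check the cache first, fall back to the
# database on a miss, then populate the cache for the next read.

cache = {}  # in-memory stand-in for the Memory Cache (Elasticache)

def db_query(key):
    # placeholder for a MySQL read; returns a fake row
    return f"row-for-{key}"

def get(key):
    if key in cache:        # cache hit: served from memory
        return cache[key]
    value = db_query(key)   # cache miss: fall through to the database
    cache[key] = value      # populate so the next read is a hit
    return value

print(get("user:42"))  # miss, then cached
print(get("user:42"))  # hit
```

Cache invalidation on writes is the hard part of this pattern and is worth discussing separately.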

MySQL Read Replicas

1. Add Read Replicas - Relieve load on the MySQL Write Master
2. Separate read/write logic - Update the Web Server to route reads and writes appropriately
3. Add Load Balancers - Distribute reads across the replicas

Most services are read-heavy rather than write-heavy, making this pattern very effective.
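The read/write split can be sketched as a simple router: writes always go to the master, while reads rotate across the replicas. Endpoint names here are illustrative placeholders:

```python
# Read/write splitting: SELECTs rotate round-robin across replicas,
# everything else goes to the Write Master.

import itertools

MASTER = "mysql-master:3306"
REPLICAS = ["replica-1:3306", "replica-2:3306", "replica-3:3306"]
_replica_cycle = itertools.cycle(REPLICAS)

def route(sql: str) -> str:
    """Pick a database endpoint based on the statement type."""
    if sql.lstrip().upper().startswith("SELECT"):
        return next(_replica_cycle)   # reads: rotate across replicas
    return MASTER                     # writes: always the master

print(route("SELECT * FROM users"))    # one of the replicas
print(route("INSERT INTO users ..."))  # the master
```

In practice this routing often lives in a connection proxy or the database driver rather than application code.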

Additional Scaling

  • Add more Web Servers to improve responsiveness
  • Add more Application Servers for business logic

Step 6: Users++++ - Autoscaling

Autoscaling

Bottleneck Analysis

Problem: Traffic spikes during business hours
  • Want to automatically scale up/down based on load
  • Reduce costs by powering down unused instances
  • Automate DevOps as much as possible

AWS Autoscaling

Setup:
  • Create one group for each Web Server type
  • Create one group for each Application Server type
  • Place each group in multiple availability zones
  • Set the min and max number of instances

Scaling triggers (via CloudWatch):
  • Simple time-of-day schedules for predictable loads
  • OR metrics over a time period:
    • CPU load
    • Latency
    • Network traffic
    • Custom metrics

Cons:
  • Introduces complexity
  • Takes time to scale up to meet increased demand
  • Takes time to scale down when demand drops
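The decision an autoscaling policy makes can be sketched as a simple threshold rule. Real AWS Autoscaling reacts to CloudWatch alarms with cooldowns; the thresholds and step size here are illustrative assumptions:

```python
# Threshold-based scale decision: add an instance above the high-CPU
# threshold, remove one below the low threshold, and clamp the result
# to the group's configured min/max size.

def desired_capacity(current, avg_cpu, min_size=2, max_size=10,
                     scale_up_at=70.0, scale_down_at=30.0):
    """Return the new instance count for the autoscaling group."""
    if avg_cpu > scale_up_at:
        current += 1          # add an instance under load
    elif avg_cpu < scale_down_at:
        current -= 1          # shed an instance when idle
    return max(min_size, min(max_size, current))  # clamp to bounds

print(desired_capacity(4, 85.0))   # → 5
print(desired_capacity(4, 20.0))   # → 3
print(desired_capacity(2, 10.0))   # → 2 (respects min_size)
```

The min/max clamp is what prevents a flapping metric from scaling the group to zero or runaway cost.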

DevOps Automation

Automate configuration management with tools such as:
  • Chef
  • Puppet
  • Ansible

Step 7: Users+++++ - Advanced Scaling

Advanced Scaling
Note: Autoscaling groups not shown to reduce clutter

Continued Scaling Challenges

As service grows towards constraints, continue iterative scaling:

Database Scaling

Challenges:
  • Database growing too large
  • 40,000 average read requests/second overwhelming replicas
  • 400 average writes/second may overwhelm single Write Master
Solutions:
  • Data warehousing - Store only a limited time period of data in MySQL and move the rest to a warehouse such as Redshift
    • Redshift comfortably handles the 1 TB/month of new content
  • Federation - Split databases by function
  • Sharding - Distribute data across databases
  • Denormalization - Optimize read performance
  • SQL Tuning - Optimize queries and indexes

Consider DynamoDB for:
  • High read/write throughput requirements
  • Flexible schema needs
  • Key-value or document data models
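Of the solutions above, sharding is often the least intuitive. A minimal sketch of hash-based sharding, routing each user to one of N database shards by hashing the user id (shard names are illustrative):

```python
# Hash-based sharding: a stable hash of the user id picks the shard,
# so the same user always maps to the same database.

import hashlib

SHARDS = ["users-db-0", "users-db-1", "users-db-2", "users-db-3"]

def shard_for(user_id: str) -> str:
    # md5 gives a stable, well-distributed hash across shards
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for("user-123"))
```

Note that modulo-based sharding makes resharding expensive: changing the shard count remaps most keys, which is one reason consistent hashing exists.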

Asynchronous Processing

Separate out Application Servers for batch processes:

1. Client uploads data - Example: a photo upload in a photo service
2. Application Server queues the job - Places the job in a Queue (SQS)
3. Worker Service processes the job - EC2 instances or Lambda functions pull from the Queue:
   • Perform the computation (e.g., create a thumbnail)
   • Update the Database
   • Store the result in the Object Store
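The three steps above can be sketched with the standard-library queue standing in for SQS and a dict standing in for the Object Store; the thumbnail job is illustrative:

```python
# Queue-based async processing: the Application Server enqueues a job,
# a Worker Service pulls it, computes, and stores the result.

import queue

jobs = queue.Queue()   # stand-in for SQS
object_store = {}      # stand-in for the Object Store (S3)

def enqueue_upload(photo_id):
    """Application Server: queue the job instead of doing it inline."""
    jobs.put({"photo_id": photo_id})

def worker():
    """Worker Service: pull a job, compute, store the result."""
    job = jobs.get()
    thumbnail = f"thumb-of-{job['photo_id']}"   # placeholder computation
    object_store[job["photo_id"]] = thumbnail   # store in Object Store
    jobs.task_done()

enqueue_upload("photo-1")   # step 1-2: client upload is queued
worker()                    # step 3: worker processes it
print(object_store["photo-1"])
```

The client gets an immediate response at enqueue time; the expensive work happens off the request path.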

Memory Cache Scaling
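A common way to scale the Memory Cache across multiple nodes is consistent hashing: keys map onto a ring of cache nodes, so adding or removing a node remaps only a fraction of keys. A minimal sketch (node names are illustrative; production rings also use virtual nodes for balance):

```python
# Consistent hashing: each node owns an arc of the hash ring; a key is
# served by the first node clockwise from the key's hash position.

import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        self._ring = sorted((_hash(n), n) for n in nodes)
        self._keys = [h for h, _ in self._ring]

    def node_for(self, key: str) -> str:
        # first node at or after the key's hash, wrapping around the ring
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["cache-1", "cache-2", "cache-3"])
print(ring.node_for("user:42"))  # same key always hits the same node
```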

SQL Scaling Patterns

  • Read replicas
  • Federation
  • Sharding
  • Denormalization
  • SQL Tuning

NoSQL Options

  • Key-value store
  • Document store
  • Wide column store
  • Graph database

Caching Strategies

  • Client caching
  • CDN caching
  • Web server caching
  • Database caching
  • Application caching

Asynchronous Processing

  • Message queues
  • Task queues
  • Back pressure
  • Microservices

Key Takeaways

  • Iterative approach is essential for scaling
    • Benchmark → Profile → Address → Repeat
  • Start simple with single server
  • Separate concerns for independent scaling
  • Horizontal scaling with Load Balancer and multiple servers
  • Caching layer critical for read-heavy workloads
  • Autoscaling handles traffic variability
  • Database scaling requires multiple strategies
  • Asynchronous processing separates real-time from batch work
  • Monitoring at every stage identifies bottlenecks
  • Security must evolve with architecture
Scaling is an iterative process. Continue benchmarking and monitoring to address bottlenecks as they arise.
