
What is a Load Balancer?

A load balancer is a device or software application that distributes network or application traffic across multiple servers to optimize resource utilization, maximize throughput, minimize response time, and prevent any single server from being overloaded.
Load balancers ensure high availability and reliability by routing traffic only to healthy servers and distributing load efficiently.

What Does a Load Balancer Do?

Distributes Traffic

Evenly spreads incoming requests across multiple servers to prevent any single server from becoming a bottleneck.

Ensures Availability and Reliability

Monitors server health and automatically reroutes traffic away from failed or unhealthy servers, ensuring uninterrupted service.

Improves Performance

Reduces response time by distributing load and preventing server overload, providing faster user experiences.

Scales Applications

Facilitates horizontal scaling by managing traffic across newly added servers without client configuration changes.

Types of Load Balancers

By Deployment Type

1. Hardware Load Balancers

Physical devices designed specifically for traffic distribution.

Pros:
  • High performance and throughput
  • Dedicated hardware resources
  • Vendor support
Cons:
  • Expensive
  • Limited scalability
  • Requires physical space
2. Software Load Balancers

Applications installed on standard hardware or virtual machines.
Examples: NGINX, HAProxy, Traefik

Pros:
  • Cost-effective
  • Flexible configuration
  • Easy to scale
Cons:
  • Shares resources with host
  • May require more maintenance
3. Cloud-Based Load Balancers

Managed services integrated into cloud infrastructure.
Examples: AWS Elastic Load Balancer, Google Cloud Load Balancing, Azure Load Balancer

Pros:
  • Fully managed
  • Auto-scaling
  • Pay-as-you-go
Cons:
  • Vendor lock-in
  • Ongoing costs

By OSI Layer

Network Load Balancer (NLB)

Operates at: Transport Layer (Layer 4)
Routes based on:
  - IP address
  - TCP/UDP ports
  
Characteristics:
  - Does not inspect packet content
  - Very fast (low latency)
  - Simple routing decisions
  - Protocol agnostic
  
Use cases:
  - High-performance applications
  - Non-HTTP protocols
  - Gaming servers
  - IoT applications

Global Server Load Balancing (GSLB)

Distributes traffic across multiple geographical locations for:
  • Disaster recovery
  • Global redundancy
  • Latency optimization
  • Geographic traffic distribution

Top 6 Load Balancing Algorithms


Static Algorithms

Predetermined routing decisions not based on current server state.
1. Round Robin

Client requests are sent to different service instances in sequential order.
Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A (cycle repeats)
Best for: Stateless services with equal capacity servers
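The rotation above can be sketched in a few lines of Python (server names are illustrative):

```python
from itertools import cycle

servers = ["server-a", "server-b", "server-c"]
rotation = cycle(servers)  # endless sequential iterator over the pool

# Four incoming requests: the fourth wraps back to server-a.
assignments = [next(rotation) for _ in range(4)]
```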
2. Sticky Round Robin

A variant of round robin in which subsequent requests from the same client go to the same server.
Alice's requests → Always Server A
Bob's requests   → Always Server B
Best for: Session-based applications
Also called “Session Persistence” or “Session Affinity”
3. Weighted Round Robin

Admin assigns weights to servers based on capacity. Higher weight servers handle more requests.
Server A (weight: 5) → Gets 50% of traffic
Server B (weight: 3) → Gets 30% of traffic
Server C (weight: 2) → Gets 20% of traffic
Best for: Heterogeneous server capacities
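A minimal sketch of one scheduling cycle built from the weights above (server names are illustrative):

```python
# Expand each server into the rotation in proportion to its weight,
# so higher-weight servers appear more often per cycle.
weights = {"server-a": 5, "server-b": 3, "server-c": 2}

cycle_order = [name for name, w in weights.items() for _ in range(w)]

# Each server's share of one full cycle matches its relative weight.
share = {name: cycle_order.count(name) / len(cycle_order) for name in weights}
```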
4. Hash-Based

Applies a hash function to the incoming request's IP address or URL to determine routing.
server_index = hash(client_ip) % number_of_servers
// OR
server_index = hash(url) % number_of_servers
Best for: Cache distribution, consistent routing
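A minimal Python sketch of the formula above; note that Python's built-in `hash()` is salted per process, so a stable digest is used instead to keep routing consistent across restarts (server names are illustrative):

```python
import hashlib

servers = ["server-a", "server-b", "server-c"]

def pick_server(key: str) -> str:
    # Deterministic digest of the client IP (or URL), reduced
    # modulo the pool size to choose a backend.
    digest = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return servers[digest % len(servers)]
```

The same key always maps to the same backend, which is what makes this scheme useful for cache locality.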

Dynamic Algorithms

Routing decisions based on current server state and performance.
1. Least Connections

New requests sent to the server with the fewest active connections.
Server A: 45 connections
Server B: 32 connections ← Next request goes here
Server C: 51 connections
Best for: Varying request processing times
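A minimal sketch of the selection step, using the connection counts from the example (names are illustrative):

```python
# Current in-flight connection counts per backend.
active = {"server-a": 45, "server-b": 32, "server-c": 51}

def pick_server(connections: dict) -> str:
    # Route to the backend with the fewest active connections.
    return min(connections, key=connections.get)

target = pick_server(active)
active[target] += 1  # the new request is now in flight on that backend
```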
2. Least Response Time

New requests sent to the server with the fastest response time.
Server A: avg 120ms
Server B: avg 85ms  ← Next request goes here
Server C: avg 200ms
Best for: Performance-critical applications
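A minimal sketch, assuming the load balancer keeps a running latency average per backend (an exponentially weighted moving average here; the numbers match the example above):

```python
# Running average latency per backend, in milliseconds.
avg_ms = {"server-a": 120.0, "server-b": 85.0, "server-c": 200.0}

def fastest(latencies: dict) -> str:
    # Route the next request to the currently fastest backend.
    return min(latencies, key=latencies.get)

def observe(server: str, sample_ms: float, alpha: float = 0.2) -> None:
    # Blend each new latency sample into the running average
    # so the estimate tracks current conditions.
    avg_ms[server] = (1 - alpha) * avg_ms[server] + alpha * sample_ms

target = fastest(avg_ms)  # server-b at 85 ms
observe(target, 95.0)     # fold the newly measured latency in
```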

Key Use Cases for Load Balancers


1. Traffic Distribution

Load balancers evenly distribute incoming traffic among multiple servers, preventing any single server from becoming overwhelmed.

Benefits:
  • Optimal performance
  • Better resource utilization
  • Improved scalability
  • Consistent response times

2. High Availability

Load balancers enhance system availability by rerouting traffic away from failed or unhealthy servers to healthy ones.
Health Check Process:
  1. LB sends health check every 10s
  2. Server responds with status
  3. If server fails 3 consecutive checks:
     - Mark as unhealthy
     - Stop sending traffic
  4. Continue monitoring
  5. When server recovers:
     - Mark as healthy
     - Resume traffic
Result: Uninterrupted service even when servers fail
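The process above can be sketched as a small state machine (thresholds match the text; this is an illustration, not any particular load balancer's implementation):

```python
# 3 consecutive failed checks mark a server down; 2 passes bring it back.
UNHEALTHY_THRESHOLD = 3
HEALTHY_THRESHOLD = 2

class BackendHealth:
    def __init__(self) -> None:
        self.healthy = True
        self.fail_streak = 0
        self.pass_streak = 0

    def record(self, check_passed: bool) -> None:
        if check_passed:
            self.fail_streak = 0
            self.pass_streak += 1
            if not self.healthy and self.pass_streak >= HEALTHY_THRESHOLD:
                self.healthy = True   # resume sending traffic
        else:
            self.pass_streak = 0
            self.fail_streak += 1
            if self.fail_streak >= UNHEALTHY_THRESHOLD:
                self.healthy = False  # stop sending traffic

backend = BackendHealth()
for ok in (False, False, False):
    backend.record(ok)
down = backend.healthy        # False: marked unhealthy after 3 failures
for ok in (True, True):
    backend.record(ok)
recovered = backend.healthy   # True: traffic resumes after recovery
```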

3. SSL Termination

Load balancers offload SSL/TLS encryption and decryption from backend servers.
Client (HTTPS) → Load Balancer (SSL Termination) → Backend (HTTP)
Benefits:
  • Reduced backend server workload
  • Centralized certificate management
  • Improved overall performance
  • Simplified backend configuration

4. Session Persistence

For applications requiring user sessions on specific servers, load balancers ensure subsequent requests go to the same server.
Set-Cookie: SERVER_ID=server-a; Path=/

All requests with this cookie → Server A
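A minimal sketch of cookie-based stickiness using the `SERVER_ID` cookie above (the fallback choice stands in for the normal balancing decision and is hard-coded here for illustration):

```python
from http.cookies import SimpleCookie

SERVERS = {"server-a", "server-b", "server-c"}

def route(cookie_header: str, fallback: str = "server-b") -> str:
    # Honor an existing SERVER_ID cookie; otherwise fall back to the
    # regular load-balancing decision.
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("SERVER_ID")
    if morsel and morsel.value in SERVERS:
        return morsel.value
    return fallback

sticky = route("SERVER_ID=server-a; Path=/")  # pinned to server-a
fresh = route("")                             # no cookie: normal routing
```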

5. Scalability

Load balancers facilitate horizontal scaling by managing traffic across all servers.
Initial Setup:
  Servers: [A, B, C]
  Capacity: 3000 req/s
  
Scale Out:
  Add servers: [D, E]
  New capacity: 5000 req/s
  LB automatically includes new servers
  
Scale In:
  Remove servers: [D, E]
  Reduced capacity: 3000 req/s
  LB gracefully drains connections

6. Health Monitoring

Load balancers continuously monitor server health and performance.
Health Check Configuration:
  protocol: HTTP
  path: /health
  interval: 10s
  timeout: 5s
  healthy_threshold: 2
  unhealthy_threshold: 3
  
Actions:
  - Monitor response codes
  - Track response times
  - Remove failed servers
  - Add recovered servers
  - Alert on failures

Realistic Load Balancer Use Cases

Failure Handling

Automatically redirects traffic away from malfunctioning elements to maintain continuous service.
Scenario: Server Crash
  1. Server B crashes
  2. Health check fails
  3. LB marks Server B as down
  4. Traffic redistributed to Servers A and C
  5. No user impact

Instance Health Checks

Continuously evaluates instance functionality, directing requests only to operational servers.
Health Check Types:
  - HTTP/HTTPS: Check status code 200
  - TCP: Verify port connectivity
  - Custom: Application-specific checks

Platform-Specific Routing

Routes requests from different device types to specialized backends.
User-Agent Based Routing:
  Mobile (iOS/Android) → Mobile Backend
  Desktop Browser → Web Backend
  API Client → API Backend
  Bot/Crawler → Rate-limited Backend
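A minimal sketch of such a dispatch; the substring patterns and backend pool names are assumptions for illustration, not a standard (real load balancers match administrator-configured rules):

```python
def route_by_user_agent(user_agent: str) -> str:
    # Coarse User-Agent classification into backend pools.
    ua = user_agent.lower()
    if "bot" in ua or "crawler" in ua or "spider" in ua:
        return "rate-limited-backend"
    if "iphone" in ua or "ipad" in ua or "android" in ua:
        return "mobile-backend"
    if "curl" in ua or "python-requests" in ua:
        return "api-backend"
    return "web-backend"
```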

SSL Termination

Handles encryption/decryption of SSL traffic, reducing backend processing burden.

Cross-Zone Load Balancing

Distributes traffic across various geographic or network zones.
Multi-AZ Setup:
  Zone A: [Server A1, Server A2]
  Zone B: [Server B1, Server B2]
  Zone C: [Server C1, Server C2]
  
Benefits:
  - Increased resilience
  - Zone failure tolerance
  - Geographic distribution

User Stickiness

Maintains session integrity by consistently directing specific users to designated servers.

Load Balancer Configuration Example

NGINX Configuration

upstream backend {
    server backend1.example.com;
    server backend2.example.com;
    server backend3.example.com;
}

server {
    listen 80;
    
    location / {
        proxy_pass http://backend;
    }
}

HAProxy Configuration

frontend http_front
    bind *:80
    default_backend http_back

backend http_back
    balance roundrobin
    option httpchk GET /health
    http-check expect status 200
    
    server server1 192.168.1.10:8080 check inter 10s
    server server2 192.168.1.11:8080 check inter 10s
    server server3 192.168.1.12:8080 check inter 10s

Load Balancer vs API Gateway

| Feature              | Load Balancer            | API Gateway                         |
|----------------------|--------------------------|-------------------------------------|
| Primary Function     | Traffic distribution     | API management                      |
| OSI Layer            | Layer 4 or 7             | Layer 7                             |
| Routing              | Simple (IP, port, path)  | Complex (headers, auth, transforms) |
| Authentication       | No                       | Yes                                 |
| Rate Limiting        | Basic                    | Advanced                            |
| Protocol Translation | No                       | Yes                                 |
| API Composition      | No                       | Yes                                 |
| Best For             | Server distribution      | API orchestration                   |
Load balancers and API gateways are often used together: LB distributes traffic, API Gateway manages API-specific concerns.

Best Practices

1. Implement Health Checks

Configure appropriate health checks for your application.
health_check:
  type: http
  path: /health
  interval: 10s
  timeout: 5s
  healthy_threshold: 2
  unhealthy_threshold: 3
2. Use Multiple Availability Zones

Distribute servers across multiple zones for resilience.
3. Monitor Key Metrics

Track:
  • Request rate
  • Active connections
  • Backend response times
  • Error rates
  • Health check status
4. Configure Appropriate Timeouts

Set connection and request timeouts to prevent hanging requests.
timeouts:
  connect: 5s
  client: 50s
  server: 50s
5. Enable SSL/TLS

Terminate SSL at the load balancer for better performance.
6. Implement Connection Draining

Gracefully handle server removal by draining existing connections.
7. Use Sticky Sessions Wisely

Use sticky sessions only when necessary; they can complicate scaling and failover.

Common Pitfalls to Avoid

Single Point of Failure

Deploy load balancers in high-availability pairs to avoid the load balancer itself becoming a single point of failure.
HA Setup:
  Primary LB:   Active
  Secondary LB: Standby (with failover)
  Virtual IP:   Shared between both

Inadequate Capacity Planning

  • Monitor load balancer capacity
  • Plan for peak traffic
  • Consider auto-scaling

Poor Health Check Configuration

Bad:
  interval: 60s  # Too slow
  timeout: 30s   # Too long
  
Good:
  interval: 10s  # Quick detection
  timeout: 5s    # Reasonable

Ignoring Session Persistence Requirements

  • Understand application session needs
  • Choose appropriate persistence mechanism
  • Plan for session replication or external session storage

Advanced Features

Connection Pooling

Reuse connections to backend servers for better performance.

Request Buffering

Buffer client requests before forwarding to reduce slow client impact.

Compression

Compress responses to reduce bandwidth usage.

WAF Integration

Integrate Web Application Firewall for security.

Rate Limiting

Limit requests per client to prevent abuse.
rate_limits:
  - path: /api/*
    limit: 1000/minute
    by: client_ip
  - path: /api/heavy
    limit: 10/minute
    by: api_key
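A common way to enforce limits like these is a token bucket. A minimal per-client sketch (the rate and burst values are illustrative):

```python
import time

class TokenBucket:
    """Per-client token bucket: `rate` tokens refill per second, up to
    `capacity`; each allowed request spends one token."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill tokens for the time elapsed since the last request,
        # capped at the bucket's capacity (the burst allowance).
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# 10 requests/minute with a burst allowance of 5.
bucket = TokenBucket(rate=10 / 60, capacity=5)
results = [bucket.allow() for _ in range(6)]  # sixth rapid call is denied
```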

Key Takeaways

Load balancers are essential for high-availability, scalable systems. Choose the right type and algorithm based on your specific requirements.
  • Load balancers distribute traffic across multiple servers for reliability and performance
  • Layer 4 (NLB) is faster; Layer 7 (ALB) provides richer routing capabilities
  • Static algorithms (round-robin) work for uniform workloads
  • Dynamic algorithms (least connections) adapt to varying loads
  • Health monitoring ensures traffic goes only to healthy servers
  • SSL termination offloads encryption work from backend servers
  • Often used in combination with API gateways for complete traffic management
