
What is a Load Balancer?

A load balancer is a device or software application that distributes network or application traffic across multiple servers to optimize resource utilization, maximize throughput, minimize response time, and prevent any single server from being overloaded.
Load balancers ensure high availability and reliability by routing traffic only to healthy servers and distributing load efficiently.

What Does a Load Balancer Do?

Distributes Traffic

Evenly spreads incoming requests across multiple servers to prevent any single server from becoming a bottleneck.

Ensures Availability and Reliability

Monitors server health and automatically reroutes traffic away from failed or unhealthy servers, ensuring uninterrupted service.

Improves Performance

Reduces response time by distributing load and preventing server overload, providing faster user experiences.

Scales Applications

Facilitates horizontal scaling by managing traffic across newly added servers without client configuration changes.

Types of Load Balancers

By Deployment Type

1. Hardware Load Balancers

Physical devices designed specifically for traffic distribution.

Pros:
  • High performance and throughput
  • Dedicated hardware resources
  • Vendor support
Cons:
  • Expensive
  • Limited scalability
  • Requires physical space
2. Software Load Balancers

Applications installed on standard hardware or virtual machines.
Examples: NGINX, HAProxy, Traefik

Pros:
  • Cost-effective
  • Flexible configuration
  • Easy to scale
Cons:
  • Shares resources with host
  • May require more maintenance
3. Cloud-Based Load Balancers

Managed services integrated into cloud infrastructure.
Examples: AWS Elastic Load Balancer, Google Cloud Load Balancing, Azure Load Balancer

Pros:
  • Fully managed
  • Auto-scaling
  • Pay-as-you-go
Cons:
  • Vendor lock-in
  • Ongoing costs

By OSI Layer

Network Load Balancer (NLB)

Operates at: Transport Layer (Layer 4)
Routes based on:
  - IP address
  - TCP/UDP ports
  
Characteristics:
  - Does not inspect packet content
  - Very fast (low latency)
  - Simple routing decisions
  - Protocol agnostic
  
Use cases:
  - High-performance applications
  - Non-HTTP protocols
  - Gaming servers
  - IoT applications

Global Server Load Balancing (GSLB)

Distributes traffic across multiple geographical locations for:
  • Disaster recovery
  • Global redundancy
  • Latency optimization
  • Geographic traffic distribution

Top 6 Load Balancing Algorithms


Static Algorithms

Predetermined routing decisions not based on current server state.
1. Round Robin

Client requests are sent to different service instances in sequential order.
Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A (cycle repeats)
Best for: Stateless services with equal capacity servers
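The rotation above can be sketched in a few lines of Python (server names are illustrative):

```python
from itertools import cycle

servers = ["server-a", "server-b", "server-c"]
rotation = cycle(servers)  # endless sequential iterator over the pool

# Four incoming requests: the fourth wraps back to server-a.
assignments = [next(rotation) for _ in range(4)]
```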
2. Sticky Round Robin

A variant of round robin in which subsequent requests from the same client go to the same server.
Alice's requests → Always Server A
Bob's requests   → Always Server B
Best for: Session-based applications
Also called “Session Persistence” or “Session Affinity”
3. Weighted Round Robin

Admin assigns weights to servers based on capacity. Higher weight servers handle more requests.
Server A (weight: 5) → Gets 50% of traffic
Server B (weight: 3) → Gets 30% of traffic
Server C (weight: 2) → Gets 20% of traffic
Best for: Heterogeneous server capacities
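A minimal sketch of one scheduling cycle built from the weights above (server names are illustrative):

```python
# Expand each server into the rotation in proportion to its weight,
# so higher-weight servers appear more often per cycle.
weights = {"server-a": 5, "server-b": 3, "server-c": 2}

cycle_order = [name for name, w in weights.items() for _ in range(w)]

# Each server's share of one full cycle matches its relative weight.
share = {name: cycle_order.count(name) / len(cycle_order) for name in weights}
```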
4. Hash-Based

Applies a hash function to the incoming request's IP address or URL to determine routing.
server_index = hash(client_ip) % number_of_servers
// OR
server_index = hash(url) % number_of_servers
Best for: Cache distribution, consistent routing
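A minimal Python sketch of the formula above; note that Python's built-in `hash()` is salted per process, so a stable digest is used instead to keep routing consistent across restarts (server names are illustrative):

```python
import hashlib

servers = ["server-a", "server-b", "server-c"]

def pick_server(key: str) -> str:
    # Deterministic digest of the client IP (or URL), reduced
    # modulo the pool size to choose a backend.
    digest = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return servers[digest % len(servers)]
```

The same key always maps to the same backend, which is what makes this scheme useful for cache locality.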

Dynamic Algorithms

Routing decisions based on current server state and performance.
1. Least Connections

New requests sent to the server with the fewest active connections.
Server A: 45 connections
Server B: 32 connections ← Next request goes here
Server C: 51 connections
Best for: Varying request processing times
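A minimal sketch of the selection step, using the connection counts from the example (names are illustrative):

```python
# Current in-flight connection counts per backend.
active = {"server-a": 45, "server-b": 32, "server-c": 51}

def pick_server(connections: dict) -> str:
    # Route to the backend with the fewest active connections.
    return min(connections, key=connections.get)

target = pick_server(active)
active[target] += 1  # the new request is now in flight on that backend
```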
2. Least Response Time

New requests sent to the server with the fastest response time.
Server A: avg 120ms
Server B: avg 85ms  ← Next request goes here
Server C: avg 200ms
Best for: Performance-critical applications
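A minimal sketch, assuming the load balancer keeps a running latency average per backend (an exponentially weighted moving average here; the numbers match the example above):

```python
# Running average latency per backend, in milliseconds.
avg_ms = {"server-a": 120.0, "server-b": 85.0, "server-c": 200.0}

def fastest(latencies: dict) -> str:
    # Route the next request to the currently fastest backend.
    return min(latencies, key=latencies.get)

def observe(server: str, sample_ms: float, alpha: float = 0.2) -> None:
    # Blend each new latency sample into the running average
    # so the estimate tracks current conditions.
    avg_ms[server] = (1 - alpha) * avg_ms[server] + alpha * sample_ms

target = fastest(avg_ms)  # server-b at 85 ms
observe(target, 95.0)     # fold the newly measured latency in
```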

Key Use Cases for Load Balancers


1. Traffic Distribution

Load balancers evenly distribute incoming traffic among multiple servers, preventing any single server from becoming overwhelmed.

Benefits:
  • Optimal performance
  • Better resource utilization
  • Improved scalability
  • Consistent response times

2. High Availability

Load balancers enhance system availability by rerouting traffic away from failed or unhealthy servers to healthy ones.
Health Check Process:
  1. LB sends health check every 10s
  2. Server responds with status
  3. If server fails 3 consecutive checks:
     - Mark as unhealthy
     - Stop sending traffic
  4. Continue monitoring
  5. When server recovers:
     - Mark as healthy
     - Resume traffic
Result: Uninterrupted service even when servers fail
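The process above can be sketched as a small state machine (thresholds match the text; this is an illustration, not any particular load balancer's implementation):

```python
# 3 consecutive failed checks mark a server down; 2 passes bring it back.
UNHEALTHY_THRESHOLD = 3
HEALTHY_THRESHOLD = 2

class BackendHealth:
    def __init__(self) -> None:
        self.healthy = True
        self.fail_streak = 0
        self.pass_streak = 0

    def record(self, check_passed: bool) -> None:
        if check_passed:
            self.fail_streak = 0
            self.pass_streak += 1
            if not self.healthy and self.pass_streak >= HEALTHY_THRESHOLD:
                self.healthy = True   # resume sending traffic
        else:
            self.pass_streak = 0
            self.fail_streak += 1
            if self.fail_streak >= UNHEALTHY_THRESHOLD:
                self.healthy = False  # stop sending traffic

backend = BackendHealth()
for ok in (False, False, False):
    backend.record(ok)
down = backend.healthy        # False: marked unhealthy after 3 failures
for ok in (True, True):
    backend.record(ok)
recovered = backend.healthy   # True: traffic resumes after recovery
```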

3. SSL Termination

Load balancers offload SSL/TLS encryption and decryption from backend servers.
Client (HTTPS) → Load Balancer (SSL Termination) → Backend (HTTP)
Benefits:
  • Reduced backend server workload
  • Centralized certificate management
  • Improved overall performance
  • Simplified backend configuration

4. Session Persistence

For applications requiring user sessions on specific servers, load balancers ensure subsequent requests go to the same server.
Set-Cookie: SERVER_ID=server-a; Path=/

All requests with this cookie → Server A
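A minimal sketch of cookie-based stickiness using the `SERVER_ID` cookie above (the fallback choice stands in for the normal balancing decision and is hard-coded here for illustration):

```python
from http.cookies import SimpleCookie

SERVERS = {"server-a", "server-b", "server-c"}

def route(cookie_header: str, fallback: str = "server-b") -> str:
    # Honor an existing SERVER_ID cookie; otherwise fall back to the
    # regular load-balancing decision.
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("SERVER_ID")
    if morsel and morsel.value in SERVERS:
        return morsel.value
    return fallback

sticky = route("SERVER_ID=server-a; Path=/")  # pinned to server-a
fresh = route("")                             # no cookie: normal routing
```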

5. Scalability

Load balancers facilitate horizontal scaling by managing traffic across all servers.
Initial Setup:
  Servers: [A, B, C]
  Capacity: 3000 req/s
  
Scale Out:
  Add servers: [D, E]
  New capacity: 5000 req/s
  LB automatically includes new servers
  
Scale In:
  Remove servers: [D, E]
  Reduced capacity: 3000 req/s
  LB gracefully drains connections

6. Health Monitoring

Load balancers continuously monitor server health and performance.
Health Check Configuration:
  protocol: HTTP
  path: /health
  interval: 10s
  timeout: 5s
  healthy_threshold: 2
  unhealthy_threshold: 3
  
Actions:
  - Monitor response codes
  - Track response times
  - Remove failed servers
  - Add recovered servers
  - Alert on failures

Realistic Load Balancer Use Cases

Failure Handling

Automatically redirects traffic away from malfunctioning elements to maintain continuous service.
Scenario: Server Crash
  1. Server B crashes
  2. Health check fails
  3. LB marks Server B as down
  4. Traffic redistributed to Servers A and C
  5. No user impact

Instance Health Checks

Continuously evaluates instance functionality, directing requests only to operational servers.
Health Check Types:
  - HTTP/HTTPS: Check status code 200
  - TCP: Verify port connectivity
  - Custom: Application-specific checks

Platform-Specific Routing

Routes requests from different device types to specialized backends.
User-Agent Based Routing:
  Mobile (iOS/Android) → Mobile Backend
  Desktop Browser → Web Backend
  API Client → API Backend
  Bot/Crawler → Rate-limited Backend
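A minimal sketch of such a dispatch; the substring patterns and backend pool names are assumptions for illustration, not a standard (real load balancers match administrator-configured rules):

```python
def route_by_user_agent(user_agent: str) -> str:
    # Coarse User-Agent classification into backend pools.
    ua = user_agent.lower()
    if "bot" in ua or "crawler" in ua or "spider" in ua:
        return "rate-limited-backend"
    if "iphone" in ua or "ipad" in ua or "android" in ua:
        return "mobile-backend"
    if "curl" in ua or "python-requests" in ua:
        return "api-backend"
    return "web-backend"
```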

SSL Termination

Handles encryption/decryption of SSL traffic, reducing backend processing burden.

Cross-Zone Load Balancing

Distributes traffic across various geographic or network zones.
Multi-AZ Setup:
  Zone A: [Server A1, Server A2]
  Zone B: [Server B1, Server B2]
  Zone C: [Server C1, Server C2]
  
Benefits:
  - Increased resilience
  - Zone failure tolerance
  - Geographic distribution

User Stickiness

Maintains session integrity by consistently directing specific users to designated servers.

Load Balancer Configuration Example

NGINX Configuration

upstream backend {
    server backend1.example.com;
    server backend2.example.com;
    server backend3.example.com;
}

server {
    listen 80;
    
    location / {
        proxy_pass http://backend;
    }
}

HAProxy Configuration

frontend http_front
    bind *:80
    default_backend http_back

backend http_back
    balance roundrobin
    option httpchk GET /health
    http-check expect status 200
    
    server server1 192.168.1.10:8080 check inter 10s
    server server2 192.168.1.11:8080 check inter 10s
    server server3 192.168.1.12:8080 check inter 10s

Load Balancer vs API Gateway

| Feature              | Load Balancer            | API Gateway                         |
|----------------------|--------------------------|-------------------------------------|
| Primary Function     | Traffic distribution     | API management                      |
| OSI Layer            | Layer 4 or 7             | Layer 7                             |
| Routing              | Simple (IP, port, path)  | Complex (headers, auth, transforms) |
| Authentication       | No                       | Yes                                 |
| Rate Limiting        | Basic                    | Advanced                            |
| Protocol Translation | No                       | Yes                                 |
| API Composition      | No                       | Yes                                 |
| Best For             | Server distribution      | API orchestration                   |
Load balancers and API gateways are often used together: LB distributes traffic, API Gateway manages API-specific concerns.

Best Practices

1. Implement Health Checks

Configure appropriate health checks for your application.
health_check:
  type: http
  path: /health
  interval: 10s
  timeout: 5s
  healthy_threshold: 2
  unhealthy_threshold: 3
2. Use Multiple Availability Zones

Distribute servers across multiple zones for resilience.
3. Monitor Key Metrics

Track:
  • Request rate
  • Active connections
  • Backend response times
  • Error rates
  • Health check status
4. Configure Appropriate Timeouts

Set connection and request timeouts to prevent hanging requests.
timeouts:
  connect: 5s
  client: 50s
  server: 50s
5. Enable SSL/TLS

Terminate SSL at the load balancer for better performance.
6. Implement Connection Draining

Gracefully handle server removal by draining existing connections.
7. Use Sticky Sessions Wisely

Use sticky sessions only when necessary; they can complicate scaling and failover.

Common Pitfalls to Avoid

Single Point of Failure

Deploy load balancers in high-availability pairs to avoid the load balancer itself becoming a single point of failure.
HA Setup:
  Primary LB:   Active
  Secondary LB: Standby (with failover)
  Virtual IP:   Shared between both

Inadequate Capacity Planning

  • Monitor load balancer capacity
  • Plan for peak traffic
  • Consider auto-scaling

Poor Health Check Configuration

Bad:
  interval: 60s  # Too slow
  timeout: 30s   # Too long
  
Good:
  interval: 10s  # Quick detection
  timeout: 5s    # Reasonable

Ignoring Session Persistence Requirements

  • Understand application session needs
  • Choose appropriate persistence mechanism
  • Plan for session replication or external session storage

Advanced Features

Connection Pooling

Reuse connections to backend servers for better performance.

Request Buffering

Buffer client requests before forwarding to reduce slow client impact.

Compression

Compress responses to reduce bandwidth usage.

WAF Integration

Integrate Web Application Firewall for security.

Rate Limiting

Limit requests per client to prevent abuse.
rate_limits:
  - path: /api/*
    limit: 1000/minute
    by: client_ip
  - path: /api/heavy
    limit: 10/minute
    by: api_key
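A common way to enforce limits like these is a token bucket. A minimal per-client sketch (the rate and burst values are illustrative):

```python
import time

class TokenBucket:
    """Per-client token bucket: `rate` tokens refill per second, up to
    `capacity`; each allowed request spends one token."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill tokens for the time elapsed since the last request,
        # capped at the bucket's capacity (the burst allowance).
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# 10 requests/minute with a burst allowance of 5.
bucket = TokenBucket(rate=10 / 60, capacity=5)
results = [bucket.allow() for _ in range(6)]  # sixth rapid call is denied
```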

Key Takeaways

Load balancers are essential for high-availability, scalable systems. Choose the right type and algorithm based on your specific requirements.
  • Load balancers distribute traffic across multiple servers for reliability and performance
  • Layer 4 (NLB) is faster; Layer 7 (ALB) provides richer routing capabilities
  • Static algorithms (round-robin) work for uniform workloads
  • Dynamic algorithms (least connections) adapt to varying loads
  • Health monitoring ensures traffic goes only to healthy servers
  • SSL termination offloads encryption work from backend servers
  • Often used in combination with API gateways for complete traffic management
