Load balancing is essential for distributing high-concurrency user requests across multiple application servers, enabling horizontal scaling and improved system reliability.

What is Load Balancing?

Load balancing distributes incoming network traffic across a cluster of servers to:
  • Maximize throughput: Utilize multiple servers’ computing resources
  • Minimize response time: Route requests to available servers
  • Avoid overload: Prevent any single server from becoming a bottleneck
  • Ensure high availability: Continue serving requests even if servers fail

Load Balancing Approaches

HTTP Redirect Load Balancing

Mechanism: Load balancer returns HTTP 302 redirect to selected application server

How It Works

1. Initial request: User sends an HTTP request to the load balancer
2. Server selection: Load balancer selects a target server using an algorithm (random, round-robin, etc.)
3. Redirect response: Load balancer returns HTTP 302 with the application server's address in the Location header
4. Direct connection: Browser sends a new request directly to the application server

Simple Implementation

@Override
protected void doGet(HttpServletRequest request, 
                    HttpServletResponse response) 
                    throws ServletException, IOException {
    // Select target server based on some routing condition
    // (someCondition() is a placeholder for the selection algorithm)
    String targetURL;
    if (someCondition()) {
        targetURL = "http://server1.example.com" + request.getServletPath();
    } else {
        targetURL = "http://server2.example.com" + request.getServletPath();
    }
    
    // Send HTTP 302 so the browser retries against the chosen server
    response.sendRedirect(targetURL);
}

Simplicity vs. Practicality

This approach can be implemented in roughly a dozen lines of Java code, making it extremely simple. However, it’s rarely used in production due to significant drawbacks.

Advantages

Simple Design

Easy to implement with minimal code

No Proxy Overhead

Load balancer doesn’t handle response traffic

Disadvantages

Critical Issues:
  1. Double Request Overhead
    • User makes TWO requests per operation
    • First to load balancer, then to application server
    • Doubles latency and network overhead
  2. Security Vulnerability
    • Application server IP addresses exposed to public
    • Direct external access to application servers
    • Cannot hide servers behind firewall
    • Increased attack surface
  3. Limited Control
    • Cannot inspect or modify response traffic
    • No SSL termination at load balancer
    • Difficult to implement sticky sessions
Industry Practice: HTTP redirect load balancing is rarely used in production. Modern systems prefer DNS load balancing combined with internal HTTP load balancers.

Load Balancing Strategies

Algorithms for selecting which server handles each request:
Round Robin

Pattern: Distribute requests sequentially in circular order
How it works:
Request 1 → Server 1
Request 2 → Server 2
Request 3 → Server 3
Request 4 → Server 1 (cycle repeats)
Pros:
  • Simple to implement
  • Fair distribution
  • No server state required
Cons:
  • Doesn’t account for server capacity
  • Ignores current server load
  • May overload slower servers
Best for: Homogeneous server clusters with similar capacity
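A round-robin selector can be sketched in a few lines of Python (the class and server names here are illustrative, not from any particular library):

```python
from itertools import count

class RoundRobinBalancer:
    """Cycle through servers in fixed order, wrapping at the end."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._counter = count()  # monotonically increasing request counter

    def select_server(self):
        # Modulo wraps the counter back around to the first server
        return self.servers[next(self._counter) % len(self.servers)]
```

Four consecutive calls over three servers return server 1, 2, 3, then server 1 again, matching the cycle above.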
Weighted Round Robin

Pattern: Round robin with different weights per server
How it works:
Server 1 (weight=3): Gets 3 out of every 6 requests
Server 2 (weight=2): Gets 2 out of every 6 requests
Server 3 (weight=1): Gets 1 out of every 6 requests
Configuration example:
upstream backend {
    server app1.example.com weight=3;  # Powerful server
    server app2.example.com weight=2;  # Medium server
    server app3.example.com weight=1;  # Smaller server
}
Pros:
  • Accounts for different server capacities
  • Better resource utilization
  • Flexible configuration
Best for: Heterogeneous clusters with varying server specifications
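The same weighting can be sketched in Python by repeating each server in the rotation according to its weight. This is a naive variant for illustration; Nginx itself uses a "smooth" weighted round robin that interleaves servers rather than bursting consecutive requests to the heaviest one.

```python
class WeightedRoundRobinBalancer:
    """Naive weighted round robin: repeat each server `weight` times."""

    def __init__(self, weighted_servers):
        # weighted_servers: iterable of (server, weight) pairs
        self.schedule = [s for s, w in weighted_servers for _ in range(w)]
        self.index = 0

    def select_server(self):
        server = self.schedule[self.index]
        self.index = (self.index + 1) % len(self.schedule)
        return server
```

With weights 3/2/1, every six requests split 3:2:1 across the three servers, as in the example above.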
Random

Pattern: Randomly select server for each request
How it works:
import random

def select_server(servers):
    return random.choice(servers)
Pros:
  • Simple implementation
  • Stateless
  • Naturally distributes over time
Cons:
  • Uneven distribution in short term
  • No capacity awareness
Best for: Large request volumes where statistical distribution evens out
Least Connections

Pattern: Route to server with fewest active connections
How it works:
Server 1: 10 active connections
Server 2: 15 active connections ← Skip
Server 3: 8 active connections  ← Choose this
Pros:
  • Dynamic load awareness
  • Better for long-lived connections
  • Adapts to varying request durations
Cons:
  • Requires connection tracking
  • More complex state management
Best for: Applications with variable request processing times
Nacos implementation: Supports “Least Connections with Slow Start” variant
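The connection bookkeeping can be sketched in Python as follows (illustrative only; a production balancer would also need thread-safe counters and health awareness):

```python
class LeastConnectionsBalancer:
    """Pick the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # min() over the servers, ordered by current connection count
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when the connection closes so counts stay accurate
        self.active[server] -= 1
```

With counts 10/15/8 as in the example above, the next request goes to the server holding 8 connections.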
IP Hash

Pattern: Hash client IP to consistently route to same server
How it works:
import hashlib

def select_server(client_ip, servers):
    # Use a stable hash: Python's built-in hash() is randomized per
    # process, so it would route the same client differently after a restart
    digest = hashlib.md5(client_ip.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]
Pros:
  • Session affinity without cookies
  • Consistent routing per client
  • Simplified session management
Cons:
  • Uneven distribution if client IPs cluster
  • Server changes require rehashing
  • Not suitable behind proxies/NAT
Best for: Stateful applications requiring session persistence
Least Response Time

Pattern: Route to server with fastest response time
How it works:
  • Track average response time per server
  • Send new requests to fastest server
  • Continuously update metrics
Pros:
  • Performance-aware routing
  • Automatically avoids slow servers
  • Optimizes user experience
Cons:
  • Complex metric collection
  • Requires health monitoring
  • Can create hot spots
Best for: Distributed servers across geographic regions
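One way to sketch the metric tracking in Python is an exponential moving average of observed latencies per server (the smoothing factor and API below are assumptions, not any specific product's implementation):

```python
class LeastResponseTimeBalancer:
    """Route to the server with the lowest average observed latency."""

    def __init__(self, servers, alpha=0.2):
        self.avg_latency = {server: 0.0 for server in servers}
        self.alpha = alpha  # weight given to the newest sample

    def record(self, server, latency_seconds):
        # Exponential moving average: recent samples count more
        old = self.avg_latency[server]
        self.avg_latency[server] = (1 - self.alpha) * old + self.alpha * latency_seconds

    def select_server(self):
        return min(self.avg_latency, key=self.avg_latency.get)
```

Note that starting every average at zero makes untested servers look fastest; this is one face of the "hot spot" risk listed above.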

Comparison Matrix

| Strategy | Complexity | State Required | Distribution | Use Case |
| --- | --- | --- | --- | --- |
| Round Robin | Low | None | Even | Homogeneous clusters |
| Weighted RR | Low | Weights only | Proportional | Heterogeneous clusters |
| Random | Very Low | None | Statistical | High-volume traffic |
| Least Connections | Medium | Connection counts | Dynamic | Variable request times |
| IP Hash | Medium | Hash table | IP-based | Session persistence |
| Least Response Time | High | Metrics + health | Performance-based | Geographic distribution |

Design Considerations

Health Checks

Essential for reliability:
  • Active health probes
  • Passive failure detection
  • Automatic server removal
  • Graceful re-introduction
Example (Nginx):
upstream backend {
    server 192.168.1.10:8080 max_fails=3 fail_timeout=30s;
}

Session Persistence

Maintain user sessions:
  • Sticky sessions (cookie-based)
  • IP hash routing
  • Shared session storage (Redis)
  • Stateless design (JWT)
Trade-off: Stickiness vs. flexibility
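Cookie-based stickiness can be sketched as a session-to-server map (illustrative; real balancers typically encode the chosen server inside the cookie itself rather than keeping a server-side table):

```python
import random

class StickySessionBalancer:
    """Pin each session to the server chosen on its first request."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.assignments = {}  # session_id -> pinned server

    def select_server(self, session_id):
        if session_id not in self.assignments:
            # First request from this session: pick a server and remember it
            self.assignments[session_id] = random.choice(self.servers)
        return self.assignments[session_id]
```

Every request carrying the same session ID lands on the same server, which is exactly the trade-off against flexibility: that server's failure loses the pinned sessions.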

SSL Termination

Handle encryption at load balancer:
  • Reduce backend server load
  • Centralized certificate management
  • Simpler backend configuration
  • May decrypt sensitive data
Alternative: End-to-end encryption
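A minimal Nginx sketch of termination at the balancer (the certificate paths and the `backend` upstream name are placeholders):

```nginx
server {
    listen 443 ssl;
    server_name example.com;

    ssl_certificate     /etc/nginx/certs/example.com.crt;
    ssl_certificate_key /etc/nginx/certs/example.com.key;

    location / {
        # Backend traffic travels as plain HTTP inside the private network
        proxy_pass http://backend;
        proxy_set_header X-Forwarded-Proto https;
    }
}
```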

High Availability

Eliminate single point of failure:
  • Active-passive LB pairs
  • Active-active with shared VIP
  • DNS-level LB failover
  • Health check redundancy
Technologies: Keepalived, VRRP, BGP
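As one hedged example, an active-passive pair with Keepalived advertises a shared virtual IP via VRRP; the interface name, router ID, and addresses below are placeholders:

```
vrrp_instance VI_1 {
    state MASTER            # the passive peer uses state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100            # backup peer gets a lower priority
    advert_int 1
    virtual_ipaddress {
        192.168.1.100       # the VIP that clients actually connect to
    }
}
```

If the MASTER stops advertising, the BACKUP node claims the virtual IP and traffic fails over without a DNS change.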

Common Patterns

Multi-Tier Geographic Load Balancing

Benefits:
  • Reduced latency (geographic proximity)
  • Regulatory compliance (data residency)
  • Disaster recovery across regions

Best Practices

1. Start with DNS load balancing: Distribute traffic across load balancer clusters geographically
2. Use reverse proxy for application tier: Nginx/HAProxy to distribute to application servers with private IPs
3. Implement health checks: Both active probes and passive failure detection
4. Choose appropriate algorithm: Match strategy to your traffic patterns and server characteristics
5. Plan for high availability: Redundant load balancers with automatic failover
6. Monitor and tune: Collect metrics, identify bottlenecks, adjust configuration
Common Mistakes to Avoid:
  • Exposing application servers to internet (use internal IPs)
  • Single load balancer (creates single point of failure)
  • No health checks (routes to failed servers)
  • Wrong algorithm for workload (e.g., round-robin for stateful apps)
  • Insufficient monitoring (can’t diagnose issues)

Service Discovery

Dynamic service registration for automatic load balancer updates

Message Queues

Asynchronous load distribution through queuing
