Load balancers distribute traffic across backend servers using algorithms tuned to request duration, server capacity, and session requirements. L4 load balancers operate at the TCP layer (IP and port). L7 load balancers operate at the HTTP layer (URL, headers, cookies).
L4 Load Balancer
Layer: TCP/UDP (Transport)
Routing: Based on IP and port only
Examples: AWS NLB, HAProxy (TCP mode)
Pros: Fast, protocol-agnostic, low latency
Cons: No HTTP-aware routing, no SSL termination
```nginx
# Nginx IP hash
upstream api {
    ip_hash;
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    server 10.0.0.3:8080;
}
```
```
hash(client_ip) % server_count = server_index

Client 1.2.3.4 → hash → Server B (always)
Client 5.6.7.8 → hash → Server A (always)
```
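A minimal sketch of that mapping (the hash function and server list here are illustrative, not nginx's internal hash):

```javascript
// Deterministic IP-hash routing sketch; a djb2-style string hash
// stands in for the load balancer's real hash function.
function hashIp(ip) {
  let h = 5381;
  for (const ch of ip) h = ((h * 33) + ch.charCodeAt(0)) >>> 0;
  return h;
}

function pickServer(ip, servers) {
  // The same client IP always maps to the same index
  return servers[hashIp(ip) % servers.length];
}
```

Because the mapping depends only on the client IP and the server count, removing one server changes `servers.length` and remaps most clients.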
Pros:
Session affinity without shared state
Useful for stateful apps
Cons:
Uneven distribution with NAT (many users share one IP)
Removing server remaps many clients
Not recommended for modern stateless architectures
IP hashing is problematic with NAT and mobile networks where many users share a single IP. Use cookie-based sticky sessions or eliminate state instead.
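As a sketch of the cookie-based alternative (the `lb_server` cookie name and `routeSticky` function are hypothetical; proxies such as HAProxy and AWS ALB implement this natively):

```javascript
// Cookie-based sticky routing sketch: pin a client to the server
// named in its cookie, or assign one on the first request.
function routeSticky(req, servers) {
  const match = /(?:^|; )lb_server=([^;]+)/.exec(req.headers.cookie || '');
  if (match && servers.includes(match[1])) {
    // Returning client: honor the existing assignment
    return { server: match[1], setCookie: null };
  }
  // New client: assign a server and tell the client to remember it
  const server = servers[Math.floor(Math.random() * servers.length)];
  return { server, setCookie: `lb_server=${server}; Path=/; HttpOnly` };
}
```

Unlike IP hashing, this keys affinity on a per-client cookie, so many users behind one NAT IP still spread across servers.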
Consistent hashing minimizes cache remapping when nodes are added or removed:
```javascript
// Traditional modulo hash
server = hash(key) % N
// Adding 1 node (e.g. N=3 → 4): ~75% of keys remap (cache storm!)

// Consistent hashing (hash ring)
// Adding 1 node: only ~1/N of keys remap
```
```javascript
// Consistent hash ring with virtual nodes
const ring = new ConsistentHash();
ring.addNode('server-a', 150); // 150 virtual nodes
ring.addNode('server-b', 150);
ring.addNode('server-c', 150);

const server = ring.getNode('user:1001');
// → 'server-b'

// Add server-d: only ~25% of keys remap
ring.addNode('server-d', 150);
```
Best for:
Cache tier load balancing
Distributed caching (Memcached, Redis Cluster)
Minimizing disruption during autoscaling
Pick two random servers and route to the less loaded one:
```javascript
function p2c(servers) {
  // 1. Pick 2 random servers
  const a = servers[Math.floor(Math.random() * servers.length)];
  const b = servers[Math.floor(Math.random() * servers.length)];
  // 2. Route to the one with fewer active connections
  return a.connections < b.connections ? a : b;
}
```
Performance: Near-optimal load distribution with minimal overhead

Benefits:
Simple implementation
No global state required
Scales to thousands of servers
P2C (Power of Two Choices) provides 90% of the benefit of “true” least connections with 10% of the complexity. Excellent default for internal microservices.
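A rough way to see the near-optimal claim is to simulate it (the server and request counts here are arbitrary):

```javascript
// Compare plain random pick vs. P2C over 10,000 long-lived connections
// to 10 servers; a lower maximum load means better balance.
function simulate(pick, nServers = 10, nReqs = 10000) {
  const servers = Array.from({ length: nServers }, () => ({ connections: 0 }));
  for (let i = 0; i < nReqs; i++) pick(servers).connections++;
  return Math.max(...servers.map(s => s.connections));
}

const randomPick = (s) => s[Math.floor(Math.random() * s.length)];
const p2cPick = (s) => {
  const a = randomPick(s);
  const b = randomPick(s);
  return a.connections < b.connections ? a : b;
};
```

With a perfectly even split each server would hold 1,000 connections; P2C typically lands within a connection or two of that, while plain random pick typically overshoots by a few percent on the busiest server.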
```nginx
# Nginx health check configuration
# Note: the "check" directives below require the third-party
# nginx_upstream_check_module (e.g. via Tengine), not stock nginx
upstream api {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    server 10.0.0.3:8080;

    # Health check
    check interval=3000 rise=2 fall=3 timeout=1000 type=http;
    check_http_send "GET /health HTTP/1.1\r\nHost: api\r\n\r\n";
    check_http_expect_alive http_2xx http_3xx;
}
```
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-server
spec:
  containers:
    - name: api
      image: api:v1.2.3
      ports:
        - containerPort: 8080
      # Liveness: restart if unhealthy
      livenessProbe:
        httpGet:
          path: /health
          port: 8080
        initialDelaySeconds: 30
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 3
      # Readiness: remove from service if not ready
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 5
        timeoutSeconds: 3
        failureThreshold: 2
```
Liveness vs Readiness:
Liveness: Is the process alive? (Restart if fails)
Readiness: Can it serve traffic? (Remove from load balancer if fails)
Use separate /health (liveness) and /ready (readiness) endpoints. Readiness should check dependencies; liveness should only verify the process is responsive.
```
# DNS A records (multiple IPs)
api.example.com. 60 IN A 1.2.3.4
api.example.com. 60 IN A 5.6.7.8
api.example.com. 60 IN A 9.10.11.12

# Each client resolves and caches one IP based on DNS response order
```
Limitations:
No health awareness (DNS doesn’t know if server is down)
No session affinity
Cached TTLs delay failover (clients keep stale records until expiry)
Client caching behavior varies
DNS round robin is not a production load balancing solution. Use dedicated load balancers (ALB, NLB) with health checks instead.
```yaml
# Route 53 health check + failover
Primary:
  type: A
  value: 1.2.3.4
  health_check: GET https://1.2.3.4/health (every 30s)
  failover: PRIMARY
Secondary:
  type: A
  value: 5.6.7.8
  failover: SECONDARY
# If the primary health check fails, traffic automatically routes to the secondary
```
Lower TTL to 60 seconds at least 24 hours before any planned migration or failover. This limits client cache staleness without overwhelming DNS servers.
Use versioned filenames (app.v2.min.js) with 1-year TTLs for static assets. “Invalidate” by deploying new filenames instead of waiting for CDN TTL expiry.