
Overview

Load balancers distribute traffic across backend servers using algorithms tuned to request duration, server capacity, and session requirements. L4 load balancers operate at the transport layer, routing on IP and port. L7 load balancers operate at the application layer, routing on URL, headers, and cookies.

L4 Load Balancer

Layer: TCP/UDP (Transport)
Routing: Based on IP and port only
Examples: AWS NLB, HAProxy (TCP mode)
Pros: Fast, protocol-agnostic, low latency
Cons: No HTTP-aware routing, no SSL termination

L7 Load Balancer

Layer: HTTP/HTTPS (Application)
Routing: URL path, headers, cookies
Examples: AWS ALB, Nginx, Envoy, Traefik
Pros: Path routing, canary deploys, SSL offload
Cons: Higher latency, must inspect request content
Use L7 load balancers for HTTP APIs — path-based routing, SSL termination, and health checks outweigh the minimal latency overhead.

Load Balancing Algorithms

Round Robin

Distribute requests equally across all servers:
Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A (cycle repeats)
Best for:
  • Uniform request duration
  • Homogeneous server capacity
  • Stateless services
Limitations:
  • Ignores server load
  • Ineffective for long-lived connections (WebSocket)
  • No awareness of server capacity differences
# Nginx round robin (default)
upstream api {
  server 10.0.0.1:8080;
  server 10.0.0.2:8080;
  server 10.0.0.3:8080;
}
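Round robin's limitations motivate load-aware algorithms such as least connections, which routes each new request to the backend with the fewest in-flight requests. A minimal sketch in JavaScript (the class and backend addresses are illustrative, not a real library API):

```javascript
// Minimal least-connections balancer sketch.
// Each backend tracks its count of in-flight requests;
// pick() returns the backend with the fewest.
class LeastConnections {
  constructor(backends) {
    // Map backend address -> active connection count
    this.counts = new Map(backends.map((b) => [b, 0]));
  }

  pick() {
    let best = null;
    let min = Infinity;
    for (const [backend, count] of this.counts) {
      if (count < min) {
        min = count;
        best = backend;
      }
    }
    this.counts.set(best, min + 1); // connection opened
    return best;
  }

  release(backend) {
    this.counts.set(backend, this.counts.get(backend) - 1); // connection closed
  }
}

const lb = new LeastConnections(['10.0.0.1:8080', '10.0.0.2:8080']);
const a = lb.pick(); // both idle -> first backend
const b = lb.pick(); // first backend is busy -> second backend
lb.release(a);
```

For WebSocket or streaming workloads, this connection-aware approach avoids the imbalance that round robin develops over time.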

Health Checks

Load balancers must detect and remove unhealthy backends:

Active Health Checks

# Nginx active health check (requires the third-party nginx_upstream_check_module)
upstream api {
  server 10.0.0.1:8080;
  server 10.0.0.2:8080;
  server 10.0.0.3:8080;
  
  # Health check
  check interval=3000 rise=2 fall=3 timeout=1000 type=http;
  check_http_send "GET /health HTTP/1.1\r\nHost: api\r\n\r\n";
  check_http_expect_alive http_2xx http_3xx;
}
Parameters:
  • interval: check every 3 seconds
  • rise: 2 consecutive successes → mark healthy
  • fall: 3 consecutive failures → mark unhealthy
  • timeout: fail check if no response in 1 second
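The rise/fall thresholds above implement a small state machine: a backend flips state only after enough consecutive results in one direction, which filters out transient blips. A sketch of that logic in JavaScript (parameter names mirror the config; the class itself is hypothetical):

```javascript
// Sketch of the rise/fall state machine behind active health checks.
// A backend becomes healthy only after `rise` consecutive successes,
// and unhealthy only after `fall` consecutive failures.
class HealthTracker {
  constructor({ rise = 2, fall = 3 } = {}) {
    this.rise = rise;
    this.fall = fall;
    this.healthy = true;
    this.successes = 0;
    this.failures = 0;
  }

  record(ok) {
    if (ok) {
      this.failures = 0;
      this.successes += 1;
      if (!this.healthy && this.successes >= this.rise) this.healthy = true;
    } else {
      this.successes = 0;
      this.failures += 1;
      if (this.healthy && this.failures >= this.fall) this.healthy = false;
    }
    return this.healthy;
  }
}

const t = new HealthTracker({ rise: 2, fall: 3 });
t.record(false); t.record(false); // still healthy (2 failures < fall)
t.record(false);                  // 3 consecutive failures -> unhealthy
t.record(true);                   // 1 success < rise -> still unhealthy
t.record(true);                   // 2 consecutive successes -> healthy again
```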

Health Endpoint Design

// Good health endpoint
app.get('/health', async (req, res) => {
  // Check critical dependencies
  const checks = await Promise.all([
    checkDatabase(),
    checkRedis(),
    checkDownstreamAPI()
  ]);
  
  const healthy = checks.every(c => c.ok);
  
  if (healthy) {
    return res.status(200).json({ status: 'healthy', checks });
  } else {
    return res.status(503).json({ status: 'unhealthy', checks });
  }
});

async function checkDatabase() {
  try {
    await db.query('SELECT 1');
    return { name: 'database', ok: true };
  } catch (err) {
    return { name: 'database', ok: false, error: err.message };
  }
}
Health checks should be fast (under 100ms) and lightweight. Avoid complex business logic or expensive queries in health endpoints.
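One way to stay within that budget is to give each dependency check its own deadline, so a hung dependency cannot stall the whole endpoint. A sketch, assuming checks resolve to `{ name, ok }` objects as in the example above (the `withTimeout` helper is hypothetical):

```javascript
// Wrap a dependency check so a hung dependency can't stall /health.
// If the check doesn't settle within `ms`, report it as failed.
function withTimeout(name, checkFn, ms = 100) {
  const timeout = new Promise((resolve) => {
    const t = setTimeout(() => resolve({ name, ok: false, error: 'timeout' }), ms);
    t.unref?.(); // don't let the timer keep the Node process alive
  });
  return Promise.race([checkFn(), timeout]);
}

// Usage: a check that never resolves is reported as a timeout
withTimeout('database', () => new Promise(() => {}), 50)
  .then((result) => console.log(result)); // { name: 'database', ok: false, error: 'timeout' }
```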

Kubernetes Probes

apiVersion: v1
kind: Pod
metadata:
  name: api-server
spec:
  containers:
  - name: api
    image: api:v1.2.3
    ports:
    - containerPort: 8080
    
    # Liveness: restart if unhealthy
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3
    
    # Readiness: remove from service if not ready
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 3
      failureThreshold: 2
Liveness vs Readiness:
  • Liveness: Is the process alive? (Restart if fails)
  • Readiness: Can it serve traffic? (Remove from load balancer if fails)
Use separate /health (liveness) and /ready (readiness) endpoints. Readiness should check dependencies; liveness should only verify the process is responsive.
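A sketch of what that separation looks like in handler code (framework-agnostic; `checkDeps` is a hypothetical stand-in for the dependency checks shown earlier):

```javascript
// Sketch: separate handlers for liveness and readiness.
// Liveness answers "is the process responsive?" -- no dependency checks.
// Readiness answers "can this instance serve traffic right now?"
function livenessHandler() {
  // If this code runs at all, the event loop is alive.
  return { status: 200, body: { status: 'alive' } };
}

async function readinessHandler(checkDeps) {
  // checkDeps() is assumed to resolve to [{ name, ok }, ...]
  // as in the health-endpoint example above.
  const checks = await checkDeps();
  const ready = checks.every((c) => c.ok);
  return {
    status: ready ? 200 : 503,
    body: { status: ready ? 'ready' : 'not ready', checks },
  };
}
```

Keeping liveness dependency-free matters: if /health checked the database and the database went down, Kubernetes would restart every pod, turning a dependency outage into a fleet-wide restart loop.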

Connection Draining

Gracefully remove servers from rotation:
# Nginx connection draining
upstream api {
  server 10.0.0.1:8080;
  server 10.0.0.2:8080 down;  # stop routing new requests (NGINX Plus adds a true "drain" mode)
  server 10.0.0.3:8080;
}
Process:
  1. Mark server as draining (no new connections)
  2. Wait for existing connections to complete
  3. Forcefully close connections after timeout (e.g., 30s)
  4. Remove server from pool
// Graceful shutdown in Node.js
process.on('SIGTERM', () => {
  console.log('SIGTERM received: starting graceful shutdown');
  
  // 1. Stop accepting new connections; exit once in-flight requests finish
  server.close(() => {
    console.log('HTTP server closed');
    process.exit(0);
  });
  
  // 2. Force shutdown if requests haven't completed after 30s
  //    (unref() keeps this timer from holding the process open)
  setTimeout(() => {
    console.log('Forcefully shutting down');
    process.exit(1);
  }, 30000).unref();
});
Always enable connection draining with a 30-60 second timeout. This prevents abrupt connection termination during deployments and autoscaling events.

DNS Load Balancing

DNS Round Robin

# DNS A records (multiple IPs)
api.example.com.  60  IN  A  1.2.3.4
api.example.com.  60  IN  A  5.6.7.8
api.example.com.  60  IN  A  9.10.11.12

# Clients resolve and cache one IP, typically based on DNS response order
Limitations:
  • No health awareness (DNS doesn’t know if server is down)
  • No session affinity
  • Cached TTLs delay failover
  • Client caching behavior varies
DNS round robin is not a production load balancing solution. Use dedicated load balancers (ALB, NLB) with health checks instead.
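The core problem is that record selection happens client-side. A small simulation of a client rotating through its cached record set shows why the server has no control over distribution (the rotation policy is illustrative; real resolvers and operating systems behave differently, and some always take the first record):

```javascript
// Simulate a client picking from the DNS A records it has cached.
// The server can't steer this choice or react to a dead backend.
function makeResolver(records) {
  let i = 0;
  return () => records[i++ % records.length]; // naive client-side rotation
}

const resolve = makeResolver(['1.2.3.4', '5.6.7.8', '9.10.11.12']);
resolve(); // '1.2.3.4'
resolve(); // '5.6.7.8'
// If 5.6.7.8 is down, this client still sends it every third request
// until its cached records expire (TTL).
```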

GeoDNS (Latency-Based Routing)

Route clients to nearest data center:
// Route 53 latency routing
api.example.com (latency-based):
  us-east-1: 1.2.3.4    (for US/Canada clients)
  eu-west-1: 5.6.7.8    (for Europe clients)
  ap-south-1: 9.10.11.12 (for Asia clients)
Benefits:
  • Reduced cross-region latency
  • Automatic failover to next-closest region
  • Global load distribution
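Conceptually, latency-based routing picks the healthy region with the lowest measured latency, falling back to the next-closest when a region fails its health checks. A sketch (region names and RTT numbers are illustrative; real GeoDNS services measure latency on the server side):

```javascript
// Pick the lowest-latency healthy region; skip unhealthy ones.
function pickRegion(latenciesMs, unhealthy = new Set()) {
  const candidates = Object.entries(latenciesMs)
    .filter(([region]) => !unhealthy.has(region))
    .sort(([, a], [, b]) => a - b); // ascending by measured RTT
  return candidates.length ? candidates[0][0] : null;
}

// RTTs as measured from a client in Europe (illustrative)
const latencies = { 'us-east-1': 180, 'eu-west-1': 25, 'ap-south-1': 140 };
pickRegion(latencies);                          // 'eu-west-1'
pickRegion(latencies, new Set(['eu-west-1']));  // failover -> 'ap-south-1'
```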

DNS Failover

# Route 53 health check + failover
Primary:
  type: A
  value: 1.2.3.4
  health_check: GET https://1.2.3.4/health (every 30s)
  failover: PRIMARY

Secondary:
  type: A
  value: 5.6.7.8
  failover: SECONDARY
  
# If primary health check fails → automatically route to secondary
Lower TTL to 60 seconds at least 24 hours before any planned migration or failover. This limits client cache staleness without overwhelming DNS servers.

Content Delivery Networks (CDN)

CDNs cache content at geographically distributed PoPs (Points of Presence), serving users from the nearest edge node.

Pull CDN

CDN populates cache on first request (lazy loading):
// First request (cache miss)
User (Tokyo) → CDN PoP (Tokyo) → Origin (us-east-1)
              ← caches response  ←
              
// Subsequent requests (cache hit)
User (Tokyo) → CDN PoP (Tokyo)  [cached, no origin request]
Examples: Cloudflare, CloudFront, Fastly
Best for: Unpredictable traffic, many assets

Push CDN

Proactively publish content to the CDN before it is requested:
# Push content to origin/edge storage ahead of demand
# (illustrative: copying a large file to an S3 bucket the CDN serves from)
aws s3 cp videos/large-file.mp4 s3://cdn-origin-bucket/videos/large-file.mp4
Examples: Akamai NetStorage, custom CDNs
Best for: Large media files, predictable access patterns

Cache Control Headers

HTTP/1.1 200 OK
Cache-Control: public, s-maxage=86400, max-age=3600
Vary: Accept-Encoding
ETag: "33a64df551425fcc55e4d42a148795d9f25f89d4"

// Layered caching
Cache-Control: public, s-maxage=86400, max-age=3600
// CDN caches 24h (s-maxage)
// Browser caches 1h (max-age)

// User-specific content
Cache-Control: private, no-store
// CDN must NOT cache

// Immutable assets (versioned URLs)
Cache-Control: public, max-age=31536000, immutable
Use versioned filenames (app.v2.min.js) with 1-year TTLs for static assets. “Invalidate” by deploying new filenames instead of waiting for CDN TTL expiry.

CDN Invalidation

# Cloudflare purge (instant, best-effort)
curl -X POST "https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache" \
  -H "Authorization: Bearer {token}" \
  -H "Content-Type: application/json" \
  --data '{"files":["/css/main.css","/js/app.js"]}'

# CloudFront invalidation (takes 5-15 minutes)
aws cloudfront create-invalidation \
  --distribution-id E1234567890ABC \
  --paths "/css/*" "/js/*"
CDN invalidation is best-effort and can take minutes to propagate. Use versioned URLs instead of relying on invalidation for time-sensitive updates.

Reverse Proxy Caching

Varnish Full-Page Cache

# Varnish configuration (VCL)
vcl 4.1;

backend default {
  .host = "app-server";
  .port = "8080";
}

sub vcl_recv {
  # Normalize Accept-Encoding
  if (req.http.Accept-Encoding) {
    if (req.http.Accept-Encoding ~ "gzip") {
      set req.http.Accept-Encoding = "gzip";
    } else if (req.http.Accept-Encoding ~ "deflate") {
      set req.http.Accept-Encoding = "deflate";
    } else {
      unset req.http.Accept-Encoding;
    }
  }
  
  # Don't cache authenticated requests
  if (req.http.Authorization || req.http.Cookie ~ "session=") {
    return (pass);
  }
  
  # Only cache GET and HEAD
  if (req.method != "GET" && req.method != "HEAD") {
    return (pass);
  }
}

sub vcl_backend_response {
  # Cache for 1 hour if origin doesn't specify
  if (!beresp.http.Cache-Control) {
    set beresp.ttl = 1h;
  }
  
  # Grace period: serve stale for 10 min if backend is down
  set beresp.grace = 10m;
}

Nginx Caching

# Nginx cache configuration
proxy_cache_path /var/cache/nginx 
  levels=1:2 
  keys_zone=api_cache:10m 
  max_size=1g 
  inactive=60m;

server {
  listen 80;
  server_name api.example.com;
  
  location /api/ {
    proxy_pass http://app-servers;
    
    # Enable caching
    proxy_cache api_cache;
    proxy_cache_valid 200 10m;
    proxy_cache_valid 404 1m;
    
    # Cache key
    proxy_cache_key "$scheme$request_method$host$request_uri";
    
    # Headers
    add_header X-Cache-Status $upstream_cache_status;
  }
}

Best Practices

  • Use least connections for long-lived connections: WebSocket and streaming traffic require connection-aware routing; round robin will eventually create imbalance.
  • Enable connection draining: set the timeout to 30-60 seconds to allow in-flight requests to complete gracefully during deployments.
  • Use consistent hashing for cache-backed pools: it minimizes cache miss storms when nodes are added or removed during autoscaling events.
  • Avoid IP-hash session affinity: many users behind NAT share one IP, creating severe imbalance. Use cookie-based sticky sessions or eliminate server-side state.
  • Avoid DNS round robin for failover: no health awareness, slow failover due to TTL caching. Use an L7 load balancer with active health checks instead.

Next Steps

Availability Patterns

Active-active and active-passive failover strategies

Scalability

Horizontal scaling and autoscaling patterns

Caching

Multi-layer caching strategies and invalidation

Databases

Database connection pooling and read replica routing
