workerd performs well out of the box, but there’s significant headroom for optimization. This guide covers compilation flags, runtime configuration, and application-level tuning.
Performance tuning is an ongoing effort in workerd. Many optimizations mentioned here represent low-hanging fruit that can dramatically improve performance.

Build optimizations

Release builds

Always use release builds for production:
# Standard release build
bazel build //src/workerd/server:workerd --config=release
Release mode enables:
  • Compiler optimizations (-O2)
  • Dead code elimination
  • Inlining
  • NDEBUG mode (disables assertions)
For maximum performance, use thin LTO:
bazel build --config=thin-lto //src/workerd/server:workerd
LTO trade-offs:
  • ~2x performance improvement on simple benchmarks
  • Better cross-module optimization
  • Smaller binary size
  • Significantly longer compile times
Experiments suggest workerd can roughly double performance on “hello world” benchmarks with LTO and memory allocator tuning.

Memory allocator

workerd uses tcmalloc by default, which is already optimized for allocation-heavy workloads. Alternative allocators:
  • tcmalloc: Default, excellent for workerd’s usage patterns
  • jemalloc: May perform better for specific workloads
  • mimalloc: Lightweight alternative
To benchmark allocators, link against different allocator libraries at build time.

Runtime configuration

Multi-process deployment

workerd is single-threaded. Utilize all CPU cores by running multiple instances:
# Run one instance per CPU core
for i in {0..7}; do
  workerd serve config.capnp --socket-fd http=$((3+i)) &
done
See systemd deployment for production-ready multi-instance setup.
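The one-instance-per-core pattern maps naturally onto a systemd template unit. The sketch below is a hypothetical starting point, not an official recipe — the unit name, binary path, and core-pinning line are assumptions to adapt:

```ini
# /etc/systemd/system/workerd@.service — hypothetical template unit
# Start eight instances with: systemctl start workerd@{0..7}
[Unit]
Description=workerd instance %i
After=network.target

[Service]
ExecStart=/usr/local/bin/workerd serve /etc/workerd/config.capnp
Restart=always
LimitNOFILE=65536
# Pin instance %i to CPU core %i
CPUAffinity=%i

[Install]
WantedBy=multi-user.target
```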

Connection pooling

For outbound requests, workerd pools connections automatically; pooling is handled internally rather than through config fields. Keep in mind that any connection limits apply per process, so a deployment running one instance per core multiplies the effective totals.

Cache configuration

For Durable Objects, tune the storage cache:
// Illustrative values — these mirror workerd's internal ActorCache
// limits; they are not a per-request JavaScript API
const options = {
  softLimit: 16 * 1024 * 1024,          // 16 MiB
  hardLimit: 32 * 1024 * 1024,          // 32 MiB
  dirtyListByteLimit: 4 * 1024 * 1024,  // 4 MiB
};
See Durable Objects storage internals for details.

Application-level optimizations

Minimize cold starts

Pre-warm isolates

workerd creates V8 isolates on-demand. For predictable performance:
// Heavy initialization at module level
import crypto from 'crypto';
import { expensive } from './utils';

const precomputedData = expensive();

export default {
  async fetch(request) {
    // Fast path uses pre-computed data
    return new Response(precomputedData);
  }
};
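When initialization is asynchronous, the same idea works by sharing one promise across requests. A minimal sketch — `expensiveInit` is a hypothetical stand-in for your own setup work:

```javascript
// Top-level state: computed at most once per isolate
let initPromise = null;

// Hypothetical stand-in for loading a model, parsing a big config, etc.
async function expensiveInit() {
  return { ready: true };
}

// Every request shares the same in-flight or settled promise
function getInit() {
  initPromise ??= expensiveInit();
  return initPromise;
}

// Inside fetch(): const data = await getInit();
```

The first request pays the initialization cost; concurrent and later requests await the already-created promise instead of re-running the work.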

Avoid dynamic imports

// ❌ Slow: Dynamic import on every request
export default {
  async fetch(request) {
    const { handler } = await import('./handler');
    return handler(request);
  }
};

// ✅ Fast: Static import
import { handler } from './handler';

export default {
  async fetch(request) {
    return handler(request);
  }
};

Optimize request handling

Batch operations

// ❌ Slow: Sequential requests
const results = [];
for (const url of urls) {
  const response = await fetch(url);
  results.push(await response.json());
}

// ✅ Fast: Parallel requests
const responses = await Promise.all(
  urls.map(url => fetch(url))
);
const results = await Promise.all(
  responses.map(r => r.json())
);
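One caveat: Promise.all rejects as soon as any request fails. When partial results are acceptable, Promise.allSettled lets the remaining requests finish. A sketch with stub fetchers standing in for real fetch calls:

```javascript
// Stub fetchers stand in for real fetch(url) calls
const fetchers = [
  () => Promise.resolve({ ok: true, url: 'https://a.example' }),
  () => Promise.reject(new Error('upstream timed out')),
  () => Promise.resolve({ ok: true, url: 'https://b.example' }),
];

async function fetchAllSettled(fetchers) {
  const settled = await Promise.allSettled(fetchers.map(f => f()));
  // Keep successful results; collect failures for logging or retry
  const results = settled
    .filter(s => s.status === 'fulfilled')
    .map(s => s.value);
  const errors = settled
    .filter(s => s.status === 'rejected')
    .map(s => s.reason);
  return { results, errors };
}

fetchAllSettled(fetchers).then(({ results, errors }) => {
  console.log(results.length, errors.length); // 2 successes, 1 failure
});
```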

Stream when possible

// ✅ Stream large responses
export default {
  async fetch(request) {
    const upstream = await fetch('https://example.com/large-file');
    
    // Stream through without buffering
    return new Response(upstream.body, {
      headers: upstream.headers
    });
  }
};
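When the body must be inspected or modified in flight, piping it through a TransformStream keeps it streaming instead of buffering. A sketch — the chunk counter is purely illustrative, and the synthetic source stands in for `upstream.body`:

```javascript
// Count chunks as they pass through, without buffering the body
function withChunkCounter(body) {
  let chunks = 0;
  const counter = new TransformStream({
    transform(chunk, controller) {
      chunks++;                  // observe each chunk in flight
      controller.enqueue(chunk); // forward it unchanged
    },
  });
  return { body: body.pipeThrough(counter), chunkCount: () => chunks };
}

// Usage with a synthetic source standing in for upstream.body
const source = new ReadableStream({
  start(controller) {
    controller.enqueue(new TextEncoder().encode('hello '));
    controller.enqueue(new TextEncoder().encode('world'));
    controller.close();
  },
});
const { body, chunkCount } = withChunkCounter(source);
new Response(body).text().then(text => console.log(text, chunkCount()));
```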

Minimize allocations

Reuse objects

// ❌ Builds a fresh Headers object on every request
export default {
  async fetch(request) {
    const headers = new Headers();
    headers.set('X-Custom', 'value');
    return new Response(data, { headers });
  }
};
// ✅ Reuse immutable objects
const COMMON_HEADERS = new Headers({
  'X-Custom': 'value',
  'Content-Type': 'application/json'
});

export default {
  async fetch(request) {
    return new Response(data, { headers: COMMON_HEADERS });
  }
};

Avoid string concatenation in hot paths

// ❌ Slow: String concatenation
let html = '<html>';
html += '<body>';
html += '<h1>' + title + '</h1>';
html += '</body></html>';

// ✅ Fast: Template literals or array join
const html = `<html><body><h1>${title}</h1></body></html>`;

// ✅ Also fast: Array join for dynamic lists
const parts = ['<html><body>'];
for (const item of items) {
  parts.push(`<li>${item}</li>`);
}
parts.push('</body></html>');
const html = parts.join('');

Storage optimizations

For Durable Objects:

Batch reads and writes

// ❌ Slow: Individual operations
await storage.put('key1', 'value1');
await storage.put('key2', 'value2');
await storage.put('key3', 'value3');

// ✅ Fast: Batch operation
await storage.put({
  key1: 'value1',
  key2: 'value2',
  key3: 'value3',
});

Use transactions

// ✅ Atomic batch with transaction
await storage.transaction(async (txn) => {
  const current = (await txn.get('counter')) ?? 0;
  await txn.put('counter', current + 1);
  await txn.put('lastUpdate', Date.now());
  // Single commit for all operations
});

Limit list() operations

// ❌ Can exhaust cache
const all = await storage.list();

// ✅ Paginate: list() resolves to a Map, so continue after its last key
const page1 = await storage.list({ limit: 1000 });
const lastKey = [...page1.keys()].pop();
const page2 = await storage.list({
  startAfter: lastKey,
  limit: 1000,
});

WebSocket optimizations

Use hibernation for idle connections

With the hibernation API, accept sockets through the Durable Object state rather than calling accept() on the socket; workerd can then evict idle connections from memory and wake the object when a message arrives:
class ChatRoom {
  constructor(state) {
    this.state = state;
  }

  async fetch(request) {
    const [client, server] = Object.values(new WebSocketPair());
    // acceptWebSocket() (not server.accept()) makes the socket hibernatable
    this.state.acceptWebSocket(server);
    return new Response(null, { status: 101, webSocket: client });
  }

  async webSocketMessage(ws, message) {
    // Invoked even if the object was hibernated and had to be revived
    this.broadcast(message);
  }
}

Broadcast efficiently

// ❌ Tracking sockets in your own collection keeps references that
// go stale and is lost when the object hibernates
for (const ws of this.connections) {
  ws.send(message);
}

// ✅ With hibernation, iterate the sockets the runtime tracks for you
for (const ws of this.state.getWebSockets()) {
  try {
    ws.send(message);
  } catch {
    // Socket already closed; the runtime will prune it
  }
}

Monitoring and profiling

Request timing

Add timing headers. Note that workerd only advances timers across I/O (a timing-attack mitigation), so this measures elapsed wall time spanning awaits rather than CPU time:
export default {
  async fetch(request) {
    const start = Date.now();
    
    const response = await handleRequest(request);
    
    const duration = Date.now() - start;
    response.headers.set('X-Response-Time', `${duration}ms`);
    
    return response;
  }
};

CPU profiling

workerd supports V8 CPU profiling through the inspector protocol:
# Expose the inspector, then attach Chrome DevTools or another
# inspector client and record a CPU profile
workerd serve config.capnp --inspector-addr=localhost:9229
CPU profiling adds overhead. Only enable it for debugging, never in production.

Memory profiling

Monitor memory usage where the runtime exposes it (performance.memory is a non-standard Chrome API and may be absent, hence the optional chaining):
// Check memory usage
const used = performance.memory?.usedJSHeapSize;
const limit = performance.memory?.jsHeapSizeLimit;

if (used / limit > 0.8) {
  console.warn('Memory usage high:', used / 1024 / 1024, 'MB');
}

Benchmarking

Load testing

Use wrk or ab for load testing:
# wrk - 10 threads, 100 connections, 30 seconds
wrk -t10 -c100 -d30s http://localhost:8080/

# Apache Bench - 10000 requests, 100 concurrent
ab -n 10000 -c 100 http://localhost:8080/

Baseline performance

Simple “Hello World” performance (with LTO):
  • Throughput: ~100k req/s per core (single instance)
  • Latency p50: <1ms
  • Latency p99: <5ms
Your mileage will vary based on:
  • Application complexity
  • I/O operations
  • System resources

Platform-specific optimizations

Linux

Increase file descriptor limits

# In systemd service file
LimitNOFILE=65536

# Or system-wide in /etc/security/limits.conf
workerd soft nofile 65536
workerd hard nofile 65536

Use hugepages

For workloads with large memory footprints:
# Enable transparent hugepages
echo always > /sys/kernel/mm/transparent_hugepage/enabled

TCP tuning

For high connection counts:
# Increase TCP buffer sizes
sudo sysctl -w net.core.rmem_max=16777216
sudo sysctl -w net.core.wmem_max=16777216
sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sudo sysctl -w net.ipv4.tcp_wmem="4096 87380 16777216"

macOS

Increase file descriptor limits:
# Check current limits
ulimit -n

# Increase (temporary)
ulimit -n 65536

# Permanent: Add to ~/.zshrc or ~/.bashrc
ulimit -n 65536

V8 tuning

Heap size limits

workerd uses V8’s default heap limits. Whether a flag exists to raise them depends on your build; treat the example below as illustrative and check workerd serve --help first:
# Illustrative — confirm the flag exists in your workerd build
# Note: any such limit applies per isolate
workerd serve config.capnp --v8-max-heap-size=512

Optimization tier

V8 executes code through multiple tiers (the Ignition interpreter, then the Sparkplug baseline and TurboFan optimizing compilers). Hot functions are tiered up automatically; the most you can do from application code is warm them up (testing only):
// Warm up a hot function so V8's profiler marks it hot
function hotPath(data) {
  // Your hot path code
}

// After enough calls, TurboFan compiles an optimized version
for (let i = 0; i < 10000; i++) {
  hotPath(testData);
}

Anti-patterns

Avoid these performance pitfalls:

❌ Synchronous crypto

// Blocks the event loop
const hash = crypto.createHash('sha256')
  .update(largeData)
  .digest('hex');

// ✅ Use async Web Crypto (note: digest() returns an ArrayBuffer,
// so hex-encode the result if you need a string)
const hashBuffer = await crypto.subtle.digest('SHA-256', largeData);

❌ Large JSON.parse/stringify

// Blocks event loop for large objects
const data = JSON.parse(hugeJsonString);

// ✅ Stream the body and process it chunk by chunk
const reader = response.body.getReader();
const decoder = new TextDecoder();
for (;;) {
  const { done, value } = await reader.read();
  if (done) break;
  // handleChunk: your incremental parser (illustrative name)
  handleChunk(decoder.decode(value, { stream: true }));
}

❌ Inefficient regex

// Catastrophic backtracking
const regex = /^(a+)+$/;

// ✅ Use specific, bounded patterns
const regex = /^a{1,100}$/;
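The bounded pattern is easy to sanity-check. Note that the backtracking blow-up in /^(a+)+$/ only bites on near-miss inputs, such as a long run of 'a' followed by a single 'b':

```javascript
const safe = /^a{1,100}$/;

console.log(safe.test('a'.repeat(50)));       // true
console.log(safe.test('a'.repeat(50) + 'b')); // false — rejects immediately
console.log(safe.test('a'.repeat(200)));      // false — over the bound
```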

❌ Memory leaks

// Global array grows forever
const cache = [];
export default {
  fetch(request) {
    cache.push(request.url); // ❌ Memory leak
  }
};

// ✅ Use bounded cache with eviction
const cache = new LRUCache({ max: 1000 });
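LRUCache above assumes a third-party library (e.g. the lru-cache npm package). A minimal bounded cache can be sketched with a plain Map, whose insertion order makes least-recently-used eviction cheap:

```javascript
class BoundedCache {
  constructor(max = 1000) {
    this.max = max;
    this.map = new Map();
  }
  get(key) {
    if (!this.map.has(key)) return undefined;
    // Re-insert to mark the entry as most recently used
    const value = this.map.get(key);
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }
  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.max) {
      // Map iterates in insertion order, so the first key is the LRU entry
      this.map.delete(this.map.keys().next().value);
    }
  }
}

const cache = new BoundedCache(2);
cache.set('a', 1);
cache.set('b', 2);
cache.get('a');    // touch 'a' so 'b' becomes least recently used
cache.set('c', 3); // evicts 'b'
console.log(cache.get('b')); // undefined
```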

Performance checklist

  • Using --config=thin-lto for production builds
  • Running one workerd instance per CPU core
  • Module-level imports (no dynamic imports in hot path)
  • Batching I/O operations where possible
  • Streaming large responses instead of buffering
  • Using transactions for multi-step storage operations
  • Limiting list() operations with pagination
  • Monitoring memory usage and setting appropriate limits
  • Load testing under realistic conditions
  • Profiling hot paths and optimizing bottlenecks

Further reading
