workerd performs well out of the box, but there’s significant headroom for optimization. This guide covers compilation flags, runtime configuration, and application-level tuning.
Performance tuning is an ongoing effort in workerd. Many optimizations mentioned here represent low-hanging fruit that can dramatically improve performance.
Build optimizations
Release builds
Always use release builds for production:
# Standard release build
bazel build //src/workerd/server:workerd --config=release
Release mode enables:
- Compiler optimizations (-O2)
- Dead code elimination
- Inlining
- NDEBUG mode (disables assertions)
Link-time optimization (LTO)
For maximum performance, use thin LTO:
bazel build --config=thin-lto //src/workerd/server:workerd
LTO benefits:
- ~2x performance improvement on simple benchmarks
- Better cross-module optimization
- Smaller binary size
- Longer compile times
Experiments suggest workerd can roughly double performance on “hello world” benchmarks with LTO and memory allocator tuning.
Memory allocator
workerd uses tcmalloc by default, which is already optimized for allocation-heavy workloads. Alternative allocators:
- tcmalloc: Default, excellent for workerd’s usage patterns
- jemalloc: May perform better for specific workloads
- mimalloc: Lightweight alternative
To benchmark allocators, link against different allocator libraries at build time.
Runtime configuration
Multi-process deployment
workerd is single-threaded. Utilize all CPU cores by running multiple instances:
# Run one instance per CPU core
for i in {0..7}; do
workerd serve config.capnp --socket-fd http=$((3+i)) &
done
See systemd deployment for production-ready multi-instance setup.
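A production multi-instance setup can be sketched as a templated systemd unit. Everything below — unit name, binary and config paths — is a hypothetical example rather than the canonical workerd setup, and how the instances share a port (e.g. SO_REUSEPORT or systemd socket activation) is deployment-specific:

```ini
# /etc/systemd/system/workerd@.service (hypothetical path)
[Unit]
Description=workerd instance %i
After=network.target

[Service]
ExecStart=/usr/local/bin/workerd serve /etc/workerd/config.capnp
Restart=always
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
```

Instances can then be started per core with `systemctl enable --now workerd@{0..7}`.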
Connection pooling
For outbound requests, workerd pools connections automatically. Keep in mind that any connection limits are per-process, so a multi-instance deployment multiplies them by the number of instances.
Cache configuration
For Durable Objects, tune the storage cache:
// Limit cache usage to prevent memory exhaustion
const options = {
softLimit: 16 * 1024 * 1024, // 16 MiB
hardLimit: 32 * 1024 * 1024, // 32 MiB
dirtyListByteLimit: 4 * 1024 * 1024, // 4 MiB
};
See Durable Objects storage internals for details.
Application-level optimizations
Minimize cold starts
Pre-warm isolates
workerd creates V8 isolates on-demand. For predictable performance:
// Heavy initialization at module level
import crypto from 'node:crypto';
import { expensive } from './utils';
const precomputedData = expensive();
export default {
async fetch(request) {
// Fast path uses pre-computed data
return new Response(precomputedData);
}
};
Avoid dynamic imports
// ❌ Slow: Dynamic import on every request
export default {
async fetch(request) {
const { handler } = await import('./handler');
return handler(request);
}
};
// ✅ Fast: Static import
import { handler } from './handler';
export default {
async fetch(request) {
return handler(request);
}
};
Optimize request handling
Batch operations
// ❌ Slow: Sequential requests
const results = [];
for (const url of urls) {
const response = await fetch(url);
results.push(await response.json());
}
// ✅ Fast: Parallel requests
const responses = await Promise.all(
urls.map(url => fetch(url))
);
const results = await Promise.all(
responses.map(r => r.json())
);
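Note that an unbounded Promise.all fires every request at once, which can exhaust connection or subrequest limits for large URL lists. A small limiter keeps parallelism bounded (a sketch; mapWithConcurrency is a hypothetical helper, not a workerd API):

```javascript
// Run an async mapper over items with at most `limit` calls in flight.
async function mapWithConcurrency(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0;
  async function worker() {
    // Each worker pulls the next unclaimed index until items run out.
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  }
  const workers = Array.from({ length: Math.min(limit, items.length) }, worker);
  await Promise.all(workers);
  return results;
}
```

Used as `await mapWithConcurrency(urls, 6, url => fetch(url).then(r => r.json()))`.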
Stream when possible
// ✅ Stream large responses
export default {
async fetch(request) {
const upstream = await fetch('https://example.com/large-file');
// Stream through without buffering
return new Response(upstream.body, {
headers: upstream.headers
});
}
};
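Bodies can also be transformed on the fly without buffering, using the standard TransformStream API (a sketch; uppercaseStream is a hypothetical helper):

```javascript
// Build a TransformStream that uppercases string chunks as they pass
// through, without accumulating the whole body in memory.
function uppercaseStream() {
  return new TransformStream({
    transform(chunk, controller) {
      controller.enqueue(String(chunk).toUpperCase());
    }
  });
}
```

A real response body carries bytes, so it would first go through a `TextDecoderStream` before a string transform like this.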
Minimize allocations
Reuse objects
// ❌ Creates a new Headers object on every request
export default {
  async fetch(request) {
    const headers = new Headers();
    headers.set('X-Custom', 'value');
    return new Response(data, { headers });
  }
};
// ✅ Reuse a module-level, effectively immutable Headers object
const COMMON_HEADERS = new Headers({
  'X-Custom': 'value',
  'Content-Type': 'application/json'
});
export default {
  async fetch(request) {
    return new Response(data, { headers: COMMON_HEADERS });
  }
};
Avoid string concatenation in hot paths
// ❌ Slow: String concatenation
let html = '<html>';
html += '<body>';
html += '<h1>' + title + '</h1>';
html += '</body></html>';
// ✅ Fast: Template literals or array join
const html = `<html><body><h1>${title}</h1></body></html>`;
// ✅ Also fast: Array join for dynamic lists
const parts = ['<html><body>'];
for (const item of items) {
parts.push(`<li>${item}</li>`);
}
parts.push('</body></html>');
const html = parts.join('');
Storage optimizations
For Durable Objects:
Batch reads and writes
// ❌ Slow: Individual operations
await storage.put('key1', 'value1');
await storage.put('key2', 'value2');
await storage.put('key3', 'value3');
// ✅ Fast: Batch operation
await storage.put({
key1: 'value1',
key2: 'value2',
key3: 'value3',
});
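Batched put() calls are still subject to a per-call cap (Durable Objects document a limit of 128 key/value pairs per put), so very large writes need chunking. A sketch with a hypothetical putAll helper:

```javascript
// Write a large set of entries in chunks that stay under the storage
// API's per-call batch limit (128 pairs for Durable Objects).
async function putAll(storage, entries, batchSize = 128) {
  const keys = Object.keys(entries);
  for (let i = 0; i < keys.length; i += batchSize) {
    const batch = {};
    for (const k of keys.slice(i, i + batchSize)) batch[k] = entries[k];
    await storage.put(batch);
  }
}
```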
Use transactions
// ✅ Atomic batch with transaction
await storage.transaction(async (txn) => {
const current = (await txn.get('counter')) ?? 0;
await txn.put('counter', current + 1);
await txn.put('lastUpdate', Date.now());
// Single commit for all operations
});
Limit list() operations
// ❌ Can exhaust cache
const all = await storage.list();
// ✅ Use pagination (list() returns a Map in key order)
const page1 = await storage.list({ limit: 1000 });
const lastKey = [...page1.keys()].pop();
const page2 = await storage.list({
  startAfter: lastKey,
  limit: 1000
});
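To walk an arbitrarily large keyspace, the pagination pattern can be wrapped in an async generator (a sketch; listAll is a hypothetical helper, assuming list() returns a Map and accepts limit/startAfter options):

```javascript
// Yield every entry page by page so no single list() call has to
// hold the whole keyspace in the cache at once.
async function* listAll(storage, pageSize = 1000) {
  let startAfter;
  while (true) {
    const page = await storage.list({ limit: pageSize, startAfter });
    if (page.size === 0) return;
    yield* page; // yields [key, value] pairs
    startAfter = [...page.keys()].pop();
  }
}
```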
WebSocket optimizations
Use hibernation for idle connections
class ChatRoom {
  constructor(state) { this.state = state; }
  async fetch(request) {
    const pair = new WebSocketPair();
    // Accept via the hibernation API: workerd can evict the isolate
    // while idle connections stay open
    this.state.acceptWebSocket(pair[1]);
    return new Response(null, { status: 101, webSocket: pair[0] });
  }
  async webSocketMessage(ws, message) {
    // Sockets accepted with acceptWebSocket() hibernate automatically
    // when idle; no per-message call is needed
    this.broadcast(message);
  }
}
Batch broadcasts
// ❌ Slow: re-serializes the message for every connection
for (const ws of connections) {
  ws.send(JSON.stringify(message));
}
// ✅ Fast: serialize once, send the same string to every socket
const payload = JSON.stringify(message);
for (const ws of this.state.getWebSockets()) {
  ws.send(payload);
}
Monitoring and profiling
Request timing
Add timing headers:
export default {
  async fetch(request) {
    const start = Date.now();
    const response = await handleRequest(request);
    // Re-wrap: headers on a Response returned by fetch() are immutable
    const timed = new Response(response.body, response);
    timed.headers.set('X-Response-Time', `${Date.now() - start}ms`);
    return timed;
  }
};
CPU profiling
workerd supports V8 CPU profiling:
# Enable CPU profiling
workerd serve config.capnp --experimental-enable-cpu-profiler
CPU profiling adds overhead. Only enable for debugging, not in production.
Memory profiling
Monitor memory usage:
// performance.memory is a nonstandard Chrome API and may be absent;
// guard before using it
const mem = performance.memory;
if (mem && mem.usedJSHeapSize / mem.jsHeapSizeLimit > 0.8) {
  console.warn('Memory usage high:', mem.usedJSHeapSize / 1024 / 1024, 'MB');
}
Benchmarking
Load testing
Use wrk or ab for load testing:
# wrk - 10 threads, 100 connections, 30 seconds
wrk -t10 -c100 -d30s http://localhost:8080/
# Apache Bench - 10000 requests, 100 concurrent
ab -n 10000 -c 100 http://localhost:8080/
Simple “Hello World” performance (with LTO):
- Throughput: ~100k req/s per core (single instance)
- Latency p50: <1ms
- Latency p99: <5ms
Your mileage will vary based on:
- Application complexity
- I/O operations
- System resources
Linux
Increase file descriptor limits
# In systemd service file
LimitNOFILE=65536
# Or system-wide in /etc/security/limits.conf
workerd soft nofile 65536
workerd hard nofile 65536
Use hugepages
For workloads with large memory footprints:
# Enable transparent hugepages (writing to /sys requires root)
echo always | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
TCP tuning
For high connection counts:
# Increase TCP buffer sizes (sysctl -w changes do not survive reboot)
sudo sysctl -w net.core.rmem_max=16777216
sudo sysctl -w net.core.wmem_max=16777216
sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sudo sysctl -w net.ipv4.tcp_wmem="4096 87380 16777216"
# Persist across reboots by adding the same keys under /etc/sysctl.d/
macOS
Increase file descriptor limits:
# Check current limits
ulimit -n
# Increase (temporary, current shell only)
ulimit -n 65536
# Raise the system-wide cap (resets at reboot)
sudo launchctl limit maxfiles 65536 200000
# Permanent for shells: add "ulimit -n 65536" to ~/.zshrc or ~/.bashrc
V8 tuning
Heap size limits
workerd uses V8’s default heap limits. For memory-intensive applications:
# Increase V8 heap size via workerd flags
# Note: This is per-isolate
workerd serve config.capnp --v8-max-heap-size=512
Optimization tier
V8 has multiple optimization tiers (Ignition, TurboFan). Hot functions automatically get optimized.
To warm up optimization (testing only):
// Repeated calls let V8 tier the function up to TurboFan
// (don't rely on this in production)
function hotPath(data) {
// Your hot path code
}
// Warm up the function
for (let i = 0; i < 10000; i++) {
hotPath(testData);
}
Anti-patterns
Avoid these performance pitfalls:
❌ Synchronous crypto
// Blocks the event loop
const hash = crypto.createHash('sha256')
.update(largeData)
.digest('hex');
// ✅ Use async crypto when available
const hash = await crypto.subtle.digest('SHA-256', largeData);
❌ Large JSON.parse/stringify
// Blocks event loop for large objects
const data = JSON.parse(hugeJsonString);
// ✅ Stream or chunk large data
const stream = response.body;
const reader = stream.getReader();
// Process in chunks
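For line-delimited JSON, chunked processing can be made concrete with an async generator over the stream (a sketch; ndjson is a hypothetical helper, assuming WHATWG streams and TextDecoderStream):

```javascript
// Parse newline-delimited JSON from a byte stream one record at a
// time, instead of buffering and JSON.parse-ing the whole body.
async function* ndjson(stream) {
  const reader = stream.pipeThrough(new TextDecoderStream()).getReader();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += value;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep the trailing partial line
    for (const line of lines) {
      if (line.trim()) yield JSON.parse(line);
    }
  }
  if (buffer.trim()) yield JSON.parse(buffer);
}
```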
❌ Inefficient regex
// Catastrophic backtracking
const regex = /^(a+)+$/;
// ✅ Use specific, bounded patterns
const regex = /^a{1,100}$/;
❌ Memory leaks
// Global array grows forever
const cache = [];
export default {
  fetch(request) {
    cache.push(request.url); // ❌ unbounded growth across requests
    return new Response('ok');
  }
};
// ✅ Use a bounded cache with eviction (LRUCache here stands in for a
// bundled LRU library; it is not a built-in)
const cache = new LRUCache({ max: 1000 });
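If bundling a library is not an option, a bounded LRU is small enough to write inline on top of Map's insertion order (a sketch; SimpleLRU is a hypothetical class):

```javascript
// Minimal bounded LRU cache: Map preserves insertion order, so the
// first key is always the least recently used.
class SimpleLRU {
  constructor(max) { this.max = max; this.map = new Map(); }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const v = this.map.get(key);
    this.map.delete(key); this.map.set(key, v); // mark recently used
    return v;
  }
  set(key, value) {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.max) {
      this.map.delete(this.map.keys().next().value); // evict oldest
    }
  }
}
```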
Further reading