
What is Backoff?

Backoff is the practice of waiting between retry attempts. Instead of immediately retrying a failed operation, backoff introduces a delay that gives the failing system time to recover. Without backoff, rapid retries can:
  • Overwhelm struggling services: making recovery harder or impossible
  • Waste resources: burning CPU and network bandwidth on futile attempts
  • Trigger rate limits: aggressive retries can look like abuse
  • Amplify outages: thundering herd problems arise when many clients retry simultaneously
Never retry without backoff in production systems. It can turn a small issue into a catastrophic outage.

Backoff Strategies in Resilience

Resilience supports two backoff strategies, defined in src/global.d.ts:15-17:
type BackoffStrategy =
    | { type: "fixed"; delayMs: number }
    | { type: "exponential"; baseDelayMs: number; maxDelayMs: number; jitter?: boolean };

Fixed Backoff

Fixed backoff waits the same amount of time between each retry attempt. When to use:
  • Simple, predictable retry patterns
  • When you know the recovery time of the downstream service
  • Low retry counts (1-3 retries)
  • Testing and debugging (easier to reason about)
Configuration:
import { withResilience } from '@oldwhisper/resilience';

const resilient = withResilience(task, {
  retries: 3,
  backoff: {
    type: 'fixed',
    delayMs: 1000  // Wait 1 second between each retry
  }
});

// Retry timeline:
// Attempt 1: immediate
// Attempt 2: after 1s
// Attempt 3: after 1s
// Attempt 4: after 1s
Implementation (from src/index.ts:72-74):
function computeBackoffMs(strategy: Resilience.BackoffStrategy | undefined, attempt: number): number {
    if (!strategy) return 0;
    if (strategy.type === "fixed") return strategy.delayMs;
    // ...
}

Exponential Backoff

Exponential backoff doubles the wait time with each retry, up to a maximum delay. When to use:
  • Production systems with high retry counts
  • Unknown or variable recovery times
  • Preventing thundering herd problems
  • Services that may need progressively longer recovery time
Configuration:
const resilient = withResilience(task, {
  retries: 5,
  backoff: {
    type: 'exponential',
    baseDelayMs: 100,    // Start with 100ms
    maxDelayMs: 10000,   // Cap at 10 seconds
    jitter: true         // Add randomization
  }
});

// Retry timeline (without jitter):
// Attempt 1: immediate
// Attempt 2: after 100ms   (100 * 2^0)
// Attempt 3: after 200ms   (100 * 2^1)
// Attempt 4: after 400ms   (100 * 2^2)
// Attempt 5: after 800ms   (100 * 2^3)
// Attempt 6: after 1600ms  (100 * 2^4)
Implementation (from src/index.ts:76-80):
const raw = strategy.baseDelayMs * Math.pow(2, Math.max(0, attempt - 1));
const capped = Math.min(raw, strategy.maxDelayMs);
if (!strategy.jitter) return capped;

return Math.floor(Math.random() * capped);
Key details:
  • Formula: baseDelayMs * 2^(attempt - 1)
  • Always capped at maxDelayMs to prevent excessive waits
  • Attempt counting starts at 1 (first retry is attempt 1)
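Reassembling the fixed and exponential excerpts above into one self-contained sketch (with the `BackoffStrategy` type inlined from `src/global.d.ts`) makes the schedule easy to verify:

```typescript
type BackoffStrategy =
  | { type: "fixed"; delayMs: number }
  | { type: "exponential"; baseDelayMs: number; maxDelayMs: number; jitter?: boolean };

// Self-contained mirror of computeBackoffMs, assembled from the excerpts above.
function computeBackoffMs(strategy: BackoffStrategy | undefined, attempt: number): number {
  if (!strategy) return 0;
  if (strategy.type === "fixed") return strategy.delayMs;

  // baseDelayMs * 2^(attempt - 1), capped at maxDelayMs
  const raw = strategy.baseDelayMs * Math.pow(2, Math.max(0, attempt - 1));
  const capped = Math.min(raw, strategy.maxDelayMs);
  if (!strategy.jitter) return capped;

  // Jitter: uniform random delay in [0, capped)
  return Math.floor(Math.random() * capped);
}

const exp: BackoffStrategy = { type: "exponential", baseDelayMs: 100, maxDelayMs: 1000 };
console.log([1, 2, 3, 4, 5].map((attempt) => computeBackoffMs(exp, attempt)));
// [ 100, 200, 400, 800, 1000 ]  <- doubling, then capped at maxDelayMs
```

Note how attempt 5 would be 1600ms (100 × 2⁴) but the cap clamps it to 1000ms.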

Jitter: Breaking Synchronization

Jitter adds randomness to backoff delays. This is crucial for preventing thundering herd problems where many clients retry simultaneously.

The Thundering Herd Problem

Imagine 1000 clients that all experience a failure at the same time. Without jitter:
Time 0s:    1000 requests → all fail
Time 1s:    1000 retries  → all fail (server still overloaded)
Time 3s:    1000 retries  → all fail (synchronized retry storm)
Time 7s:    1000 retries  → all fail (still synchronized)
With jitter:
Time 0s:      1000 requests → all fail
Time 0-1s:    ~1000 retries spread over 1 second
Time 0-3s:    ~1000 retries spread over 3 seconds  
Time 0-7s:    ~1000 retries spread over 7 seconds
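The spreading effect is easy to check numerically. This hypothetical simulation (not part of the library) has 1000 clients draw a jittered delay for the same retry round, using the same uniform-random formula the library applies:

```typescript
// Simulate 1000 clients each computing a jittered delay for the same retry round.
const CLIENTS = 1000;
const cappedDelayMs = 1000; // the capped exponential delay for this round

const delays = Array.from({ length: CLIENTS }, () =>
  Math.floor(Math.random() * cappedDelayMs)
);

// Bucket the retries into 100ms slices of the 1s window.
const buckets = new Array(10).fill(0);
for (const d of delays) buckets[Math.floor(d / 100)]++;

// Roughly ~100 retries land in each 100ms slice, instead of 1000 at one instant.
console.log(buckets);
```

Instead of a single spike of 1000 simultaneous retries, the load arrives as a roughly even trickle across the whole window.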

Jitter Implementation

From src/index.ts:78-80:
if (!strategy.jitter) return capped;

return Math.floor(Math.random() * capped);
When jitter is enabled, the delay is randomized between 0 and capped, creating a uniform distribution. Example with jitter:
const resilient = withResilience(task, {
  retries: 4,
  backoff: {
    type: 'exponential',
    baseDelayMs: 1000,
    maxDelayMs: 16000,
    jitter: true
  }
});

// Retry delays (example - actual values are random):
// Attempt 2: 534ms   (random between 0-1000ms)
// Attempt 3: 1847ms  (random between 0-2000ms)
// Attempt 4: 2103ms  (random between 0-4000ms)
// Attempt 5: 6891ms  (random between 0-8000ms)
Always enable jitter in production for exponential backoff. It significantly reduces load spikes during outages.

Backoff in the Retry Loop

Backoff is applied after a retry is decided but before the next attempt (from src/index.ts:157-162):
const shouldRetry = attempt <= retries && retryOn(err);
if (!shouldRetry) throw err;

const waitMs = computeBackoffMs(config.backoff, attempt);
hooks?.onRetry?.({ name, attempt, delayMs: waitMs, error: err });
if (waitMs > 0) await delay(waitMs);
The delay helper is a simple promise-based sleep (from src/index.ts:86-88):
function delay(ms: number) {
    return new Promise<void>((resolve) => setTimeout(resolve, ms));
}
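Putting the pieces together, here is a simplified sketch of such a retry loop. It is not the library's exact code (`withResilience` also handles timeouts, hooks, and retry predicates), but it shows where the backoff delay sits between attempts:

```typescript
// Simplified retry loop: backoff is applied after a failure is observed,
// before the next attempt. The real withResilience does more than this.
async function retryWithBackoff<T>(
  task: () => Promise<T>,
  retries: number,
  backoffMs: (attempt: number) => number
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      if (attempt > retries) throw err; // retries exhausted
      const waitMs = backoffMs(attempt);
      if (waitMs > 0) await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
    }
  }
}

// Fails twice, then succeeds: three attempts total, two short backoff waits.
let attempts = 0;
const flaky = async () => {
  attempts++;
  if (attempts < 3) throw new Error("transient failure");
  return "ok";
};

(async () => {
  const result = await retryWithBackoff(flaky, 3, () => 10);
  console.log(result, attempts); // "ok" 3
})();
```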

Complete Examples

import { withResilience } from '@oldwhisper/resilience';

const fetchData = async () => {
  const response = await fetch('https://api.example.com/data');
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return response.json();
};

const resilient = withResilience(fetchData, {
  name: 'fetchData',
  retries: 3,
  backoff: {
    type: 'fixed',
    delayMs: 2000  // Wait 2 seconds between retries
  },
  hooks: {
    onRetry: ({ attempt, delayMs }) => {
      console.log(`Retrying attempt ${attempt + 1} after ${delayMs}ms delay`);
    }
  }
});

// Total possible time: up to 6 seconds of backoff (3 retries × 2s each)
// Plus the time for each attempt itself
await resilient();

Choosing the Right Strategy

Use Fixed Backoff When:

  • You have 1-3 retries only
  • The service has predictable recovery time
  • You’re testing or debugging
  • Simplicity is more important than optimization

Use Exponential Backoff When:

  • You have 4+ retries
  • Recovery time is unknown or variable
  • You’re building production systems
  • You need to handle thundering herd scenarios
  • You want to progressively back off from a struggling service
For most production use cases, exponential backoff with jitter is the recommended approach.

Monitoring Backoff Behavior

Use hooks to track actual backoff delays:
const backoffMetrics = {
  totalDelayMs: 0,
  retryCount: 0
};

const resilient = withResilience(task, {
  retries: 5,
  backoff: {
    type: 'exponential',
    baseDelayMs: 100,
    maxDelayMs: 10000,
    jitter: true
  },
  hooks: {
    onRetry: ({ delayMs }) => {
      backoffMetrics.totalDelayMs += delayMs;
      backoffMetrics.retryCount++;
      console.log(`Cumulative backoff: ${backoffMetrics.totalDelayMs}ms over ${backoffMetrics.retryCount} retries`);
    }
  }
});

Best Practices

  1. Always use backoff with retries: Never retry without at least a small delay
  2. Enable jitter in production: Prevents synchronized retry storms
  3. Set reasonable maximums: maxDelayMs prevents excessive wait times
  4. Start small: Begin with short baseDelayMs (100-500ms)
  5. Consider total time: Account for (retries × average_delay) + (retries × timeout) in your SLAs
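For practice 5, the worst-case backoff budget can be computed up front. A small hypothetical helper (not part of the library) that sums the capped, jitter-free delays gives the figure to plug into an SLA; jitter only ever lowers it:

```typescript
// Worst-case total backoff for an exponential strategy (no jitter).
// Jitter randomizes each delay downward, so this is an upper bound.
function totalBackoffBudgetMs(retries: number, baseDelayMs: number, maxDelayMs: number): number {
  let total = 0;
  for (let attempt = 1; attempt <= retries; attempt++) {
    total += Math.min(baseDelayMs * Math.pow(2, attempt - 1), maxDelayMs);
  }
  return total;
}

// 5 retries, 100ms base, 10s cap: 100 + 200 + 400 + 800 + 1600 = 3100ms
console.log(totalBackoffBudgetMs(5, 100, 10000)); // 3100
// Add per-attempt timeouts on top of this figure when sizing SLAs.
```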

Related Concepts

  • Retries - The retry mechanism that backoff enhances
  • Timeouts - Each retry attempt can have its own timeout
  • Circuit Breakers - Stop retrying when failures become systemic
