
What is Backoff?

Backoff is the practice of waiting between retry attempts. Instead of immediately retrying a failed operation, backoff introduces a delay that gives the failing system time to recover. Without backoff, rapid retries can:
  • Overwhelm struggling services: making recovery harder or impossible
  • Waste resources: burning CPU and network bandwidth on futile attempts
  • Trigger rate limits: aggressive retries can look like abuse
  • Amplify outages: thundering herd problems arise when many clients retry simultaneously
Never retry without backoff in production systems. It can turn a small issue into a catastrophic outage.

Backoff Strategies in Resilience

Resilience supports two backoff strategies, defined in src/global.d.ts:15-17:
type BackoffStrategy =
    | { type: "fixed"; delayMs: number }
    | { type: "exponential"; baseDelayMs: number; maxDelayMs: number; jitter?: boolean };

Fixed Backoff

Fixed backoff waits the same amount of time between each retry attempt. When to use:
  • Simple, predictable retry patterns
  • When you know the recovery time of the downstream service
  • Low retry counts (1-3 retries)
  • Testing and debugging (easier to reason about)
Configuration:
import { withResilience } from '@oldwhisper/resilience';

const resilient = withResilience(task, {
  retries: 3,
  backoff: {
    type: 'fixed',
    delayMs: 1000  // Wait 1 second between each retry
  }
});

// Retry timeline:
// Attempt 1: immediate
// Attempt 2: after 1s
// Attempt 3: after 1s
// Attempt 4: after 1s
Implementation (from src/index.ts:72-74):
function computeBackoffMs(strategy: Resilience.BackoffStrategy | undefined, attempt: number): number {
    if (!strategy) return 0;
    if (strategy.type === "fixed") return strategy.delayMs;
    // ...
}

Exponential Backoff

Exponential backoff doubles the wait time with each retry, up to a maximum delay. When to use:
  • Production systems with high retry counts
  • Unknown or variable recovery times
  • Preventing thundering herd problems
  • Services that may need progressively longer recovery time
Configuration:
const resilient = withResilience(task, {
  retries: 5,
  backoff: {
    type: 'exponential',
    baseDelayMs: 100,    // Start with 100ms
    maxDelayMs: 10000,   // Cap at 10 seconds
    jitter: true         // Add randomization
  }
});

// Retry timeline (without jitter):
// Attempt 1: immediate
// Attempt 2: after 100ms   (100 * 2^0)
// Attempt 3: after 200ms   (100 * 2^1)
// Attempt 4: after 400ms   (100 * 2^2)
// Attempt 5: after 800ms   (100 * 2^3)
// Attempt 6: after 1600ms  (100 * 2^4)
Implementation (from src/index.ts:76-80):
const raw = strategy.baseDelayMs * Math.pow(2, Math.max(0, attempt - 1));
const capped = Math.min(raw, strategy.maxDelayMs);
if (!strategy.jitter) return capped;

return Math.floor(Math.random() * capped);
Key details:
  • Formula: baseDelayMs * 2^(attempt - 1)
  • Always capped at maxDelayMs to prevent excessive waits
  • Attempt counting starts at 1 (first retry is attempt 1)
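Reassembling the fixed and exponential excerpts above into one self-contained sketch (with the `BackoffStrategy` type inlined from `src/global.d.ts`) makes the schedule easy to verify:

```typescript
type BackoffStrategy =
  | { type: "fixed"; delayMs: number }
  | { type: "exponential"; baseDelayMs: number; maxDelayMs: number; jitter?: boolean };

// Self-contained mirror of computeBackoffMs, assembled from the excerpts above.
function computeBackoffMs(strategy: BackoffStrategy | undefined, attempt: number): number {
  if (!strategy) return 0;
  if (strategy.type === "fixed") return strategy.delayMs;

  // baseDelayMs * 2^(attempt - 1), capped at maxDelayMs
  const raw = strategy.baseDelayMs * Math.pow(2, Math.max(0, attempt - 1));
  const capped = Math.min(raw, strategy.maxDelayMs);
  if (!strategy.jitter) return capped;

  // Jitter: uniform random delay in [0, capped)
  return Math.floor(Math.random() * capped);
}

const exp: BackoffStrategy = { type: "exponential", baseDelayMs: 100, maxDelayMs: 1000 };
console.log([1, 2, 3, 4, 5].map((attempt) => computeBackoffMs(exp, attempt)));
// [ 100, 200, 400, 800, 1000 ]  <- doubling, then capped at maxDelayMs
```

Note how attempt 5 would be 1600ms (100 × 2⁴) but the cap clamps it to 1000ms.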

Jitter: Breaking Synchronization

Jitter adds randomness to backoff delays. This is crucial for preventing thundering herd problems where many clients retry simultaneously.

The Thundering Herd Problem

Imagine 1000 clients that all experience a failure at the same time. Without jitter:
Time 0s:    1000 requests → all fail
Time 1s:    1000 retries  → all fail (server still overloaded)
Time 3s:    1000 retries  → all fail (synchronized retry storm)
Time 7s:    1000 retries  → all fail (still synchronized)
With jitter:
Time 0s:      1000 requests → all fail
Time 0-1s:    ~1000 retries spread over 1 second
Time 0-3s:    ~1000 retries spread over 3 seconds  
Time 0-7s:    ~1000 retries spread over 7 seconds
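The spreading effect is easy to check numerically. This hypothetical simulation (not part of the library) has 1000 clients draw a jittered delay for the same retry round, using the same uniform-random formula the library applies:

```typescript
// Simulate 1000 clients each computing a jittered delay for the same retry round.
const CLIENTS = 1000;
const cappedDelayMs = 1000; // the capped exponential delay for this round

const delays = Array.from({ length: CLIENTS }, () =>
  Math.floor(Math.random() * cappedDelayMs)
);

// Bucket the retries into 100ms slices of the 1s window.
const buckets = new Array(10).fill(0);
for (const d of delays) buckets[Math.floor(d / 100)]++;

// Roughly ~100 retries land in each 100ms slice, instead of 1000 at one instant.
console.log(buckets);
```

Instead of a single spike of 1000 simultaneous retries, the load arrives as a roughly even trickle across the whole window.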

Jitter Implementation

From src/index.ts:78-80:
if (!strategy.jitter) return capped;

return Math.floor(Math.random() * capped);
When jitter is enabled, the delay is randomized between 0 and capped, creating a uniform distribution. Example with jitter:
const resilient = withResilience(task, {
  retries: 4,
  backoff: {
    type: 'exponential',
    baseDelayMs: 1000,
    maxDelayMs: 16000,
    jitter: true
  }
});

// Retry delays (example - actual values are random):
// Attempt 2: 534ms   (random between 0-1000ms)
// Attempt 3: 1847ms  (random between 0-2000ms)
// Attempt 4: 2103ms  (random between 0-4000ms)
// Attempt 5: 6891ms  (random between 0-8000ms)
Always enable jitter in production for exponential backoff. It significantly reduces load spikes during outages.

Backoff in the Retry Loop

Backoff is applied after a retry is decided but before the next attempt (from src/index.ts:157-162):
const shouldRetry = attempt <= retries && retryOn(err);
if (!shouldRetry) throw err;

const waitMs = computeBackoffMs(config.backoff, attempt);
hooks?.onRetry?.({ name, attempt, delayMs: waitMs, error: err });
if (waitMs > 0) await delay(waitMs);
The delay helper is a simple promise-based sleep (from src/index.ts:86-88):
function delay(ms: number) {
    return new Promise<void>((resolve) => setTimeout(resolve, ms));
}
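Putting the pieces together, here is a simplified sketch of such a retry loop. It is not the library's exact code (`withResilience` also handles timeouts, hooks, and retry predicates), but it shows where the backoff delay sits between attempts:

```typescript
// Simplified retry loop: backoff is applied after a failure is observed,
// before the next attempt. The real withResilience does more than this.
async function retryWithBackoff<T>(
  task: () => Promise<T>,
  retries: number,
  backoffMs: (attempt: number) => number
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      if (attempt > retries) throw err; // retries exhausted
      const waitMs = backoffMs(attempt);
      if (waitMs > 0) await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
    }
  }
}

// Fails twice, then succeeds: three attempts total, two short backoff waits.
let attempts = 0;
const flaky = async () => {
  attempts++;
  if (attempts < 3) throw new Error("transient failure");
  return "ok";
};

(async () => {
  const result = await retryWithBackoff(flaky, 3, () => 10);
  console.log(result, attempts); // "ok" 3
})();
```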

Complete Examples

import { withResilience } from '@oldwhisper/resilience';

const fetchData = async () => {
  const response = await fetch('https://api.example.com/data');
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return response.json();
};

const resilient = withResilience(fetchData, {
  name: 'fetchData',
  retries: 3,
  backoff: {
    type: 'fixed',
    delayMs: 2000  // Wait 2 seconds between retries
  },
  hooks: {
    onRetry: ({ attempt, delayMs }) => {
      console.log(`Retrying attempt ${attempt + 1} after ${delayMs}ms delay`);
    }
  }
});

// Total possible time: up to 6 seconds of backoff (3 retries × 2s each)
// Plus the time for each attempt itself
await resilient();

Choosing the Right Strategy

Use Fixed Backoff When:

  • You have 1-3 retries only
  • The service has predictable recovery time
  • You’re testing or debugging
  • Simplicity is more important than optimization

Use Exponential Backoff When:

  • You have 4+ retries
  • Recovery time is unknown or variable
  • You’re building production systems
  • You need to handle thundering herd scenarios
  • You want to progressively back off from a struggling service
For most production use cases, exponential backoff with jitter is the recommended approach.

Monitoring Backoff Behavior

Use hooks to track actual backoff delays:
const backoffMetrics = {
  totalDelayMs: 0,
  retryCount: 0
};

const resilient = withResilience(task, {
  retries: 5,
  backoff: {
    type: 'exponential',
    baseDelayMs: 100,
    maxDelayMs: 10000,
    jitter: true
  },
  hooks: {
    onRetry: ({ delayMs }) => {
      backoffMetrics.totalDelayMs += delayMs;
      backoffMetrics.retryCount++;
      console.log(`Cumulative backoff: ${backoffMetrics.totalDelayMs}ms over ${backoffMetrics.retryCount} retries`);
    }
  }
});

Best Practices

  1. Always use backoff with retries: Never retry without at least a small delay
  2. Enable jitter in production: Prevents synchronized retry storms
  3. Set reasonable maximums: maxDelayMs prevents excessive wait times
  4. Start small: Begin with short baseDelayMs (100-500ms)
  5. Consider total time: Account for (retries × average_delay) + (retries × timeout) in your SLAs
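For practice 5, the worst-case backoff budget can be computed up front. A small hypothetical helper (not part of the library) that sums the capped, jitter-free delays gives the figure to plug into an SLA; jitter only ever lowers it:

```typescript
// Worst-case total backoff for an exponential strategy (no jitter).
// Jitter randomizes each delay downward, so this is an upper bound.
function totalBackoffBudgetMs(retries: number, baseDelayMs: number, maxDelayMs: number): number {
  let total = 0;
  for (let attempt = 1; attempt <= retries; attempt++) {
    total += Math.min(baseDelayMs * Math.pow(2, attempt - 1), maxDelayMs);
  }
  return total;
}

// 5 retries, 100ms base, 10s cap: 100 + 200 + 400 + 800 + 1600 = 3100ms
console.log(totalBackoffBudgetMs(5, 100, 10000)); // 3100
// Add per-attempt timeouts on top of this figure when sizing SLAs.
```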

Related Concepts

  • Retries - The retry mechanism that backoff enhances
  • Timeouts - Each retry attempt can have its own timeout
  • Circuit Breakers - Stop retrying when failures become systemic
