## Overview
Automatic retries increase reliability by re-attempting failed requests without manual intervention. The Gateway implements intelligent retry logic with exponential backoff to handle transient errors gracefully.

## How It Works
When a request fails with a retryable status code, the Gateway automatically:

- Waits for a calculated backoff period
- Re-attempts the request
- Increases backoff time exponentially after each failure
- Returns the response once successful or after exhausting retry attempts
Retries use exponential backoff to avoid overwhelming providers during outages: each successive attempt waits longer than the last.
## Configuration
### Basic Retry
Retry up to 3 times on the default error codes: `429`, `500`, `502`, `503`, `504`.
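As a sketch, a basic retry block might look like the following (the `retry`/`attempts` field names are assumptions about the config schema; check the config reference):

```json
{
  "retry": {
    "attempts": 3
  }
}
```

With no explicit status codes, the default list `[429, 500, 502, 503, 504]` applies.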
### Custom Status Codes
Retry on specific status codes only.

### Provider Retry Headers

Respect provider-specified retry delays: the Gateway honors the `retry-after`, `retry-after-ms`, and `x-ms-retry-after-ms` headers sent by providers.
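The two options above might combine into a single retry block like this sketch (the `on_status_codes` and `attempts` field names are assumptions; `use_retry_after_header` is the setting described later on this page):

```json
{
  "retry": {
    "attempts": 3,
    "on_status_codes": [429, 503],
    "use_retry_after_header": true
  }
}
```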
The maximum retry timeout is 60 seconds. If a provider requests a longer delay, the Gateway will skip retries and return the error.
## Usage Examples
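As one hedged example, a request config that pairs retries with a per-request timeout might look like the sketch below (`request_timeout` is the setting covered later on this page and is assumed here to be in milliseconds; the other field names are also assumptions):

```json
{
  "request_timeout": 10000,
  "retry": {
    "attempts": 3,
    "on_status_codes": [429, 500, 502, 503, 504]
  }
}
```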
## Implementation Details
### Exponential Backoff
The Gateway implements exponential backoff using the `async-retry` library.
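The schedule such a library produces can be sketched as follows (a 1-second base delay and factor of 2 mirror async-retry's underlying defaults; whether the Gateway overrides them is not stated here):

```typescript
// Sketch of an exponential backoff schedule in the style of async-retry.
// minTimeoutMs = 1000 and factor = 2 are assumed defaults, and the 60s cap
// matches the maximum retry window described on this page — these are
// illustrative values, not confirmed Gateway settings.
function backoffDelayMs(
  attempt: number,                 // 0-based index of the retry attempt
  minTimeoutMs: number = 1000,
  factor: number = 2,
  maxTimeoutMs: number = 60_000,
): number {
  return Math.min(minTimeoutMs * factor ** attempt, maxTimeoutMs);
}

// Successive retries wait roughly 1s, 2s, 4s, 8s, ...
```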
### Provider Retry-After Headers
When `use_retry_after_header` is enabled, the Gateway checks for:

- `retry-after` (seconds)
- `retry-after-ms` (milliseconds)
- `x-ms-retry-after-ms` (milliseconds, Azure-specific)
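A sketch of how those headers might be turned into a delay (the precedence order and the omission of `retry-after`'s HTTP-date form are assumptions, not documented behavior):

```typescript
// Sketch: derive a retry delay in milliseconds from provider response
// headers. Millisecond-granularity headers are assumed to win over the
// seconds-based `retry-after`; the HTTP-date form of `retry-after` is
// ignored here for brevity.
function delayFromHeaders(headers: Record<string, string>): number | null {
  const ms = headers["retry-after-ms"] ?? headers["x-ms-retry-after-ms"];
  if (ms !== undefined) return Number(ms);
  const seconds = headers["retry-after"];
  if (seconds !== undefined && !Number.isNaN(Number(seconds))) {
    return Number(seconds) * 1000;
  }
  return null; // fall back to the exponential backoff schedule
}
```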
### Retry Limits
- Maximum attempts: 5
- Maximum retry window: 60 seconds total
- Timeout per request: configurable via `request_timeout`
### Response Headers
Retry behavior can be tracked through response headers returned by the Gateway.

## Advanced Patterns
### Retries with Fallbacks
Combine retries with fallback providers:

- Try OpenAI
- Retry up to 3 times with OpenAI
- If all retries fail, fallback to Anthropic
- Retry up to 3 times with Anthropic
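The flow above might be expressed as a fallback strategy with a retry block on each target; the `strategy`, `targets`, and `provider` field names in this sketch are assumptions based on a typical gateway config shape:

```json
{
  "strategy": { "mode": "fallback" },
  "targets": [
    { "provider": "openai", "retry": { "attempts": 3 } },
    { "provider": "anthropic", "retry": { "attempts": 3 } }
  ]
}
```

Because each target carries its own retry block, this same shape also covers per-target retry configuration.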
### Per-Target Retry Configuration
Different retry strategies for different providers.

### Rate Limit Handling
Special configuration for rate limits:

- Retries only on 429 (rate limit)
- Respects the provider's `retry-after` header
- Optimal for handling rate limits gracefully
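A rate-limit-focused retry block might look like this sketch (the `attempts` and `on_status_codes` field names are assumptions; `use_retry_after_header` is the setting described elsewhere on this page):

```json
{
  "retry": {
    "attempts": 5,
    "on_status_codes": [429],
    "use_retry_after_header": true
  }
}
```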
## Status Codes
### Default Retryable Codes
| Code | Meaning | Reason |
|---|---|---|
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Temporary server issue |
| 502 | Bad Gateway | Upstream server error |
| 503 | Service Unavailable | Temporary unavailability |
| 504 | Gateway Timeout | Request timeout upstream |
### Non-Retryable Codes
| Code | Meaning | Why Not Retry |
|---|---|---|
| 400 | Bad Request | Invalid request format |
| 401 | Unauthorized | Invalid credentials |
| 403 | Forbidden | No permission |
| 404 | Not Found | Resource doesn’t exist |
| 408 | Request Timeout | Gateway timeout (configurable) |
`408` (Request Timeout) is thrown by the Gateway when `request_timeout` is exceeded. The error is already in OpenAI format and won't be retried by default.

## Best Practices
### Set Appropriate Retry Limits
Balance reliability with latency. More retries increase success rate but add latency. For user-facing applications, 2-3 attempts is typically sufficient.
### Use Status Code Filtering
Retry only on codes that indicate transient errors. Don’t retry on 400-level errors (except 429) as they indicate client errors that won’t resolve with retries.
### Enable Retry-After Headers

When dealing with rate limits, enable `use_retry_after_header` to respect provider retry guidance and avoid unnecessary retries.

### Combine with Timeouts

Always set `request_timeout` when using retries to prevent waiting indefinitely on slow requests.

### Monitor Retry Metrics

Track retry counts in your logs to identify reliability issues with providers. High retry rates may indicate provider capacity problems.
## Related Features
- **Fallbacks**: switch to backup providers when the primary fails
- **Timeouts**: set a maximum request duration
- **Load Balancing**: distribute load across providers
- **Configs**: complete config reference