
Overview

Automatic retries increase reliability by re-attempting failed requests without manual intervention. The Gateway implements intelligent retry logic with exponential backoff to handle transient errors gracefully.

How It Works

When a request fails with a retryable status code, the Gateway automatically:
  1. Waits for a calculated backoff period
  2. Re-attempts the request
  3. Increases backoff time exponentially after each failure
  4. Returns the response once successful or after exhausting retry attempts
Retries use exponential backoff, spacing attempts progressively further apart so the Gateway does not overwhelm a provider that is already struggling during an outage.
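The steps above can be sketched in Python. This is a simplified model of the behavior, not the Gateway's actual implementation; `send_request`, the base delay, and the doubling factor are illustrative assumptions:

```python
import time

def send_with_retries(send_request, attempts=3,
                      retryable={429, 500, 502, 503, 504},
                      base_delay=1.0):
    """Simplified model: retry on retryable status codes, waiting
    base_delay * 2**n between attempts (exponential backoff)."""
    for attempt in range(attempts + 1):        # 1 initial try + `attempts` retries
        response = send_request()
        if response["status"] not in retryable:
            return response                    # success or non-retryable error
        if attempt < attempts:
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return response                            # retries exhausted

# Fails twice with 503, then succeeds
outcomes = iter([{"status": 503}, {"status": 503}, {"status": 200}])
print(send_with_retries(lambda: next(outcomes), base_delay=0)["status"])  # 200
```

Note that the final response is returned either way: the caller sees the last error once the retry budget is spent.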

Configuration

Basic Retry

Retry up to 3 times on default error codes:
{
  "retry": {
    "attempts": 3
  }
}
Default retryable status codes: [429, 500, 502, 503, 504]

Custom Status Codes

Retry on specific status codes only:
{
  "retry": {
    "attempts": 5,
    "on_status_codes": [429, 503]
  }
}

Provider Retry Headers

Respect provider-specified retry delays:
{
  "retry": {
    "attempts": 3,
    "use_retry_after_header": true
  }
}
When enabled, the Gateway respects retry-after, retry-after-ms, and x-ms-retry-after-ms headers from providers.
The maximum retry timeout is 60 seconds. If a provider requests a longer delay, the Gateway will skip retries and return the error.

Usage Examples

from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",
    provider="openai",
    Authorization="sk-***",
    config={
        "retry": {
            "attempts": 5,
            "on_status_codes": [429, 500, 502, 503, 504]
        }
    }
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather like?"}]
)

Implementation Details

Exponential Backoff

The Gateway implements exponential backoff using the async-retry library:
// From src/handlers/retryHandler.ts
import retry from 'async-retry';

await retry(
  async (bail, attempt) => {
    const response = await fetch(url, options);

    // A thrown error signals async-retry to schedule another attempt
    if (statusCodesToRetry.includes(response.status)) {
      throw new Error('Retry needed');
    }

    return response;
  },
  {
    retries: retryCount, // maximum retry attempts from the config
    randomize: false     // deterministic exponential backoff, no jitter
  }
);
Backoff timing increases exponentially with each attempt, preventing server overload during outages.
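With `randomize: false`, async-retry computes each wait as `minTimeout * factor ** n`. The excerpt above does not set `minTimeout` or `factor`, so the library defaults (1000 ms and 2) are assumed here; a quick sketch of the resulting schedule:

```python
def backoff_schedule(retries, min_timeout_ms=1000, factor=2):
    """Wait (in ms) before each retry: min_timeout * factor ** n, no jitter.
    Defaults mirror async-retry's own (minTimeout=1000, factor=2)."""
    return [min_timeout_ms * factor ** n for n in range(retries)]

print(backoff_schedule(3))  # [1000, 2000, 4000]
```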

Provider Retry-After Headers

When use_retry_after_header is enabled, the Gateway checks for:
  • retry-after (seconds)
  • retry-after-ms (milliseconds)
  • x-ms-retry-after-ms (milliseconds, Azure-specific)
If the specified delay exceeds 60 seconds or remaining retry budget, retries are skipped.
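That header handling can be sketched as follows. The precedence order (millisecond variants before `retry-after`) and the numeric-only parsing of `retry-after` (the header may also carry an HTTP-date) are illustrative assumptions:

```python
MAX_RETRY_DELAY_MS = 60_000  # Gateway cap: 60 seconds

def retry_delay_ms(headers):
    """Return the provider-requested delay in ms, or None to skip retrying.
    Assumes numeric header values; precedence order is illustrative."""
    h = {k.lower(): v for k, v in headers.items()}
    if "retry-after-ms" in h:
        delay = int(h["retry-after-ms"])
    elif "x-ms-retry-after-ms" in h:
        delay = int(h["x-ms-retry-after-ms"])
    elif "retry-after" in h:
        delay = int(h["retry-after"]) * 1000   # header value is in seconds
    else:
        return 0  # no guidance; fall back to exponential backoff
    return None if delay > MAX_RETRY_DELAY_MS else delay

print(retry_delay_ms({"Retry-After": "5"}))    # 5000
print(retry_delay_ms({"Retry-After": "120"}))  # None -> skip retries
```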

Retry Limits

  • Maximum attempts: 5
  • Maximum retry window: 60 seconds total
  • Timeout per request: Configurable via request_timeout
If a request times out (408 status), it counts toward the retry attempts. Configure both retries and timeouts appropriately.

Response Headers

Track retry behavior through response headers:
x-portkey-retry-attempt-count: 2
This header indicates the number of retry attempts made before the request succeeded (0 means first attempt succeeded).

Advanced Patterns

Retries with Fallbacks

Combine retries with fallback providers:
{
  "retry": {
    "attempts": 3,
    "on_status_codes": [429, 500, 502, 503, 504]
  },
  "strategy": {
    "mode": "fallback"
  },
  "targets": [
    {
      "provider": "openai",
      "api_key": "sk-***"
    },
    {
      "provider": "anthropic",
      "api_key": "sk-ant-***"
    }
  ]
}
Behavior:
  1. Try OpenAI
  2. Retry up to 3 times with OpenAI
  3. If all retries fail, fallback to Anthropic
  4. Retry up to 3 times with Anthropic
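The combined flow can be modeled as nested loops: retries on the inside, fallback targets on the outside. This is a sketch, not the Gateway's code; in particular, whether a non-retryable error triggers fallback depends on the fallback strategy's own configuration, and backoff delays are omitted here:

```python
def call_with_fallback(targets, attempts=3,
                       retryable={429, 500, 502, 503, 504}):
    """Try each target in order; within a target, retry up to `attempts` times."""
    last = None
    for target in targets:                 # fallback loop across providers
        for _ in range(attempts + 1):      # 1 initial try + `attempts` retries
            last = target()
            if last["status"] not in retryable:
                return last                # success (or non-retryable error)
    return last                            # every target exhausted

openai = iter([{"status": 503, "provider": "openai"}] * 4)
anthropic = iter([{"status": 200, "provider": "anthropic"}])
result = call_with_fallback([lambda: next(openai), lambda: next(anthropic)])
print(result["provider"])  # anthropic
```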

Per-Target Retry Configuration

Different retry strategies for different providers:
{
  "strategy": { "mode": "fallback" },
  "targets": [
    {
      "provider": "openai",
      "api_key": "sk-***",
      "retry": {
        "attempts": 5,
        "on_status_codes": [429]
      }
    },
    {
      "provider": "anthropic",
      "api_key": "sk-ant-***",
      "retry": {
        "attempts": 2,
        "on_status_codes": [503]
      }
    }
  ]
}

Rate Limit Handling

Special configuration for rate limits:
{
  "retry": {
    "attempts": 5,
    "on_status_codes": [429],
    "use_retry_after_header": true
  }
}
This configuration:
  • Retries only on 429 (rate limit)
  • Respects provider’s retry-after header
  • Optimal for handling rate limits gracefully

Status Codes

Default Retryable Codes

| Code | Meaning | Reason |
| --- | --- | --- |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Temporary server issue |
| 502 | Bad Gateway | Upstream server error |
| 503 | Service Unavailable | Temporary unavailability |
| 504 | Gateway Timeout | Request timeout upstream |

Non-Retryable Codes

| Code | Meaning | Why Not Retry |
| --- | --- | --- |
| 400 | Bad Request | Invalid request format |
| 401 | Unauthorized | Invalid credentials |
| 403 | Forbidden | No permission |
| 404 | Not Found | Resource doesn't exist |
| 408 | Request Timeout | Gateway timeout (configurable) |
408 (Request Timeout) is thrown by the Gateway when request_timeout is exceeded. This is already in OpenAI format and won’t be retried by default.
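If you do want Gateway timeouts retried, 408 can be listed explicitly in on_status_codes (shown here alongside the defaults; adjust to taste):

```json
{
  "retry": {
    "attempts": 3,
    "on_status_codes": [408, 429, 500, 502, 503, 504]
  }
}
```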

Best Practices

  • Balance reliability with latency. More retries increase the success rate but add latency; for user-facing applications, 2-3 attempts is typically sufficient.
  • Retry only on codes that indicate transient errors. Don't retry 400-level errors (except 429); they signal client mistakes that retries won't fix.
  • For rate limits, enable use_retry_after_header to respect provider retry guidance and avoid wasted attempts.
  • Always set request_timeout when using retries to prevent indefinite waits on slow requests.
  • Track retry counts in your logs to spot provider reliability issues. High retry rates may indicate capacity problems.

Related

  • Fallbacks: switch to backup providers when the primary fails
  • Timeouts: set a maximum request duration
  • Load Balancing: distribute load across providers
  • Configs: complete config reference
