## Overview
Automatic retries increase reliability by re-attempting failed requests without manual intervention. The Gateway implements intelligent retry logic with exponential backoff to handle transient errors gracefully.

## How It Works
When a request fails with a retryable status code, the Gateway automatically:

- Waits for a calculated backoff period
- Re-attempts the request
- Increases backoff time exponentially after each failure
- Returns the response once successful or after exhausting retry attempts
Retries use exponential backoff to avoid overwhelming providers during outages: each successive attempt waits longer than the last.
## Configuration
### Basic Retry
Retry up to 3 times on the default error codes: `429`, `500`, `502`, `503`, `504`.
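As a sketch, a basic retry block might look like the following (the `retry`/`attempts` field names are assumptions about the config schema; check the config reference):

```json
{
  "retry": {
    "attempts": 3
  }
}
```

With no explicit status codes, the default list `[429, 500, 502, 503, 504]` applies.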
### Custom Status Codes
Retry on specific status codes only.

### Provider Retry Headers

Respect provider-specified retry delays: the Gateway honors the `retry-after`, `retry-after-ms`, and `x-ms-retry-after-ms` headers sent by providers.
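The two options above might combine into a single retry block like this sketch (the `on_status_codes` and `attempts` field names are assumptions; `use_retry_after_header` is the setting described later on this page):

```json
{
  "retry": {
    "attempts": 3,
    "on_status_codes": [429, 503],
    "use_retry_after_header": true
  }
}
```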
The maximum retry timeout is 60 seconds. If a provider requests a longer delay, the Gateway will skip retries and return the error.
## Usage Examples
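As one hedged example, a request config that pairs retries with a per-request timeout might look like the sketch below (`request_timeout` is the setting covered later on this page and is assumed here to be in milliseconds; the other field names are also assumptions):

```json
{
  "request_timeout": 10000,
  "retry": {
    "attempts": 3,
    "on_status_codes": [429, 500, 502, 503, 504]
  }
}
```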
## Implementation Details
### Exponential Backoff
The Gateway implements exponential backoff using the `async-retry` library.
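The schedule such a library produces can be sketched as follows (a 1-second base delay and factor of 2 mirror async-retry's underlying defaults; whether the Gateway overrides them is not stated here):

```typescript
// Sketch of an exponential backoff schedule in the style of async-retry.
// minTimeoutMs = 1000 and factor = 2 are assumed defaults, and the 60s cap
// matches the maximum retry window described on this page — these are
// illustrative values, not confirmed Gateway settings.
function backoffDelayMs(
  attempt: number,                 // 0-based index of the retry attempt
  minTimeoutMs: number = 1000,
  factor: number = 2,
  maxTimeoutMs: number = 60_000,
): number {
  return Math.min(minTimeoutMs * factor ** attempt, maxTimeoutMs);
}

// Successive retries wait roughly 1s, 2s, 4s, 8s, ...
```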
### Provider Retry-After Headers
When `use_retry_after_header` is enabled, the Gateway checks for:

- `retry-after` (seconds)
- `retry-after-ms` (milliseconds)
- `x-ms-retry-after-ms` (milliseconds, Azure-specific)
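A sketch of how those headers might be turned into a delay (the precedence order and the omission of `retry-after`'s HTTP-date form are assumptions, not documented behavior):

```typescript
// Sketch: derive a retry delay in milliseconds from provider response
// headers. Millisecond-granularity headers are assumed to win over the
// seconds-based `retry-after`; the HTTP-date form of `retry-after` is
// ignored here for brevity.
function delayFromHeaders(headers: Record<string, string>): number | null {
  const ms = headers["retry-after-ms"] ?? headers["x-ms-retry-after-ms"];
  if (ms !== undefined) return Number(ms);
  const seconds = headers["retry-after"];
  if (seconds !== undefined && !Number.isNaN(Number(seconds))) {
    return Number(seconds) * 1000;
  }
  return null; // fall back to the exponential backoff schedule
}
```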
### Retry Limits
- Maximum attempts: 5
- Maximum retry window: 60 seconds total
- Timeout per request: configurable via `request_timeout`
### Response Headers
Retry behavior can be tracked through response headers returned by the Gateway.

## Advanced Patterns
### Retries with Fallbacks
Combine retries with fallback providers:

- Try OpenAI
- Retry up to 3 times with OpenAI
- If all retries fail, fallback to Anthropic
- Retry up to 3 times with Anthropic
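The flow above might be expressed as a fallback strategy with a retry block on each target; the `strategy`, `targets`, and `provider` field names in this sketch are assumptions based on a typical gateway config shape:

```json
{
  "strategy": { "mode": "fallback" },
  "targets": [
    { "provider": "openai", "retry": { "attempts": 3 } },
    { "provider": "anthropic", "retry": { "attempts": 3 } }
  ]
}
```

Because each target carries its own retry block, this same shape also covers per-target retry configuration.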
### Per-Target Retry Configuration
Different retry strategies for different providers.

### Rate Limit Handling
Special configuration for rate limits:

- Retries only on 429 (rate limit)
- Respects the provider's `retry-after` header
- Optimal for handling rate limits gracefully
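A rate-limit-focused retry block might look like this sketch (the `attempts` and `on_status_codes` field names are assumptions; `use_retry_after_header` is the setting described elsewhere on this page):

```json
{
  "retry": {
    "attempts": 5,
    "on_status_codes": [429],
    "use_retry_after_header": true
  }
}
```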
## Status Codes
### Default Retryable Codes
| Code | Meaning | Reason |
|---|---|---|
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Temporary server issue |
| 502 | Bad Gateway | Upstream server error |
| 503 | Service Unavailable | Temporary unavailability |
| 504 | Gateway Timeout | Request timeout upstream |
### Non-Retryable Codes
| Code | Meaning | Why Not Retry |
|---|---|---|
| 400 | Bad Request | Invalid request format |
| 401 | Unauthorized | Invalid credentials |
| 403 | Forbidden | No permission |
| 404 | Not Found | Resource doesn’t exist |
| 408 | Request Timeout | Gateway timeout (configurable) |
`408` (Request Timeout) is thrown by the Gateway when `request_timeout` is exceeded. The error is already in OpenAI format and won't be retried by default.

## Best Practices
### Set Appropriate Retry Limits
Balance reliability with latency. More retries increase success rate but add latency. For user-facing applications, 2-3 attempts is typically sufficient.
### Use Status Code Filtering
Retry only on codes that indicate transient errors. Don’t retry on 400-level errors (except 429) as they indicate client errors that won’t resolve with retries.
### Enable Retry-After Headers

When dealing with rate limits, enable `use_retry_after_header` to respect provider retry guidance and avoid unnecessary retries.

### Combine with Timeouts

Always set `request_timeout` when using retries to prevent waiting indefinitely on slow requests.

### Monitor Retry Metrics

Track retry counts in your logs to identify reliability issues with providers. High retry rates may indicate provider capacity problems.
## Related Features
- **Fallbacks**: switch to backup providers when the primary fails
- **Timeouts**: set a maximum request duration
- **Load Balancing**: distribute load across providers
- **Configs**: complete config reference