Overview
The OpenAI Python SDK automatically retries certain failed requests using exponential backoff. This improves reliability by handling transient errors without manual intervention.
Default retry behavior:
- Max retries: 2 attempts (3 total requests including the initial attempt)
- Retry conditions: Connection errors, timeouts, rate limits, and server errors
- Backoff strategy: Exponential backoff with jitter
Automatic Retries
The SDK automatically retries these error conditions:
- Connection errors - Network connectivity problems
- 408 Request Timeout - Request took too long
- 409 Conflict - Resource conflict (lock timeout)
- 429 Rate Limit - Too many requests
- 5xx Server Errors - Internal server errors
```python
from openai import OpenAI

# Default: 2 retries
client = OpenAI()

# This request will automatically retry up to 2 times on retryable errors
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
Configuring Retries
Client-Level Configuration
Set the default retry behavior for all requests:
```python
import math

from openai import OpenAI

# Disable retries
client = OpenAI(max_retries=0)

# Custom retry count
client = OpenAI(max_retries=5)

# Unlimited retries (not recommended)
client = OpenAI(max_retries=math.inf)
```
Setting max_retries to None is not allowed. Use 0 to disable retries or a positive integer to set a limit.
Per-Request Configuration
Override retry settings for individual requests:
```python
from openai import OpenAI

client = OpenAI(max_retries=2)  # Default

# Disable retries for this request
response = client.with_options(max_retries=0).chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)

# More retries for critical requests
response = client.with_options(max_retries=5).chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Important message"}],
)
```
Backoff Strategy
The SDK uses exponential backoff with jitter to space out retry attempts:
Algorithm
- Initial delay: 0.5 seconds
- Exponential increase: Delay doubles with each retry
- Maximum delay: 8 seconds
- Jitter: A random factor between 0.75 and 1.0 (up to 25% reduction) applied to the delay
Calculation
```python
from random import random

# Simplified retry delay calculation
initial_delay = 0.5  # seconds
max_delay = 8.0      # seconds
retries_taken = 2

# Base delay: 0.5 * 2^retries_taken, capped at max_delay
delay = min(initial_delay * (2 ** retries_taken), max_delay)

# Apply jitter: random factor between 0.75 and 1.0
jitter = 1 - 0.25 * random()
actual_delay = delay * jitter
```
Example Timeline
Attempt 1 (initial): 0.0s  - Request fails
                     Wait ~0.5s (0.5s * jitter)
Attempt 2 (retry 1): ~0.5s - Request fails
                     Wait ~1.0s (1.0s * jitter)
Attempt 3 (retry 2): ~1.5s - Request succeeds
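The schedule above can be reproduced with a small helper. This is an illustrative sketch of the default parameters described in this section, not code from the SDK itself:

```python
import random

INITIAL_DELAY = 0.5  # seconds
MAX_DELAY = 8.0      # seconds

def retry_delay(retries_taken: int, jitter: bool = True) -> float:
    """Delay before the next retry, mirroring the defaults above."""
    delay = min(INITIAL_DELAY * (2 ** retries_taken), MAX_DELAY)
    if jitter:
        delay *= 1 - 0.25 * random.random()  # random factor in (0.75, 1.0]
    return delay

# Without jitter, the schedule is 0.5s, 1.0s, 2.0s, 4.0s, 8.0s, 8.0s, ...
print([retry_delay(n, jitter=False) for n in range(6)])
```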
Retry-After Header
The SDK respects the Retry-After header from rate limit responses:
```python
from openai import OpenAI

client = OpenAI()

# If the API returns 429 with "Retry-After: 10",
# the SDK will wait 10 seconds before retrying
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
The SDK also supports the non-standard retry-after-ms header for millisecond precision.
Priority:
1. retry-after-ms header (milliseconds)
2. Retry-After header (seconds or HTTP date)
3. Exponential backoff algorithm
If the Retry-After value is reasonable (≤60 seconds), it takes precedence over the exponential backoff calculation.
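The priority order can be sketched as a small helper. This is an illustrative reimplementation of the decision described above, not the SDK's internal code; for simplicity it only handles numeric Retry-After values, not HTTP dates:

```python
import random

def choose_retry_delay(headers: dict, retries_taken: int) -> float:
    """Pick a retry delay: retry-after-ms, then Retry-After, then backoff."""
    lowered = {k.lower(): v for k, v in headers.items()}

    # 1. Non-standard retry-after-ms header (milliseconds)
    if "retry-after-ms" in lowered:
        try:
            delay = float(lowered["retry-after-ms"]) / 1000.0
            if 0 < delay <= 60:
                return delay
        except ValueError:
            pass

    # 2. Standard Retry-After header (numeric seconds only in this sketch)
    if "retry-after" in lowered:
        try:
            delay = float(lowered["retry-after"])
            if 0 < delay <= 60:
                return delay
        except ValueError:
            pass

    # 3. Fall back to exponential backoff with jitter
    delay = min(0.5 * (2 ** retries_taken), 8.0)
    return delay * (1 - 0.25 * random.random())
```

Note how an unreasonable Retry-After value (over 60 seconds) falls through to the backoff calculation.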
Server-Controlled Retries
The API can explicitly control retry behavior using the x-should-retry header:
- x-should-retry: true - Always retry, even if the error is not normally retryable
- x-should-retry: false - Never retry, even if the error is normally retryable
The SDK respects x-should-retry headers automatically; no additional configuration is needed.
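The decision can be sketched as follows. This is an illustrative model of the behavior described above, not SDK internals:

```python
def should_retry(status_code: int, headers: dict) -> bool:
    """Decide whether a response is retryable, honoring x-should-retry."""
    override = {k.lower(): v for k, v in headers.items()}.get("x-should-retry")
    if override == "true":
        return True   # server forces a retry
    if override == "false":
        return False  # server forbids a retry

    # Default rules: 408, 409, 429, and all 5xx are retryable
    return status_code in (408, 409, 429) or status_code >= 500

print(should_retry(500, {}))                           # True
print(should_retry(400, {"x-should-retry": "true"}))   # True
print(should_retry(429, {"x-should-retry": "false"}))  # False
```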
Idempotency
The SDK automatically adds idempotency headers to retried requests (except GET requests):
```python
from openai import OpenAI

client = OpenAI()

# Each retry attempt includes the same idempotency key header,
# which prevents duplicate operations on retries
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
The idempotency key is generated once per request and reused across all retry attempts.
Retry Metadata Headers
The SDK includes metadata headers with each request:
- x-stainless-retry-count - Number of retries taken (0 for the first attempt)
- x-stainless-read-timeout - Read timeout in seconds
```python
from openai import OpenAI

client = OpenAI()

# Headers automatically included on successive attempts:
# x-stainless-retry-count: 0 (first attempt)
# x-stainless-retry-count: 1 (first retry)
# x-stainless-retry-count: 2 (second retry)
```
Async Retries
Retries work identically with AsyncOpenAI, using async sleep:
```python
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI(max_retries=3)

async def main():
    # Automatically retries up to 3 times
    response = await client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello!"}],
    )

asyncio.run(main())
```
Custom Retry Logic
For custom retry logic, catch exceptions and implement your own strategy:
```python
import time

import openai
from openai import OpenAI

client = OpenAI(max_retries=0)  # Disable automatic retries

def create_with_retry(max_attempts=3, backoff_factor=2):
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(
                model="gpt-4",
                messages=[{"role": "user", "content": "Hello!"}],
            )
        except openai.RateLimitError:
            if attempt == max_attempts - 1:
                raise
            # Custom backoff: 1s, 2s, 4s, ...
            wait_time = backoff_factor ** attempt
            print(f"Rate limited, waiting {wait_time}s")
            time.sleep(wait_time)
        except openai.APIConnectionError:
            if attempt == max_attempts - 1:
                raise
            print("Connection failed, retrying...")
            time.sleep(1)

response = create_with_retry()
```
Monitoring Retries
Enable logging to see retry attempts:
```python
import logging

from openai import OpenAI

# Enable debug logging (alternatively, set the OPENAI_LOG=debug
# environment variable before starting your program)
logging.basicConfig(level=logging.DEBUG)

client = OpenAI()

# Debug logs will show lines such as:
# - "Sending HTTP Request: POST https://api.openai.com/v1/chat/completions"
# - "Encountered httpx.TimeoutException"
# - "2 retries left"
# - "Retrying request in 0.5 seconds"
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
Best Practices
- Don’t disable retries unless you have a specific reason - they handle transient errors automatically
- Use reasonable max_retries - Very high values can cause long delays
- Let the SDK handle retries - The built-in strategy is well-tested
- Monitor rate limits - Adjust your request rate if you frequently hit 429 errors
- Consider exponential backoff for custom retry logic
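For the last point, a reusable jittered-backoff wrapper can look like this. It is a stdlib-only sketch; retry_with_backoff and its parameters are illustrative helpers, not part of the SDK:

```python
import random
import time
from functools import wraps

def retry_with_backoff(max_attempts=3, base_delay=0.5, max_delay=8.0,
                       retry_on=(Exception,)):
    """Retry a callable with exponential backoff and jitter."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == max_attempts - 1:
                        raise
                    # Exponential backoff, capped, with up to 25% jitter reduction
                    delay = min(base_delay * (2 ** attempt), max_delay)
                    time.sleep(delay * (1 - 0.25 * random.random()))
        return wrapper
    return decorator
```

Apply it to any function that raises retryable exceptions, for example @retry_with_backoff(retry_on=(openai.RateLimitError, openai.APIConnectionError)) around a call wrapper.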
Error Scenarios
Retried Automatically
```python
import openai
from openai import OpenAI

client = OpenAI(max_retries=2)

try:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello!"}],
    )
except openai.APITimeoutError:
    # Raised only after 2 retry attempts
    print("Request timed out after retries")
except openai.RateLimitError:
    # Raised only after 2 retry attempts
    print("Rate limit exceeded after retries")
except openai.InternalServerError:
    # Raised only after 2 retry attempts
    print("Server error after retries")
```
Not Retried
```python
import openai
from openai import OpenAI

client = OpenAI(max_retries=2)

try:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello!"}],
    )
except openai.AuthenticationError:
    # Never retried - invalid API key
    print("Authentication failed")
except openai.BadRequestError:
    # Never retried - malformed request
    print("Invalid request")
except openai.NotFoundError:
    # Never retried - resource doesn't exist
    print("Resource not found")
```