Overview

Request timeouts prevent your application from waiting indefinitely for slow or unresponsive LLM providers. The Gateway allows you to set granular timeout limits that automatically terminate requests exceeding the specified duration.

How It Works

When you set a request_timeout, the Gateway:
  1. Starts a timer when the request is sent
  2. Monitors the request progress
  3. Aborts the request if it exceeds the timeout
  4. Returns a 408 (Request Timeout) error with details
The timeout applies to the entire request lifecycle, including:
  • Network latency
  • Provider processing time
  • Response streaming
Timeouts are enforced at the Gateway level using AbortController, ensuring reliable timeout behavior across all providers.
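The four-step flow above can be sketched in Python (an illustrative analogue only; the Gateway itself is written in TypeScript and uses AbortController, as shown later in this page):

```python
import concurrent.futures

def fetch_with_timeout(fn, timeout_ms):
    """Run `fn` (a stand-in for the upstream request) and return a
    408-style error payload if it exceeds timeout_ms."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fn)  # step 1-2: start the request and watch it
        try:
            return {"status": 200, "body": future.result(timeout=timeout_ms / 1000)}
        except concurrent.futures.TimeoutError:
            # steps 3-4: abort (best effort; a real abort needs cooperative
            # cancellation) and surface a 408 with details
            future.cancel()
            return {
                "status": 408,
                "error": {
                    "message": f"Request exceeded the timeout: {timeout_ms}ms",
                    "type": "timeout_error",
                },
            }
```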

Configuration

Basic Timeout

Set a timeout in milliseconds:
{
  "request_timeout": 30000
}
This configuration aborts any request taking longer than 30 seconds.
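Because the value is in milliseconds, an off-by-1000 mistake is easy to make. A small helper (purely illustrative, not part of any SDK) makes the intent explicit when building configs programmatically:

```python
def with_timeout(config: dict, seconds: float) -> dict:
    """Return a copy of `config` with request_timeout set, converting
    seconds to the milliseconds the Gateway expects."""
    return {**config, "request_timeout": int(seconds * 1000)}
```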

Provider-Specific Timeouts

Different timeouts for different providers:
{
  "strategy": { "mode": "fallback" },
  "targets": [
    {
      "provider": "openai",
      "api_key": "sk-***",
      "request_timeout": 20000
    },
    {
      "provider": "anthropic",
      "api_key": "sk-ant-***",
      "request_timeout": 45000
    }
  ]
}

Timeouts with Retries

Combine timeout with retry logic:
{
  "request_timeout": 15000,
  "retry": {
    "attempts": 3,
    "on_status_codes": [408, 429, 500, 502, 503, 504]
  }
}
If a request times out (408), it can be retried if 408 is included in on_status_codes. This is useful for recovering from temporary network issues.
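The retry decision implied by this config can be modeled as a small predicate (a hypothetical helper; the 1-based attempt numbering and the default status-code list are assumptions, not the Gateway's actual internals):

```python
def should_retry(status: int, attempt: int, retry_config: dict) -> bool:
    """Decide whether a failed attempt should be retried, mirroring the
    `retry` block above. `attempt` is 1-based (1 = the initial request),
    so attempts=3 allows up to 3 retries after the first try."""
    max_retries = retry_config.get("attempts", 0)
    # Assumed default: retry only transient failures unless overridden
    retryable = retry_config.get("on_status_codes", [429, 500, 502, 503, 504])
    return attempt <= max_retries and status in retryable
```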

Usage Examples

from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",
    provider="openai",
    Authorization="sk-***",
    config={
        "request_timeout": 30000  # 30 seconds
    }
)

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Explain quantum computing"}]
    )
except Exception as e:
    if "timeout" in str(e).lower():
        print("Request timed out after 30 seconds")

Implementation Details

Timeout Mechanism

The Gateway uses AbortController to enforce timeouts:
// From src/handlers/retryHandler.ts
async function fetchWithTimeout(
  url: string,
  options: RequestInit,
  timeout: number
) {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), timeout);
  
  const timeoutRequestOptions = {
    ...options,
    signal: controller.signal,
  };

  try {
    return await fetch(url, timeoutRequestOptions);
  } catch (err: any) {
    if (err.name === 'AbortError') {
      return new Response(
        JSON.stringify({
          error: {
            message: `Request exceeded the timeout: ${timeout}ms`,
            type: 'timeout_error',
            code: null
          }
        }),
        { status: 408 }
      );
    }
    throw err;
  } finally {
    // Clear the timer on every path so it never fires after the
    // request has already settled
    clearTimeout(timeoutId);
  }
}
}

Timeout Response Format

When a timeout occurs, the Gateway returns:
{
  "error": {
    "message": "Request exceeded the timeout sent in the request: 30000ms",
    "type": "timeout_error",
    "param": null,
    "code": null
  }
}
HTTP Status: 408 Request Timeout
The timeout response is already in OpenAI-compatible format. It won’t be transformed again by response transformers.
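Client code can detect this payload explicitly instead of string-matching exception messages. A sketch (the field names come from the response format above; everything else is illustrative):

```python
import json

def is_timeout_response(status: int, body: str) -> bool:
    """Detect the Gateway's 408 timeout payload shown above."""
    if status != 408:
        return False
    try:
        error = json.loads(body).get("error", {})
    except json.JSONDecodeError:
        return False
    return error.get("type") == "timeout_error"
```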

Timeout Strategies

Conservative Timeouts

For production applications with strict SLA requirements:
{
  "request_timeout": 10000,
  "retry": {
    "attempts": 2,
    "on_status_codes": [408, 500, 502, 503, 504]
  },
  "strategy": { "mode": "fallback" },
  "targets": [
    {"provider": "openai", "api_key": "sk-***"},
    {"provider": "anthropic", "api_key": "sk-ant-***"}
  ]
}
This configuration:
  • Sets 10-second timeout
  • Retries twice on timeout
  • Falls back to alternative provider
  • Maximum wait: ~30 seconds (10s + 10s + 10s)
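The ~30-second worst case generalizes to a simple formula: one initial attempt plus each retry, all running to the full timeout. A sketch (this ignores any backoff delay the Gateway may add between attempts):

```python
def max_wait_ms(timeout_ms: int, retry_attempts: int) -> int:
    """Worst-case wall time when every attempt runs to the timeout:
    the initial request plus each retry, ignoring backoff between attempts."""
    return timeout_ms * (1 + retry_attempts)
```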

Generous Timeouts

For batch processing or long-form content:
{
  "request_timeout": 120000,
  "retry": {
    "attempts": 1
  }
}

Model-Specific Timeouts

Different timeouts based on model characteristics:
{
  "strategy": { "mode": "conditional" },
  "conditions": [
    {
      "query": { "model": "gpt-4o" },
      "then": "fast-model"
    },
    {
      "query": { "model": "o1-preview" },
      "then": "slow-model"
    }
  ],
  "targets": [
    {
      "name": "fast-model",
      "provider": "openai",
      "api_key": "sk-***",
      "request_timeout": 15000
    },
    {
      "name": "slow-model",
      "provider": "openai",
      "api_key": "sk-***",
      "request_timeout": 60000
    }
  ]
}
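The conditional routing above can be traced with a small resolver (a simplified sketch handling only exact model matches, not the Gateway's full query language):

```python
def resolve_timeout(model: str, config: dict):
    """Resolve the request_timeout for a model by walking a conditional
    routing config like the one above: match a condition, then look up
    the named target's timeout."""
    target_name = None
    for cond in config.get("conditions", []):
        if cond.get("query", {}).get("model") == model:
            target_name = cond.get("then")
            break
    for target in config.get("targets", []):
        if target.get("name") == target_name:
            return target.get("request_timeout")
    return None
```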

Streaming Timeouts

Timeouts apply to the entire streaming response, not individual chunks:
{
  "request_timeout": 60000
}
For streaming requests:
  • Timeout starts when connection is established
  • Applies to the entire stream duration
  • The stream is terminated if the total time exceeds the timeout
For long-running streaming responses, set a generous timeout or omit it entirely to avoid premature termination.
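A client consuming the stream can mirror this behavior with a single deadline covering the whole stream rather than a per-chunk limit (an illustrative sketch, not SDK behavior):

```python
import time

def consume_with_deadline(chunks, timeout_ms: int):
    """Consume a streaming response under one deadline covering the
    entire stream, matching how the Gateway applies its timeout to
    total stream duration rather than to gaps between chunks."""
    deadline = time.monotonic() + timeout_ms / 1000
    collected = []
    for chunk in chunks:
        if time.monotonic() > deadline:
            raise TimeoutError(f"stream exceeded {timeout_ms}ms")
        collected.append(chunk)
    return collected
```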

By Use Case

Use Case | Timeout | Reasoning
Chatbots | 10-15s | Users expect quick responses
Code Generation | 30-45s | Complex generation takes longer
Document Analysis | 60-90s | Processing large documents
Batch Processing | 120s+ | No user waiting
Streaming Chat | 30-60s | Account for full response time

By Provider

Provider | Recommended Timeout | Notes
OpenAI | 20-30s | Generally fast
Anthropic | 25-40s | Varies by model
Azure OpenAI | 30-45s | May have added latency
Bedrock | 40-60s | Regional variations
Ollama (local) | 60s+ | Depends on hardware

Best Practices

  • Set the timeout above your expected response time: monitor P95/P99 latencies and size the timeout accordingly. A typical rule of thumb is P99 latency × 1.5.
  • Always set timeouts, even for generally reliable providers, to prevent hanging requests from impacting your application.
  • Pair timeouts with retries to recover from temporary slowdowns; include 408 in on_status_codes to retry timed-out requests.
  • Track timeout frequency: high timeout rates indicate provider performance issues. Use them to identify patterns and adjust configurations.
  • For user-facing applications, set timeouts based on acceptable wait times; users typically abandon after 5-10 seconds.
  • For streaming responses, the timeout should cover the entire generation time, not just the first chunk; monitor end-to-end streaming duration.
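The P99 × 1.5 rule of thumb is easy to compute from observed latency samples (a sketch using the standard library; the multiplier is the guideline above, not a Gateway setting):

```python
import statistics

def recommended_timeout_ms(latencies_ms, multiplier: float = 1.5) -> int:
    """Derive a timeout from observed latencies: P99 x multiplier,
    per the rule of thumb above. Requires at least 2 samples."""
    p99 = statistics.quantiles(latencies_ms, n=100)[98]  # 99th percentile
    return int(p99 * multiplier)
```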

Troubleshooting

Frequent Timeouts

If you’re experiencing frequent timeouts:
  1. Check provider status: Verify the provider isn’t experiencing outages
  2. Increase timeout: Current setting may be too aggressive
  3. Monitor latency: Use Gateway logs to track actual response times
  4. Consider fallbacks: Add backup providers for resilience
  5. Optimize prompts: Reduce token count to speed up generation

Timeout Too Short

Signs your timeout is too short:
  • Frequent 408 errors
  • Requests consistently timing out
  • Users reporting incomplete responses
Solution: Gradually increase timeout while monitoring success rate.

Timeout Too Long

Signs your timeout is too long:
  • Users waiting too long for errors
  • Resources held unnecessarily
  • Poor user experience
Solution: Reduce timeout based on P95 latency data.

Related

  • Retries: automatically retry timed-out requests
  • Fallbacks: switch providers on timeout
  • Load Balancing: distribute load to reduce timeouts
  • Streaming: streaming-specific considerations