Overview
Request timeouts prevent your application from waiting indefinitely for slow or unresponsive LLM providers. The Gateway lets you set granular timeout limits that automatically terminate requests exceeding the specified duration.
How It Works
When you set a request_timeout, the Gateway:
- Starts a timer when the request is sent
- Monitors the request progress
- Aborts the request if it exceeds the timeout
- Returns a 408 (Request Timeout) error with details
The timeout covers the full duration of the request, including:
- Network latency
- Provider processing time
- Response streaming
Timeouts are enforced at the Gateway level using AbortController, ensuring reliable timeout behavior across all providers.
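The steps listed above can be sketched with the standard AbortController API. This is a minimal illustration and not the Gateway's actual source: `runWithTimeout`, `fakeProviderCall`, and the shape of the returned timeout object are assumptions made for the example.

```typescript
// Hypothetical shape of the result produced when the timer fires.
interface TimeoutResult {
  status: number;
  message: string;
}

// Runs `task` with an AbortSignal and aborts it after `timeoutMs`.
// `task` stands in for the outbound provider call.
async function runWithTimeout<T>(
  task: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T | TimeoutResult> {
  const controller = new AbortController();
  // Start the timer when the request is sent.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await task(controller.signal);
  } catch (err) {
    // Only failures caused by our own abort are reported as a 408.
    if (controller.signal.aborted) {
      return { status: 408, message: `Request exceeded ${timeoutMs}ms` };
    }
    throw err;
  } finally {
    clearTimeout(timer);
  }
}

// Stand-in provider call: resolves after `delayMs` unless aborted first.
function fakeProviderCall(delayMs: number) {
  return (signal: AbortSignal): Promise<string> =>
    new Promise((resolve, reject) => {
      const pending = setTimeout(() => resolve("ok"), delayMs);
      signal.addEventListener(
        "abort",
        () => {
          clearTimeout(pending);
          reject(new Error("aborted"));
        },
        { once: true },
      );
    });
}
```

Calling `runWithTimeout(fakeProviderCall(500), 20)` resolves to the timeout object, while a call that finishes within the limit resolves to the provider's own result. In a real client, the same `signal` would typically be passed to `fetch` so the underlying connection is cancelled as well.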
Configuration
Basic Timeout
Set a timeout in milliseconds with request_timeout.
Provider-Specific Timeouts
Set different timeouts for different providers.
Timeouts with Retries
Combine a timeout with retry logic. If a request times out (408), it can be retried if 408 is included in on_status_codes. This is useful for recovering from temporary network issues.
Usage Examples
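The three configuration patterns described under Configuration might look like the following. Only `request_timeout` and `on_status_codes` come from this page; every other field name (`retry`, `attempts`, `targets`, `provider`), the type definitions, and the rule that a target-level value overrides the top-level one are assumptions made for this sketch, so check them against the Gateway's configuration reference.

```typescript
// Assumed config shape for illustration only.
interface RetryConfig {
  attempts: number;
  on_status_codes: number[];
}

interface TargetConfig {
  provider: string;
  request_timeout?: number; // milliseconds
}

interface GatewayConfig {
  request_timeout?: number; // milliseconds
  retry?: RetryConfig;
  targets?: TargetConfig[];
}

// Basic timeout: abort any request that takes longer than 10 seconds.
const basicConfig: GatewayConfig = { request_timeout: 10000 };

// Provider-specific timeouts: each target carries its own limit.
const perProviderConfig: GatewayConfig = {
  targets: [
    { provider: "openai", request_timeout: 20000 },
    { provider: "anthropic", request_timeout: 30000 },
  ],
};

// Timeout with retries: 408 must be listed for timed-out requests to be retried.
const retryConfig: GatewayConfig = {
  request_timeout: 10000,
  retry: { attempts: 2, on_status_codes: [408] },
};

// Assumed resolution rule: a target's own value wins, else the top-level value.
function effectiveTimeout(
  config: GatewayConfig,
  provider: string,
): number | undefined {
  const target = config.targets?.find((t) => t.provider === provider);
  return target?.request_timeout ?? config.request_timeout;
}
```

The `effectiveTimeout` helper is only there to make the precedence assumption explicit; the Gateway resolves the value itself when it handles a request.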
Implementation Details
Timeout Mechanism
The Gateway uses AbortController to enforce timeouts.
Timeout Response Format
When a timeout occurs, the Gateway returns a 408 Request Timeout response.
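On the caller's side, a timeout can be told apart from other failures by its status code. The response body is not reproduced on this page, so the sketch below relies on the 408 status alone; `classifyStatus` and the `Outcome` labels are hypothetical names.

```typescript
type Outcome = "success" | "timeout" | "other_error";

// Classify a Gateway response using only its HTTP status code.
function classifyStatus(statusCode: number): Outcome {
  if (statusCode === 408) return "timeout";
  return statusCode >= 200 && statusCode < 300 ? "success" : "other_error";
}
```

Keeping timeouts in their own category makes it easier to retry or fall back on them without treating every error the same way.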
Timeout Strategies
Conservative Timeouts
For production applications with strict SLA requirements, a conservative configuration:
- Sets a 10-second timeout
- Retries twice on timeout
- Falls back to an alternative provider
- Caps the maximum wait at about 30 seconds (10s + 10s + 10s)
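The 30-second figure follows from the timeout multiplied by the number of attempts (the first try plus two retries). A small helper makes the arithmetic explicit; it assumes no backoff delay between retries and does not count any time spent on a fallback provider, both of which would lengthen the real worst case. The function name is hypothetical.

```typescript
// Worst-case wait on one provider: the first attempt and every retry
// may each run for the full timeout before being aborted.
function worstCaseWaitMs(timeoutMs: number, retries: number): number {
  return timeoutMs * (1 + retries);
}

// 10-second timeout with two retries: three attempts of 10s each.
const conservativeWaitMs = worstCaseWaitMs(10000, 2); // 30000
```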
Generous Timeouts
For batch processing or long-form content, use a longer timeout.
Model-Specific Timeouts
Set different timeouts based on model characteristics.
Streaming Timeouts
Timeouts apply to the entire streaming response, not individual chunks:
- The timeout starts when the connection is established
- It applies to the entire stream duration
- Streaming ends if the total time exceeds the timeout
For long-running streaming responses, set a generous timeout or omit it entirely to avoid premature termination.
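To make the "entire stream, not individual chunks" behavior concrete, the sketch below reads from an async stream under a single deadline that is never reset between chunks. It illustrates the semantics from a consumer's point of view and is not the Gateway's implementation; `collectWithDeadline` and `fakeStream` are hypothetical names.

```typescript
// Collects chunks until the stream ends or the total elapsed time
// exceeds `timeoutMs`, whichever comes first.
async function collectWithDeadline(
  stream: AsyncIterable<string>,
  timeoutMs: number,
): Promise<{ chunks: string[]; timedOut: boolean }> {
  const chunks: string[] = [];
  const iterator = stream[Symbol.asyncIterator]();
  let timer: ReturnType<typeof setTimeout> | undefined;
  // One timer for the whole stream; it is not restarted per chunk.
  const deadline = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });
  try {
    while (true) {
      const next = await Promise.race([iterator.next(), deadline]);
      if (next === "timeout") {
        // Ask the producer to stop; cleanup errors are ignored.
        void iterator.return?.().catch(() => undefined);
        return { chunks, timedOut: true };
      }
      if (next.done) return { chunks, timedOut: false };
      chunks.push(next.value);
    }
  } finally {
    clearTimeout(timer);
  }
}

// Stand-in stream that yields `count` chunks, one every `gapMs`.
async function* fakeStream(count: number, gapMs: number): AsyncGenerator<string> {
  for (let i = 0; i < count; i++) {
    await new Promise((resolve) => setTimeout(resolve, gapMs));
    yield `chunk-${i}`;
  }
}
```

With `fakeStream(10, 30)` and a 100 ms limit, every chunk arrives well within 100 ms of the previous one, yet the stream is still cut off because the total duration exceeds the limit. This is why long generations need a generous value.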
Recommended Timeouts
By Use Case
| Use Case | Timeout | Reasoning |
|---|---|---|
| Chatbots | 10-15s | Users expect quick responses |
| Code Generation | 30-45s | Complex generation takes longer |
| Document Analysis | 60-90s | Processing large documents |
| Batch Processing | 120s+ | No user waiting |
| Streaming Chat | 30-60s | Account for full response time |
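One way to apply the table is a lookup that turns a use case into a `request_timeout` value. The sketch below uses the upper bound of each range, converted to milliseconds; the `UseCase` labels and function name are hypothetical, and the batch value is a starting point, since the table lists it as open-ended.

```typescript
type UseCase =
  | "chatbot"
  | "code_generation"
  | "document_analysis"
  | "batch_processing"
  | "streaming_chat";

// Upper bound of each recommended range from the table, in milliseconds.
// "120s+" has no upper bound, so 120s is used as a floor.
const timeoutByUseCaseMs: Record<UseCase, number> = {
  chatbot: 15000,
  code_generation: 45000,
  document_analysis: 90000,
  batch_processing: 120000,
  streaming_chat: 60000,
};

// Build the timeout portion of a config for a given use case.
function timeoutConfigFor(useCase: UseCase): { request_timeout: number } {
  return { request_timeout: timeoutByUseCaseMs[useCase] };
}
```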
By Provider
| Provider | Recommended Timeout | Notes |
|---|---|---|
| OpenAI | 20-30s | Generally fast |
| Anthropic | 25-40s | Varies by model |
| Azure OpenAI | 30-45s | May have added latency |
| Bedrock | 40-60s | Regional variations |
| Ollama (local) | 60s+ | Depends on hardware |
Best Practices
Set Realistic Timeouts
Set the timeout longer than the expected response time. Monitor P95/P99 latencies and set the timeout accordingly. A typical recommendation is P99 latency × 1.5.
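The P99 × 1.5 rule can be computed directly from observed latencies. The sketch below uses the nearest-rank percentile method, which is one common definition among several; the function names are hypothetical and the latency samples would come from your own monitoring.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no latency samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Recommended timeout: P99 latency multiplied by a safety factor (default 1.5).
function recommendedTimeoutMs(samplesMs: number[], multiplier = 1.5): number {
  return Math.ceil(percentile(samplesMs, 99) * multiplier);
}
```

For the samples `[100, 200, 300, 400, 1000]`, the P99 is 1000 ms and the recommended timeout is 1500 ms. With so few samples the P99 is simply the maximum, so use a larger window in practice.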
Always Use Timeouts in Production
Even if providers are generally reliable, always set timeouts to prevent hanging requests from impacting your application.
Combine with Retries
Use retries with timeouts to recover from temporary slowdowns. Include 408 in on_status_codes to retry timed-out requests.
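The effect of listing 408 among the retryable status codes can be shown with a small retry loop. The Gateway performs retries itself; this sketch only mirrors the idea, and `sendWithRetries` and its parameters are hypothetical.

```typescript
interface GatewayResponse {
  status: number;
}

// Sends once, then retries up to `attempts` more times while the returned
// status is one of `onStatusCodes`.
async function sendWithRetries(
  send: () => Promise<GatewayResponse>,
  attempts: number,
  onStatusCodes: number[],
): Promise<GatewayResponse> {
  let response = await send();
  for (let i = 0; i < attempts && onStatusCodes.includes(response.status); i++) {
    response = await send();
  }
  return response;
}
```

If 408 is missing from the list, the first timed-out response is returned as is and no retry takes place.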
Monitor Timeout Rates
High timeout rates indicate provider performance issues. Track timeout frequency to identify patterns and adjust configurations.
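A timeout rate per provider is a simple metric to start with. The sketch below assumes request logs have been reduced to a provider name and a status code; the `LogEntry` shape is hypothetical and not the Gateway's log format.

```typescript
// Hypothetical, simplified log record.
interface LogEntry {
  provider: string;
  status: number;
}

// Returns the fraction of requests per provider that ended in a 408.
function timeoutRateByProvider(entries: LogEntry[]): Map<string, number> {
  const totals = new Map<string, { requests: number; timeouts: number }>();
  for (const entry of entries) {
    const tally = totals.get(entry.provider) ?? { requests: 0, timeouts: 0 };
    tally.requests += 1;
    if (entry.status === 408) tally.timeouts += 1;
    totals.set(entry.provider, tally);
  }
  const rates = new Map<string, number>();
  totals.forEach((tally, provider) => {
    rates.set(provider, tally.timeouts / tally.requests);
  });
  return rates;
}
```

Comparing these rates over time, or across providers, helps separate a provider-side slowdown from a timeout that is simply set too low.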
Consider User Experience
For user-facing applications, set timeouts based on acceptable wait times. Users typically abandon after 5-10 seconds.
Account for Streaming
For streaming responses, the timeout should cover the entire generation time, not just the first chunk. Monitor end-to-end streaming duration.
Troubleshooting
Frequent Timeouts
If you’re experiencing frequent timeouts:
- Check provider status: Verify the provider isn’t experiencing outages
- Increase timeout: Current setting may be too aggressive
- Monitor latency: Use Gateway logs to track actual response times
- Consider fallbacks: Add backup providers for resilience
- Optimize prompts: Reduce token count to speed up generation
Timeout Too Short
Signs your timeout is too short:
- Frequent 408 errors
- Requests consistently timing out
- Users reporting incomplete responses
Timeout Too Long
Signs your timeout is too long:
- Users waiting too long for errors
- Resources held unnecessarily
- Poor user experience
Related Features
Retries
Automatically retry timed-out requests
Fallbacks
Switch providers on timeout
Load Balancing
Distribute load to reduce timeouts
Streaming
Streaming-specific considerations