The AI Gateway automatically handles provider failures by instantly switching to backup providers. Your application stays online even when providers go down, hit rate limits, or experience errors.
Why Fallbacks Matter
- **Provider downtime:** OpenAI outages took down ChatGPT, Cursor, and thousands of apps in 2024
- **Rate limiting:** Hitting your quota during peak hours blocks all users
- **Regional issues:** Provider performance varies by geography and time of day
- **Cost optimization:** Route to cheaper providers first, with expensive backups
How Fallbacks Work
The gateway automatically retries failed requests with alternative providers:
```typescript
// Specify a fallback chain with comma-separated models
const response = await client.chat.completions.create({
  model: "gpt-4o-mini/openai,gpt-4o-mini/azure,gpt-4o-mini",
  messages: [{ role: "user", content: "Hello!" }],
});
```
What happens:
1. Tries OpenAI first
2. If OpenAI fails → instantly tries Azure
3. If Azure fails → tries all other available providers
4. Returns the first successful response
Total added latency: ~50ms per failover (near-instant)
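The steps above amount to a try-in-order loop. Here is a minimal sketch of that behavior (illustrative only, not the gateway's actual implementation):

```typescript
// Illustrative sketch of the failover loop described above: try each
// provider attempt in order and return the first successful response.
async function tryInOrder<T>(attempts: Array<() => Promise<T>>): Promise<T> {
  let lastError: unknown;
  for (const attempt of attempts) {
    try {
      return await attempt();
    } catch (err) {
      lastError = err; // fall through to the next provider
    }
  }
  throw lastError; // every provider in the chain failed
}
```

The gateway does this server-side, so your application makes a single request and only sees the final result.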
Automatic Failover Triggers
The gateway automatically fails over when it encounters:
| Error Code | Trigger | Example |
|---|---|---|
| 429 | Rate limit exceeded | Provider quota reached |
| 401 | Authentication failed | Invalid API key |
| 408 | Request timeout | Provider taking too long |
| 500-599 | Server errors | Provider infrastructure issues |
| 400 | Context length exceeded | Prompt too long for provider |
The gateway intelligently handles errors. Some 400 errors (like invalid model format) won’t trigger fallback since they’d fail on all providers.
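The trigger table above can be read as a simple predicate. This is an illustrative approximation of that decision, not the gateway's actual logic; the `errorType` parameter is a hypothetical label used here to distinguish retryable 400 errors from ones that would fail everywhere:

```typescript
// Sketch of the failover decision described above (illustrative only).
// Generic 400s are not retried; context-length 400s are.
function shouldFailover(statusCode: number, errorType?: string): boolean {
  if (statusCode === 429) return true; // rate limit exceeded
  if (statusCode === 401) return true; // authentication failed
  if (statusCode === 408) return true; // request timeout
  if (statusCode >= 500 && statusCode <= 599) return true; // server errors
  if (statusCode === 400) return errorType === "context_length_exceeded";
  return false;
}
```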
Fallback Strategies
Default Automatic Routing
No configuration needed - the gateway automatically finds all providers:
```typescript
model: "claude-sonnet-4"
// Automatically tries: Anthropic → Bedrock → Vertex → all others
```
Best for: Most production use cases. Handles failures automatically.
Explicit Provider Chain
Specify exact providers and order:
```typescript
model: "claude-sonnet-4/anthropic,claude-sonnet-4/bedrock"
// Tries: Anthropic → Bedrock → stops
```
Best for: When you want control over which providers are used.
Open-Ended Chain
Start with specific providers, then try all others:
```typescript
model: "gpt-4o-mini/azure,gpt-4o-mini"
// Tries: Azure → all other gpt-4o-mini providers
```
Best for: Prioritizing specific providers (e.g., using your credits) while keeping full redundancy.
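If you build these model strings dynamically, a small helper keeps the chains consistent. `fallbackChain` is a hypothetical convenience function for illustration, not part of any SDK:

```typescript
// Hypothetical helper: build a comma-separated fallback chain.
// Appending the bare model name makes the chain open-ended
// (the gateway then tries all remaining providers).
function fallbackChain(
  model: string,
  providers: string[],
  openEnded = true
): string {
  const chain = providers.map((p) => `${model}/${p}`);
  if (openEnded) chain.push(model);
  return chain.join(",");
}

// fallbackChain("gpt-4o-mini", ["azure"])
//   => "gpt-4o-mini/azure,gpt-4o-mini"
```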
BYOK with Credit Fallback
Your provider keys automatically fall back to Helicone’s managed keys:
```typescript
// Add your OpenAI key in Provider Settings
model: "gpt-4o-mini"
// Tries: your OpenAI key → Helicone's keys → other providers
```
Best for: Using your provider credits with automatic backup.
Real-World Examples
Production Reliability
Maximize uptime with multi-provider redundancy:
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://ai-gateway.helicone.ai",
  apiKey: process.env.HELICONE_API_KEY,
});

const response = await client.chat.completions.create({
  // Try Anthropic, then Bedrock, then all others
  model: "claude-sonnet-4/anthropic,claude-sonnet-4/bedrock,claude-sonnet-4",
  messages: [
    { role: "user", content: "Analyze this customer feedback" },
  ],
});
```
Result: Your app stays online even if Anthropic and Bedrock both go down.
Cost Optimization
Use cheaper providers first, with premium backups:
```typescript
const response = await client.chat.completions.create({
  // Start with the cheapest provider, escalate if needed
  model: "gpt-4o-mini/azure,gpt-4o-mini/openai",
  messages: [{ role: "user", content: query }],
});
```
Result: Saves money by using Azure (your credits), falls back to OpenAI if Azure is down.
Regional Compliance
Prioritize EU providers for GDPR compliance:
```typescript
const response = await client.chat.completions.create({
  // EU region first, then US regions as backup
  model: "gpt-4o/azure/eu-deployment,gpt-4o/openai",
  messages: [{ role: "user", content: euUserQuery }],
});
```
Result: Uses EU deployment for EU users, fails over to US if EU region has issues.
Mixed Model Fallbacks
Fall back to different models if preferred model fails:
```typescript
const response = await client.chat.completions.create({
  // Try GPT-4o, fall back to Claude if needed
  model: "gpt-4o,claude-sonnet-4",
  messages: [{ role: "user", content: "Complex reasoning task" }],
});
```
Result: Tries GPT-4o first, uses Claude Sonnet 4 if GPT-4o is unavailable.
Avoid Problematic Providers
Exclude providers experiencing issues:
```typescript
const response = await client.chat.completions.create({
  // Use any provider EXCEPT the ones having issues
  model: "!openai,!azure,gpt-4o-mini",
  messages: [{ role: "user", content: query }],
});
```
Result: Routes around providers with known issues while maintaining full redundancy.
Error Handling
Successful Fallback
When a fallback succeeds, the response includes metadata:
```typescript
const response = await client.chat.completions.create({
  model: "gpt-4o-mini/openai,gpt-4o-mini/azure",
  messages: [{ role: "user", content: "Hello!" }],
});

// Check which provider was used in the Helicone dashboard
// Response header: helicone-fallback-index: 1 (if Azure was used)
```
View fallback details in your Helicone dashboard:
- Which provider was tried first
- Why it failed
- Which provider ultimately succeeded
- Total latency including fallback time
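If you want the fallback index programmatically rather than from the dashboard, you can read the raw response headers; the openai-node SDK exposes them via `.withResponse()`. `fallbackIndex` is a hypothetical helper, and treating a missing header as index 0 is an assumption:

```typescript
// Hypothetical helper: parse the helicone-fallback-index response header.
// Assumes a missing header means the first provider in the chain was used.
function fallbackIndex(headers: Headers): number {
  const raw = headers.get("helicone-fallback-index");
  return raw === null ? 0 : Number.parseInt(raw, 10);
}

// Usage with the openai-node SDK (assumes `client` configured as above):
// const { data, response } = await client.chat.completions
//   .create({ model: "gpt-4o-mini/openai,gpt-4o-mini/azure", messages })
//   .withResponse();
// const index = fallbackIndex(response.headers); // 1 if Azure was used
```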
All Providers Failed
If all providers in your chain fail, you get a detailed error:
```json
{
  "error": {
    "code": "all_attempts_failed",
    "message": "All attempts failed",
    "details": [
      {
        "source": "gpt-4o-mini/openai",
        "statusCode": 429,
        "message": "Rate limit exceeded",
        "type": "rate_limited"
      },
      {
        "source": "gpt-4o-mini/azure",
        "statusCode": 500,
        "message": "Internal server error",
        "type": "request_failed"
      }
    ]
  }
}
```
If all providers fail, the gateway returns the most actionable error (e.g., authentication > rate limit > server error).
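When the whole chain is exhausted, the `details` array above can be turned into a one-line summary for logging or alerting. A minimal sketch, assuming the error body shape shown above:

```typescript
// Shape of one entry in the "all_attempts_failed" details array above.
interface AttemptDetail {
  source: string;
  statusCode: number;
  message: string;
  type: string;
}

// Illustrative helper: summarize which providers failed and why.
function summarizeFailure(details: AttemptDetail[]): string {
  return details
    .map((d) => `${d.source} -> ${d.statusCode} (${d.type})`)
    .join("; ");
}
```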
Fallback Best Practices
1. Start with Automatic Routing
Let the gateway handle everything:
```typescript
// ✅ Simple and reliable
model: "claude-sonnet-4"
```
Only add explicit fallbacks if you need specific control.
2. Use Open-Ended Chains
Always include a final catch-all:
```typescript
// ✅ Good: falls back to all providers
model: "gpt-4o-mini/azure,gpt-4o-mini"

// ❌ Risky: only tries two providers
model: "gpt-4o-mini/azure,gpt-4o-mini/openai"
```
3. Monitor Fallback Rates
Check your Helicone dashboard regularly:
- High fallback rates indicate provider issues
- Adjust your provider priorities based on reliability
- Consider excluding consistently failing providers
4. Test Your Fallback Chain
Verify your fallback logic works:
```typescript
// Test by forcing a provider that will fail
model: "invalid-provider,gpt-4o-mini"
// Should successfully fall back to gpt-4o-mini
```
5. Combine with Rate Limiting
Use Helicone’s custom rate limits to prevent provider quota exhaustion:
```typescript
const client = new OpenAI({
  baseURL: "https://ai-gateway.helicone.ai",
  apiKey: process.env.HELICONE_API_KEY,
  defaultHeaders: {
    "Helicone-RateLimit-Policy": "100;w=60;segment=user",
  },
});
```
Learn more about rate limiting →
Advanced: Model Provider Priority
The gateway uses intelligent provider selection:
Priority Factors
1. **Your provider keys:** always tried first
2. **Provider cost:** cheaper providers prioritized
3. **Provider reliability:** historical uptime considered
4. **Load balancing:** equivalent providers rotated
Example Priority Order
For `model: "gpt-4o-mini"`, the gateway tries:
1. Your OpenAI key (BYOK)
2. Your Azure key (BYOK)
3. Helicone's OpenAI key (cheapest PTB)
4. Helicone's Azure key (PTB)
5. Helicone's Bedrock (PTB)
6. Other providers
This ensures optimal cost and reliability.
Fallback vs Rate Limiting
Understand when to use each:
| Scenario | Use Fallbacks | Use Rate Limits |
|---|---|---|
| Provider goes down | ✅ | ❌ |
| Prevent quota exhaustion | ❌ | ✅ |
| Cost optimization | ✅ | ❌ |
| Control user usage | ❌ | ✅ |
| Regional compliance | ✅ | ❌ |
| Protect provider budget | ❌ | ✅ |
Best practice: Use both together for maximum reliability and control.
Next Steps
- **Provider Routing:** Learn how the gateway routes requests to providers
- **Custom Rate Limits:** Prevent quota exhaustion with custom limits
- **Error Handling:** Advanced error handling and retry strategies
- **Browse Models:** See which providers support your models