
Overview

Helicone AI Gateway provides automatic fallback capabilities to ensure your AI applications remain reliable even when individual providers fail. When a request fails, the gateway automatically tries alternative providers in the order you specify.

How Fallbacks Work

The gateway processes fallbacks in a predictable order:
  1. Primary Attempt: the gateway tries the first model/provider in your list.
  2. Failure Detection: if the request fails (rate limit, timeout, service error, etc.), the gateway moves to the next option.
  3. Automatic Retry: the gateway retries with the next model/provider in your list.
  4. Success or Exhaustion: the process continues until a request succeeds or all options are exhausted.

Fallbacks work across both BYOK (Bring Your Own Key) and PTB (Pass-Through Billing) authentication methods.
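The steps above can be sketched as a simple loop. This is illustrative client-side pseudologic, not gateway source code; `tryProvider` is a hypothetical stand-in for a real provider call.

```typescript
// Illustrative sketch of the fallback loop described above. tryProvider is a
// hypothetical callback that returns a response or throws on failure.
async function runWithFallbacks<T>(
  attempts: string[],
  tryProvider: (attempt: string) => Promise<T>
): Promise<T> {
  const failures: { source: string; error: string }[] = [];
  for (const attempt of attempts) {
    try {
      return await tryProvider(attempt); // success: stop here
    } catch (err) {
      failures.push({ source: attempt, error: String(err) }); // move to next option
    }
  }
  // All options exhausted: surface a consolidated error.
  throw new Error(`All fallback attempts failed: ${JSON.stringify(failures)}`);
}
```

The key property is that failures are collected rather than thrown immediately, so the caller only sees an error once every option is exhausted.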

Basic Fallback Configuration

Same Model, Different Providers

Route the same model through different providers:
const response = await client.chat.completions.create({
  model: "gpt-4o/openai,gpt-4o/azure,gpt-4o/deepinfra",
  messages: [{ role: "user", content: "Hello!" }],
});
Fallback chain:
  1. OpenAI (primary)
  2. Azure OpenAI (if OpenAI fails)
  3. DeepInfra (if Azure fails)

Different Models

Fallback to different models:
const response = await client.chat.completions.create({
  model: "gpt-4o,gpt-4o-mini,claude-sonnet-4",
  messages: [{ role: "user", content: "Hello!" }],
});
Fallback chain:
  1. GPT-4o (best available provider)
  2. GPT-4o-mini (cheaper alternative)
  3. Claude Sonnet 4 (different model family)

Cross-Provider Fallback

Fallback across different cloud providers:
const response = await client.chat.completions.create({
  model: "claude-3-7-sonnet-20250219/bedrock,claude-3-7-sonnet-20250219/anthropic,claude-3-7-sonnet-20250219/vertex",
  messages: [{ role: "user", content: "Hello!" }],
});
Fallback chain:
  1. AWS Bedrock (primary)
  2. Anthropic direct (if Bedrock fails)
  3. Google Vertex AI (if Anthropic fails)

Common Fallback Patterns

Try cheaper providers first, fallback to premium:
const response = await client.chat.completions.create({
  model: "gpt-4o-mini/deepinfra,gpt-4o-mini/openai,gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});
Pattern:
  1. DeepInfra (lowest cost)
  2. OpenAI standard (if DeepInfra unavailable)
  3. GPT-4o (best quality, highest cost)

Failure Scenarios

The gateway automatically retries on these failure types:
Provider rate limit exceeded
model: "gpt-4o/openai,gpt-4o/azure"
// If OpenAI returns 429, immediately tries Azure
Exception: Helicone-generated 429s (escrow failure, rate limits) bail immediately without trying fallbacks.
Invalid or expired provider keys
model: "gpt-4o/openai,gpt-4o/azure"
// If OpenAI BYOK key is invalid, tries Azure
Provider service unavailable
model: "claude-sonnet-4/anthropic,claude-sonnet-4/bedrock"
// If Anthropic is down, tries Bedrock
Request timeout
model: "llama-3.3-70b/groq,llama-3.3-70b/together"
// If Groq times out, tries Together AI
Model not available for PTB
model: "special-model/bedrock,gpt-4o"
// If model is disallowed for PTB, tries next option
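The retry decision above can be modeled as a small classifier. This is a sketch only: the function name, the exact status-code set, and the `heliconeGenerated` flag are assumptions used to mirror the documented behavior, including the exception that Helicone's own 429s fail immediately.

```typescript
// Illustrative classifier mirroring the retryable failures listed above.
// heliconeGenerated models the documented exception: Helicone-generated 429s
// bail immediately without trying fallbacks. Names and codes are assumptions.
function shouldTryNextProvider(status: number, heliconeGenerated = false): boolean {
  if (status === 429 && heliconeGenerated) return false; // bail immediately
  // Rate limit, invalid/expired key, timeout, service unavailable
  const retryable = new Set([401, 403, 408, 429, 503]);
  return retryable.has(status);
}
```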

Advanced Fallback Strategies

Multi-Model Fallback

Combine multiple models and providers:
const response = await client.chat.completions.create({
  model: [
    "gpt-4o/openai",           // Primary: GPT-4o on OpenAI
    "gpt-4o/azure",            // Fallback 1: GPT-4o on Azure
    "claude-sonnet-4/anthropic", // Fallback 2: Claude on Anthropic
    "gemini-2.0-flash/google-ai-studio", // Fallback 3: Gemini
  ].join(","),
  messages: [{ role: "user", content: "Hello!" }],
});

Conditional Fallbacks

Choose fallback strategy based on context:
function getFallbackModel(priority: "cost" | "speed" | "reliability") {
  const strategies = {
    cost: "gpt-4o-mini/deepinfra,gpt-4o-mini,claude-3-haiku",
    speed: "llama-3.3-70b/groq,gpt-4o-mini/openai,gemini-2.0-flash",
    reliability: "gpt-4o/openai,gpt-4o/azure,claude-sonnet-4/anthropic,gemini-2.0-flash/vertex",
  };
  return strategies[priority];
}

const response = await client.chat.completions.create({
  model: getFallbackModel("reliability"),
  messages: [{ role: "user", content: "Hello!" }],
});

Provider Exclusions with Fallbacks

Exclude specific providers while maintaining fallbacks:
const response = await client.chat.completions.create({
  model: "!deepinfra,gpt-4o/openai,gpt-4o/azure",
  messages: [{ role: "user", content: "Hello!" }],
});
// Tries OpenAI and Azure, but never DeepInfra
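The model-string syntax used throughout (comma-separated entries, optional `model/provider` suffix, `!provider` exclusions) can be illustrated with a small client-side parser. This is a sketch, not part of the Helicone SDK; the types and function name are assumptions.

```typescript
// Illustrative parser for the fallback model-string syntax described above:
// comma-separated "model" or "model/provider" entries, plus "!provider"
// entries that exclude a provider entirely.
type FallbackPlan = {
  attempts: { model: string; provider?: string }[];
  excludedProviders: string[];
};

function parseFallbackString(modelString: string): FallbackPlan {
  const attempts: { model: string; provider?: string }[] = [];
  const excludedProviders: string[] = [];
  for (const raw of modelString.split(",")) {
    const entry = raw.trim();
    if (entry.startsWith("!")) {
      excludedProviders.push(entry.slice(1)); // "!deepinfra" -> never use deepinfra
    } else {
      const [model, provider] = entry.split("/");
      attempts.push(provider ? { model, provider } : { model });
    }
  }
  return { attempts, excludedProviders };
}
```

Running it on the example above yields three ordered attempts with `deepinfra` excluded, which matches the fallback chains listed throughout this page.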

Monitoring Fallbacks

Track fallback behavior in the Helicone dashboard:
  1. View Request Details: open any request in the Requests page.
  2. Check Attempts: see all provider attempts and which one succeeded.
  3. Analyze Patterns: look for trends such as:
     • Which providers fail most often?
     • How many fallback attempts typically occur?
     • What’s the success rate by provider?

Error Responses

When all fallback attempts fail, the gateway returns a consolidated error:
{
  "error": {
    "message": "All fallback attempts failed",
    "type": "all_attempts_failed",
    "attempts": [
      {
        "source": "gpt-4o/openai",
        "error": "Rate limit exceeded",
        "status": 429
      },
      {
        "source": "gpt-4o/azure",
        "error": "Service unavailable",
        "status": 503
      }
    ]
  }
}
Helicone 429s bail immediately: if Helicone returns a 429 (insufficient credits or rate limit), the request fails immediately without trying fallbacks. Add credits at helicone.ai/credits.
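A client can surface the per-attempt details from the consolidated error for logging or alerting. The sketch below assumes the error-body shape shown in the example above; the type and function names are assumptions, not SDK exports.

```typescript
// Illustrative helper that flattens the consolidated error body shown above
// into one log line per failed attempt.
type FailedAttempt = { source: string; error: string; status: number };
type AllAttemptsFailedError = {
  error: { message: string; type: string; attempts: FailedAttempt[] };
};

function summarizeFailures(body: AllAttemptsFailedError): string[] {
  return body.error.attempts.map((a) => `${a.source}: ${a.status} ${a.error}`);
}
```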

Fallback Best Practices

List the native provider first for best compatibility:
// Good: Native provider first
model: "gpt-4o/openai,gpt-4o/azure,gpt-4o/deepinfra"

// Avoid: Non-native provider first
model: "gpt-4o/deepinfra,gpt-4o/openai"
Mix low-cost and reliable providers:
model: "gpt-4o-mini/deepinfra,gpt-4o-mini/openai,gpt-4o"
// Try cheap first, fallback to reliable
2-4 fallbacks are usually sufficient:
// Good: 3 fallbacks
model: "gpt-4o/openai,gpt-4o/azure,claude-sonnet-4"

// Too many: 7+ fallbacks may cause latency
model: "model1,model2,model3,model4,model5,model6,model7"
If fallbacks trigger frequently:
  • Check provider status
  • Verify BYOK keys
  • Consider changing primary provider
  • Review rate limits
Monitor at Helicone Dashboard
Test fallbacks by:
  1. Using invalid BYOK keys temporarily
  2. Requesting rate-limited models
  3. Monitoring which providers succeed

Real-World Examples

Production-Grade Fallback

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://ai-gateway.helicone.ai",
  apiKey: process.env.HELICONE_API_KEY,
});

async function robustCompletion(prompt: string) {
  try {
    const response = await client.chat.completions.create({
      model: "gpt-4o/openai,gpt-4o/azure,claude-sonnet-4/anthropic,gemini-2.0-flash/vertex",
      messages: [{ role: "user", content: prompt }],
      max_tokens: 1000,
    });
    return response.choices[0].message.content;
  } catch (error) {
    // All fallbacks failed
    console.error("All providers failed:", error);
    throw error;
  }
}

Cost-Optimized with Fallback

async function costOptimizedCompletion(prompt: string, budget: "low" | "high") {
  const models = {
    low: "gpt-4o-mini/deepinfra,gpt-4o-mini/openai",
    high: "gpt-4o/openai,claude-sonnet-4/anthropic",
  };

  const response = await client.chat.completions.create({
    model: models[budget],
    messages: [{ role: "user", content: prompt }],
  });
  
  return response.choices[0].message.content;
}

Regional Resilience

async function regionalCompletion(prompt: string, preferredRegion: "us" | "eu") {
  const models = {
    us: "us.anthropic.claude-3-7-sonnet-20250219-v1:0/bedrock,claude-3-7-sonnet-20250219/anthropic",
    eu: "eu.anthropic.claude-3-7-sonnet-20250219-v1:0/bedrock,claude-3-7-sonnet-20250219/anthropic",
  };

  const response = await client.chat.completions.create({
    model: models[preferredRegion],
    messages: [{ role: "user", content: prompt }],
  });
  
  return response.choices[0].message.content;
}

Streaming with Fallbacks

Fallbacks work seamlessly with streaming:
const stream = await client.chat.completions.create({
  model: "gpt-4o/openai,gpt-4o/azure,claude-sonnet-4/anthropic",
  messages: [{ role: "user", content: "Tell me a story" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
If the first provider fails, the stream automatically switches to the next provider.

Next Steps

Routing

Learn more about provider routing

Getting Started

Set up the AI Gateway

Monitor Requests

Track fallback behavior

Browse Models

Explore available models
