Overview
Helicone AI Gateway provides automatic fallback capabilities to ensure your AI applications remain reliable even when individual providers fail. When a request fails, the gateway automatically tries alternative providers in the order you specify.
How Fallbacks Work
The gateway processes fallbacks in a predictable order:
Primary Attempt
The gateway tries the first model/provider in your list.
Failure Detection
If the request fails (rate limit, timeout, service error, etc.), the gateway moves to the next option.
Automatic Retry
The gateway automatically retries with the next model/provider in your list.
Success or Exhaustion
The process continues until a request succeeds or all options are exhausted.
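The four steps above amount to a sequential try-and-advance loop. The sketch below is illustrative only (provider calls are stand-in functions, not the gateway's actual internals), but it captures the semantics: return on the first success, throw once every option is exhausted.

```typescript
type ProviderCall = () => Promise<string>;

// Try each provider in order; return the first success,
// or throw once all options are exhausted.
async function withFallbacks(attempts: ProviderCall[]): Promise<string> {
  const errors: unknown[] = [];
  for (const attempt of attempts) {
    try {
      return await attempt(); // success: stop here
    } catch (err) {
      errors.push(err); // failure: move to the next option
    }
  }
  throw new Error(`All ${attempts.length} fallback attempts failed`);
}
```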
Fallbacks work across both BYOK (Bring Your Own Key) and PTB (Pass-Through Billing) authentication methods.
Basic Fallback Configuration
Same Model, Different Providers
Route the same model through different providers:
const response = await client.chat.completions.create({
  model: "gpt-4o/openai,gpt-4o/azure,gpt-4o/deepinfra",
  messages: [{ role: "user", content: "Hello!" }],
});
Fallback chain:
OpenAI (primary)
Azure OpenAI (if OpenAI fails)
DeepInfra (if Azure fails)
Different Models
Fallback to different models:
const response = await client.chat.completions.create({
  model: "gpt-4o,gpt-4o-mini,claude-sonnet-4",
  messages: [{ role: "user", content: "Hello!" }],
});
Fallback chain:
GPT-4o (best available provider)
GPT-4o-mini (cheaper alternative)
Claude Sonnet 4 (different model family)
Cross-Provider Fallback
Fallback across different cloud providers:
const response = await client.chat.completions.create({
  model: "claude-3-7-sonnet-20250219/bedrock,claude-3-7-sonnet-20250219/anthropic,claude-3-7-sonnet-20250219/vertex",
  messages: [{ role: "user", content: "Hello!" }],
});
Fallback chain:
AWS Bedrock (primary)
Anthropic direct (if Bedrock fails)
Google Vertex AI (if Anthropic fails)
Common Fallback Patterns
Cost Optimization
Regional Resilience
Speed Optimization
BYOK + PTB
Try cheaper providers first, then fall back to premium:
const response = await client.chat.completions.create({
  model: "gpt-4o-mini/deepinfra,gpt-4o-mini/openai,gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});
Pattern:
DeepInfra (lowest cost)
OpenAI standard (if DeepInfra unavailable)
GPT-4o (best quality, highest cost)
Fall back across regions for high availability:
const response = await client.chat.completions.create({
  model: "us.anthropic.claude-3-7-sonnet-20250219-v1:0/bedrock,eu.anthropic.claude-3-7-sonnet-20250219-v1:0/bedrock,claude-3-7-sonnet-20250219/anthropic",
  messages: [{ role: "user", content: "Hello!" }],
});
Pattern:
US region Bedrock
EU region Bedrock
Anthropic global
Try the fastest providers first:
const response = await client.chat.completions.create({
  model: "llama-3.3-70b/groq,llama-3.3-70b/deepinfra,llama-3.3-70b/together",
  messages: [{ role: "user", content: "Hello!" }],
});
Pattern:
Groq (fastest inference)
DeepInfra (fast, cheaper)
Together AI (reliable)
Fall back from BYOK to PTB automatically:
// With OpenAI BYOK key configured
const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello!" }],
});
Automatic pattern:
OpenAI BYOK (your key)
OpenAI PTB (Helicone billing)
Alternative providers PTB
Failure Scenarios
The gateway automatically retries on these failure types:
Rate Limits (429)
Provider rate limit exceeded
model: "gpt-4o/openai,gpt-4o/azure"
// If OpenAI returns 429, immediately tries Azure
Exception: Helicone-generated 429s (escrow failure, rate limits) bail immediately without trying fallbacks.
Authentication Errors (401, 403)
Invalid or expired provider keys
model: "gpt-4o/openai,gpt-4o/azure"
// If OpenAI BYOK key is invalid, tries Azure
Service Errors (500, 502, 503)
Provider service unavailable
model: "claude-sonnet-4/anthropic,claude-sonnet-4/bedrock"
// If Anthropic is down, tries Bedrock
Timeouts
Request timeout
model: "llama-3.3-70b/groq,llama-3.3-70b/together"
// If Groq times out, tries Together AI
Model not available for PTB
model: "special-model/bedrock,gpt-4o"
// If model is disallowed for PTB, tries next option
Advanced Fallback Strategies
Multi-Model Fallback
Combine multiple models and providers:
const response = await client.chat.completions.create({
  model: [
    "gpt-4o/openai", // Primary: GPT-4o on OpenAI
    "gpt-4o/azure", // Fallback 1: GPT-4o on Azure
    "claude-sonnet-4/anthropic", // Fallback 2: Claude on Anthropic
    "gemini-2.0-flash/google-ai-studio", // Fallback 3: Gemini
  ].join(","),
  messages: [{ role: "user", content: "Hello!" }],
});
Conditional Fallbacks
Choose fallback strategy based on context:
function getFallbackModel(priority: "cost" | "speed" | "reliability") {
  const strategies = {
    cost: "gpt-4o-mini/deepinfra,gpt-4o-mini,claude-3-haiku",
    speed: "llama-3.3-70b/groq,gpt-4o-mini/openai,gemini-2.0-flash",
    reliability: "gpt-4o/openai,gpt-4o/azure,claude-sonnet-4/anthropic,gemini-2.0-flash/vertex",
  };
  return strategies[priority];
}

const response = await client.chat.completions.create({
  model: getFallbackModel("reliability"),
  messages: [{ role: "user", content: "Hello!" }],
});
Provider Exclusions with Fallbacks
Exclude specific providers while maintaining fallbacks:
const response = await client.chat.completions.create({
  model: "!deepinfra,gpt-4o/openai,gpt-4o/azure",
  messages: [{ role: "user", content: "Hello!" }],
});
// Tries OpenAI and Azure, but never DeepInfra
Monitoring Fallbacks
Track fallback behavior in the Helicone dashboard:
Check Attempts
See all provider attempts and which one succeeded
Analyze Patterns
Which providers fail most often?
How many fallback attempts typically occur?
What’s the success rate by provider?
Error Responses
When all fallback attempts fail, the gateway returns a consolidated error:
{
  "error": {
    "message": "All fallback attempts failed",
    "type": "all_attempts_failed",
    "attempts": [
      {
        "source": "gpt-4o/openai",
        "error": "Rate limit exceeded",
        "status": 429
      },
      {
        "source": "gpt-4o/azure",
        "error": "Service unavailable",
        "status": 503
      }
    ]
  }
}
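When catching this error client-side, the attempts array tells you which providers failed and why. A minimal sketch, assuming the error body has exactly the shape shown above (field names taken from the example response; `summarizeFailure` is a hypothetical helper, not part of any SDK):

```typescript
interface FallbackAttempt {
  source: string;
  error: string;
  status: number;
}

interface ConsolidatedError {
  error: {
    message: string;
    type: string;
    attempts: FallbackAttempt[];
  };
}

// Turn the consolidated error body into a one-line summary for logging.
function summarizeFailure(body: ConsolidatedError): string {
  return body.error.attempts
    .map((a) => `${a.source} -> ${a.status} (${a.error})`)
    .join("; ");
}
```

Logging this summary alongside the request makes it easy to see whether failures cluster on one provider or span the whole chain.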
Helicone 429s bail immediately: If Helicone returns a 429 (insufficient credits or rate limit), the request fails immediately without trying fallbacks. Add credits at helicone.ai/credits.
Fallback Best Practices
Start with Native Providers
List the native provider first for best compatibility:
// Good: Native provider first
model: "gpt-4o/openai,gpt-4o/azure,gpt-4o/deepinfra"
// Avoid: Non-native provider first
model: "gpt-4o/deepinfra,gpt-4o/openai"
Balance Cost and Reliability
Mix low-cost and reliable providers:
model: "gpt-4o-mini/deepinfra,gpt-4o-mini/openai,gpt-4o"
// Try cheap first, fall back to reliable
Keep Fallback Chains Short
Two to four fallbacks are usually sufficient:
// Good: 3 fallbacks
model: "gpt-4o/openai,gpt-4o/azure,claude-sonnet-4"
// Too many: 7+ fallbacks may add latency
model: "model1,model2,model3,model4,model5,model6,model7"
Monitor Fallback Frequency
If fallbacks trigger frequently:
Check provider status
Verify BYOK keys
Consider changing primary provider
Review rate limits
Monitor at Helicone Dashboard
Test Your Fallback Strategy
Test fallbacks by:
Using invalid BYOK keys temporarily
Requesting rate-limited models
Monitoring which providers succeed
Real-World Examples
Production-Grade Fallback
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://ai-gateway.helicone.ai",
  apiKey: process.env.HELICONE_API_KEY,
});

async function robustCompletion(prompt: string) {
  try {
    const response = await client.chat.completions.create({
      model: "gpt-4o/openai,gpt-4o/azure,claude-sonnet-4/anthropic,gemini-2.0-flash/vertex",
      messages: [{ role: "user", content: prompt }],
      max_tokens: 1000,
    });
    return response.choices[0].message.content;
  } catch (error) {
    // All fallbacks failed
    console.error("All providers failed:", error);
    throw error;
  }
}
Cost-Optimized with Fallback
async function costOptimizedCompletion(prompt: string, budget: "low" | "high") {
  const models = {
    low: "gpt-4o-mini/deepinfra,gpt-4o-mini/openai",
    high: "gpt-4o/openai,claude-sonnet-4/anthropic",
  };
  const response = await client.chat.completions.create({
    model: models[budget],
    messages: [{ role: "user", content: prompt }],
  });
  return response.choices[0].message.content;
}
Regional Resilience
async function regionalCompletion(prompt: string, preferredRegion: "us" | "eu") {
  const models = {
    us: "us.anthropic.claude-3-7-sonnet-20250219-v1:0/bedrock,claude-3-7-sonnet-20250219/anthropic",
    eu: "eu.anthropic.claude-3-7-sonnet-20250219-v1:0/bedrock,claude-3-7-sonnet-20250219/anthropic",
  };
  const response = await client.chat.completions.create({
    model: models[preferredRegion],
    messages: [{ role: "user", content: prompt }],
  });
  return response.choices[0].message.content;
}
Streaming with Fallbacks
Fallbacks work seamlessly with streaming:
const stream = await client.chat.completions.create({
  model: "gpt-4o/openai,gpt-4o/azure,claude-sonnet-4/anthropic",
  messages: [{ role: "user", content: "Tell me a story" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
If the first provider fails, the stream automatically switches to the next provider.
Next Steps
Routing Learn more about provider routing
Getting Started Set up the AI Gateway
Monitor Requests Track fallback behavior
Browse Models Explore available models