Fallbacks

Overview

Fallbacks improve availability by automatically routing failed requests to backup providers. When your primary LLM provider is down or returns an error, the Gateway switches to an alternative provider so that your application still receives a response.

How It Works

The Gateway monitors response status codes and automatically triggers fallback logic when specified error conditions occur. Fallbacks can be:

Provider-level: Switch from OpenAI to Anthropic
Model-level: Switch from GPT-4 to Claude 3.5 Sonnet
API key-level: Use different API keys for the same provider

Fallbacks work in conjunction with retries. The Gateway will exhaust retry attempts on the primary target before falling back to the next provider.
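
The ordering described above (retries on one target first, then the next target) can be sketched in a few lines of Python. This is a minimal illustration, not the Gateway's implementation: `route_with_fallback`, `send`, and `fake_send` are hypothetical names, and the set of failure codes is borrowed from the default list later on this page.

```python
from typing import Callable, Dict, List, Tuple

# Status codes treated as failures in this sketch (assumed; see "Supported Status Codes").
FAILURE_CODES = {429, 500, 502, 503, 504}


def route_with_fallback(
    targets: List[Dict],
    send: Callable[[Dict], int],
    retry_attempts: int = 0,
) -> Tuple[int, int, int]:
    """Try each target in order; return (target_index, retries_used, status)."""
    if not targets:
        raise ValueError("at least one target is required")
    status = -1
    for index, target in enumerate(targets):
        # One initial attempt plus up to `retry_attempts` retries on the same target.
        for attempt in range(retry_attempts + 1):
            status = send(target)
            if status not in FAILURE_CODES:
                return index, attempt, status
    # Every target failed: report the last target and the last status seen.
    return len(targets) - 1, retry_attempts, status


def fake_send(target: Dict) -> int:
    """Hypothetical stand-in for an upstream call: OpenAI is 'down', others succeed."""
    return 503 if target["provider"] == "openai" else 200


targets = [{"provider": "openai"}, {"provider": "anthropic"}]
print(route_with_fallback(targets, fake_send, retry_attempts=2))  # (1, 0, 200)
```

Here the first target is tried three times (one attempt plus two retries) before the second target answers on its first attempt.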

Configuration

Basic Fallback

Fall back to a secondary provider when the primary fails:

{
  "strategy": {
    "mode": "fallback"
  },
  "targets": [
    {
      "provider": "openai",
      "api_key": "sk-***",
      "override_params": {
        "model": "gpt-4o"
      }
    },
    {
      "provider": "anthropic",
      "api_key": "sk-ant-***",
      "override_params": {
        "model": "claude-3-5-sonnet-20240620"
      }
    }
  ]
}

Conditional Fallback

Fall back only on specific status codes:

{
  "strategy": {
    "mode": "fallback",
    "on_status_codes": [429, 500, 502, 503, 504]
  },
  "targets": [
    {
      "provider": "openai",
      "api_key": "sk-***"
    },
    {
      "provider": "azure-openai",
      "api_key": "***",
      "custom_host": "https://your-resource.openai.azure.com"
    }
  ]
}

Multi-Level Fallback Chain

Create a cascade of fallback providers:

{
  "strategy": {
    "mode": "fallback"
  },
  "targets": [
    {
      "provider": "openai",
      "api_key": "sk-***",
      "override_params": { "model": "gpt-4o" }
    },
    {
      "provider": "anthropic",
      "api_key": "sk-ant-***",
      "override_params": { "model": "claude-3-5-sonnet-20240620" }
    },
    {
      "provider": "groq",
      "api_key": "***",
      "override_params": { "model": "llama-3.1-70b-versatile" }
    }
  ]
}
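
When a chain grows beyond two or three targets, it may be easier to generate the config than to write it by hand. The helper below is a sketch under that assumption: `build_fallback_config` is a hypothetical name, while the field names (`strategy`, `targets`, `provider`, `api_key`, `override_params`) are the ones used in the examples on this page.

```python
import json
from typing import Dict, List, Tuple


def build_fallback_config(chain: List[Tuple[str, str, str]]) -> Dict:
    """Build a fallback config from (provider, api_key, model) tuples in priority order."""
    if not chain:
        raise ValueError("the fallback chain needs at least one target")
    return {
        "strategy": {"mode": "fallback"},
        "targets": [
            {
                "provider": provider,
                "api_key": api_key,
                "override_params": {"model": model},
            }
            for provider, api_key, model in chain
        ],
    }


config = build_fallback_config(
    [
        ("openai", "sk-***", "gpt-4o"),
        ("anthropic", "sk-ant-***", "claude-3-5-sonnet-20240620"),
        ("groq", "***", "llama-3.1-70b-versatile"),
    ]
)
print(json.dumps(config, indent=2))
```

The order of the tuples is the order in which targets are tried, so the primary provider goes first.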

Usage Examples

from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",
    config={
        "strategy": {"mode": "fallback"},
        "targets": [
            {
                "provider": "openai",
                "api_key": "sk-***",
                "override_params": {"model": "gpt-4o"}
            },
            {
                "provider": "anthropic",
                "api_key": "sk-ant-***",
                "override_params": {"model": "claude-3-5-sonnet-20240620"}
            }
        ]
    }
)

response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello!"}],
    model="gpt-4o"
)

Advanced Patterns

Fallback with Retries

Combine fallback with retry logic for maximum resilience:

{
  "retry": {
    "attempts": 3,
    "on_status_codes": [429, 500, 502, 503, 504]
  },
  "strategy": {
    "mode": "fallback"
  },
  "targets": [
    {
      "provider": "openai",
      "api_key": "sk-***"
    },
    {
      "provider": "anthropic",
      "api_key": "sk-ant-***"
    }
  ]
}

The Gateway will:

Attempt the request with OpenAI
Retry up to 3 times on failure
Fall back to Anthropic if all retries fail
Retry up to 3 times with Anthropic
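
The sequence above implies an upper bound on how many upstream calls a single request can trigger, which is useful when budgeting latency and cost. The arithmetic below assumes that `attempts` counts retries made after the initial try, as the list reads; `max_upstream_calls` is a hypothetical helper, not part of any SDK.

```python
def max_upstream_calls(num_targets: int, retry_attempts: int) -> int:
    """Worst case: every target uses its initial attempt plus all of its retries."""
    return num_targets * (1 + retry_attempts)


# Two targets (OpenAI, Anthropic) with "attempts": 3
print(max_upstream_calls(num_targets=2, retry_attempts=3))  # 8
```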

Fallback with Load Balancing

Combine fallback with load balancing for horizontal scaling:

{
  "strategy": { "mode": "fallback" },
  "targets": [
    {
      "strategy": { "mode": "loadbalance" },
      "targets": [
        {"provider": "openai", "api_key": "sk-***-1", "weight": 0.5},
        {"provider": "openai", "api_key": "sk-***-2", "weight": 0.5}
      ]
    },
    {
      "provider": "anthropic",
      "api_key": "sk-ant-***"
    }
  ]
}
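
One way to read the nested config is as a plan of leaf targets for a single request: a load-balanced group contributes one weighted pick, and a fallback group contributes its members in order. The sketch below illustrates that reading only. `pick_weighted` and `plan_attempts` are hypothetical names, and whether the Gateway tries the other load-balanced key before falling back is not stated here, so the sketch assumes one pick per request.

```python
import random
from typing import Dict, List, Optional


def pick_weighted(targets: List[Dict], rng: Optional[random.Random] = None) -> Dict:
    """Pick one target at random, proportionally to its `weight` (default 1)."""
    chooser = rng if rng is not None else random
    weights = [t.get("weight", 1) for t in targets]
    return chooser.choices(targets, weights=weights, k=1)[0]


def plan_attempts(config: Dict, rng: Optional[random.Random] = None) -> List[Dict]:
    """Return the leaf targets one request would try, in order."""
    if "targets" not in config:
        return [config]  # a leaf: a concrete provider and key
    mode = config.get("strategy", {}).get("mode")
    if mode == "loadbalance":
        # A load-balanced group resolves to a single weighted pick.
        return plan_attempts(pick_weighted(config["targets"], rng), rng)
    # Fallback (assumed default): try each member in the order listed.
    plan: List[Dict] = []
    for target in config["targets"]:
        plan.extend(plan_attempts(target, rng))
    return plan


config = {
    "strategy": {"mode": "fallback"},
    "targets": [
        {
            "strategy": {"mode": "loadbalance"},
            "targets": [
                {"provider": "openai", "api_key": "sk-***-1", "weight": 0.5},
                {"provider": "openai", "api_key": "sk-***-2", "weight": 0.5},
            ],
        },
        {"provider": "anthropic", "api_key": "sk-ant-***"},
    ],
}

plan = plan_attempts(config, random.Random(0))
print([t["provider"] for t in plan])  # ['openai', 'anthropic']
```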

Response Headers

The Gateway includes headers to track fallback behavior:

x-portkey-last-used-option-index: 1
x-portkey-retry-attempt-count: 2

x-portkey-last-used-option-index: Index of the target that successfully handled the request (0-based)
x-portkey-retry-attempt-count: Number of retry attempts made
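
These headers can be turned into a simple signal for logging or metrics. The helper below is a sketch that assumes a flat fallback config and that both header values are plain integers, as in the example above; `read_fallback_headers` and `FallbackInfo` are hypothetical names. It accepts any mapping of header names to strings, so it is not tied to a particular HTTP client.

```python
from typing import Mapping, NamedTuple


class FallbackInfo(NamedTuple):
    target_index: int  # 0-based index of the target that served the request
    retry_count: int   # number of retry attempts reported by the Gateway
    fell_back: bool    # True when a target other than the first one answered


def read_fallback_headers(headers: Mapping[str, str]) -> FallbackInfo:
    """Extract fallback details from response headers (names are case-insensitive)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    # Missing headers are treated as "first target, no retries" (an assumption).
    index = int(lowered.get("x-portkey-last-used-option-index", "0"))
    retries = int(lowered.get("x-portkey-retry-attempt-count", "0"))
    return FallbackInfo(target_index=index, retry_count=retries, fell_back=index > 0)


info = read_fallback_headers(
    {
        "x-portkey-last-used-option-index": "1",
        "x-portkey-retry-attempt-count": "2",
    }
)
print(info.fell_back)  # True
```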

Best Practices

Choose Compatible Models

Ensure fallback targets use models with similar capabilities. Falling back from GPT-4 to a much weaker model may produce unexpected results.

Monitor Fallback Rates

Track how often fallbacks occur to identify reliability issues with your primary provider. Use the Gateway Console to monitor fallback patterns.

Test Your Fallback Chain

Regularly test your fallback configuration to ensure it behaves as expected under failure conditions.

Consider Cost Implications

Fallback providers may have different pricing. Monitor your costs when fallbacks are triggered frequently.

When falling back between different providers (e.g., OpenAI to Anthropic), be aware that:

Response formats may differ slightly
Model-specific features may not be available
Token counting may vary between providers

Supported Status Codes

By default, fallbacks trigger on:

429 - Rate limit exceeded
500 - Internal server error
502 - Bad gateway
503 - Service unavailable
504 - Gateway timeout

Customize using the on_status_codes parameter in your config.
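
The decision rule can be expressed as a small predicate, which is handy in tests that check your own expectations about a config. This is a sketch, not the Gateway's code: `should_fall_back` and `DEFAULT_FALLBACK_CODES` are hypothetical names, and the sketch assumes that a custom `on_status_codes` list replaces the defaults rather than extending them.

```python
from typing import Iterable, Optional

# Default trigger codes as listed above.
DEFAULT_FALLBACK_CODES = frozenset({429, 500, 502, 503, 504})


def should_fall_back(
    status_code: int,
    on_status_codes: Optional[Iterable[int]] = None,
) -> bool:
    """Return True if this status code would trigger a fallback."""
    if on_status_codes is None:
        codes = DEFAULT_FALLBACK_CODES
    else:
        # Assumption: a custom list replaces the defaults entirely.
        codes = frozenset(on_status_codes)
    return status_code in codes


print(should_fall_back(503))                         # True
print(should_fall_back(401))                         # False
print(should_fall_back(401, on_status_codes=[401]))  # True
```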

Related Features

Retries: Automatically retry failed requests with exponential backoff
Load Balancing: Distribute requests across multiple providers
Timeouts: Set request timeout limits
Configs: Learn more about Gateway Configs
