Overview
LLM Gateway supports 20+ AI providers and can automatically route requests to the best available provider based on cost, uptime, latency, and availability.
Supported Providers
OpenAI - GPT-4o, GPT-4, GPT-3.5 Turbo, and more
Anthropic - Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku
Google AI Studio - Gemini 2.0 Flash, Gemini 1.5 Pro
Google Vertex AI - Gemini models via Google Cloud
AWS Bedrock - Claude, Llama, Mistral via AWS
Azure OpenAI - Enterprise OpenAI models
DeepSeek - DeepSeek-V3, DeepSeek-Chat
Groq - Ultra-fast Llama, Mixtral inference
Cerebras - High-performance Llama models
Mistral AI - Mistral Large, Mixtral, Pixtral
Perplexity - Search-augmented models
Automatic Provider Selection
When you don’t specify a provider, LLM Gateway automatically selects the best one:
apps/gateway/src/chat/chat.ts

```typescript
// Get available providers based on project mode
if (project.mode === "api-keys") {
  const providerKeys = await findActiveProviderKeys(project.organizationId);
  availableProviders = providerKeys.map((key) => key.provider);
} else if (project.mode === "credits" || project.mode === "hybrid") {
  // Check which providers have environment tokens available
  const envProviders: string[] = [];
  for (const provider of supportedProviders) {
    if (hasProviderEnvironmentToken(provider as Provider)) {
      envProviders.push(provider);
    }
  }
  availableProviders = envProviders;
}
```
Selection Criteria
The gateway considers multiple factors:
Cost - Prioritizes cheaper providers for equivalent models
Uptime - Avoids providers with recent failures (less than 90% uptime)
Latency - Favors faster providers in streaming mode
Availability - Only considers providers you have configured
Capabilities - Filters by required features (vision, tools, JSON output, etc.)
The selection algorithm is optimized to minimize cost while maintaining reliability and performance.
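The exact scoring lives inside the gateway, but the idea can be sketched as a weighted combination of the factors above. The weights, formula, and helper names below are illustrative assumptions, not the actual implementation:

```typescript
interface ProviderStats {
  providerId: string;
  price: number; // USD per token
  uptime: number; // percent, 0-100
  latency: number; // ms to first token
}

// Lower price, higher uptime, and lower latency all raise the score.
// The 0.5/0.3/0.2 weights are made up for illustration.
function scoreProvider(s: ProviderStats, cheapestPrice: number): number {
  const costScore = cheapestPrice / s.price; // 1.0 for the cheapest provider
  const uptimeScore = s.uptime / 100;
  const latencyScore = 1000 / (1000 + s.latency); // decays as latency grows
  return 0.5 * costScore + 0.3 * uptimeScore + 0.2 * latencyScore;
}

function pickProvider(stats: ProviderStats[]): ProviderStats {
  const cheapest = Math.min(...stats.map((s) => s.price));
  return stats.reduce((best, s) =>
    scoreProvider(s, cheapest) > scoreProvider(best, cheapest) ? s : best,
  );
}
```

With this shape, a provider that is both cheaper and faster dominates the score, while a small price advantage can be outweighed by poor uptime.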
Every response includes detailed routing information:
```json
{
  "metadata": {
    "routing": {
      "availableProviders": ["openai", "anthropic", "google-ai-studio"],
      "selectedProvider": "anthropic",
      "selectionReason": "lowest-cost",
      "providerScores": [
        {
          "providerId": "anthropic",
          "score": 0.95,
          "price": 0.000003,
          "uptime": 99.8,
          "latency": 245,
          "throughput": 42
        },
        {
          "providerId": "openai",
          "score": 0.87,
          "price": 0.0000025,
          "uptime": 99.5,
          "latency": 312,
          "throughput": 38
        }
      ]
    }
  }
}
```
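A client can use this metadata to log or audit routing decisions. The helper below is hypothetical; only the metadata shape comes from the response format shown above:

```typescript
interface RoutingMetadata {
  selectedProvider: string;
  selectionReason: string;
  availableProviders: string[];
}

// Summarize which provider served a response and why.
function describeRouting(meta: { routing: RoutingMetadata }): string {
  const r = meta.routing;
  return `${r.selectedProvider} (${r.selectionReason}) out of ${r.availableProviders.length} providers`;
}
```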
Automatic Fallback
If a provider fails, LLM Gateway automatically retries with alternative providers:
apps/gateway/src/chat/tools/retry-with-fallback.ts

```typescript
export const MAX_RETRIES = 3;

export function shouldRetryRequest(
  statusCode: number,
  errorType: string,
  attempt: number,
): boolean {
  if (attempt >= MAX_RETRIES) {
    return false;
  }
  // Retry on server errors and rate limits
  if (statusCode >= 500 || statusCode === 429) {
    return true;
  }
  // Retry on timeout and network errors
  if (errorType === "timeout" || errorType === "network") {
    return true;
  }
  return false;
}
```
Retry Strategy
The gateway implements smart retry logic:
Up to 3 retries per request
Exponential backoff between retries
Automatic provider switching on failure
Excludes failed providers from subsequent attempts
Preserves successful responses even if billing fails
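Put together, the retry strategy looks roughly like the loop below. The backoff base and the withFallback helper are illustrative sketches, not the gateway's actual code:

```typescript
const MAX_RETRIES = 3;

// Exponential backoff: 250ms, 500ms, 1000ms, ...
function backoffMs(attempt: number, baseMs = 250): number {
  return baseMs * 2 ** attempt;
}

// Try each provider in turn, backing off between failed attempts.
async function withFallback<T>(
  providers: string[],
  call: (provider: string) => Promise<T>,
): Promise<T> {
  let lastError: unknown;
  const attempts = Math.min(MAX_RETRIES, providers.length);
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      // Each attempt switches to the next provider, so a failed
      // provider is excluded from subsequent attempts.
      return await call(providers[attempt]);
    } catch (err) {
      lastError = err;
      await new Promise((r) => setTimeout(r, backoffMs(attempt)));
    }
  }
  throw lastError;
}
```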
```bash
# Request to gpt-4o
curl https://api.llmgateway.io/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

# If OpenAI fails, gateway automatically tries:
# 1. Google AI Studio (gpt-4o equivalent)
# 2. Anthropic Claude 3.5 Sonnet
# 3. Other configured providers
```
The response metadata records every attempt, including the failures:

```json
{
  "metadata": {
    "routing": [
      {
        "provider": "openai",
        "model": "gpt-4o-2024-08-06",
        "status_code": 503,
        "error_type": "service_unavailable",
        "succeeded": false
      },
      {
        "provider": "google-ai-studio",
        "model": "gemini-2.0-flash-001",
        "status_code": 200,
        "error_type": "",
        "succeeded": true
      }
    ]
  }
}
```
Low-Uptime Fallback
The gateway monitors provider health and automatically routes around unhealthy providers:
apps/gateway/src/chat/chat.ts

```typescript
// Fetch uptime metrics for the requested provider
const metrics = metricsMap.get(`${baseModelId}:${usedProvider}`);

// If uptime is below 90%, route to an alternative
if (metrics && metrics.uptime !== undefined && metrics.uptime < 90) {
  const currentUptime = metrics.uptime;
  const betterUptimeProviders = availableModelProviders.filter((p) => {
    const providerMetrics = allMetricsMap.get(`${modelId}:${p.providerId}`);
    return !providerMetrics || (providerMetrics.uptime ?? 100) > currentUptime;
  });

  // Select cheapest provider with better uptime
  const cheapestResult = getCheapestFromAvailableProviders(
    betterUptimeProviders,
    modelWithPricing,
    { metricsMap: allMetricsMap, isStreaming: stream },
  );
}
```
You can disable automatic fallback by setting the X-No-Fallback: true header.
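For example, a request that opts out of fallback just adds that header alongside the usual ones. The buildHeaders helper below is a local convenience, not part of any SDK:

```typescript
function buildHeaders(
  apiKey: string,
  noFallback: boolean,
): Record<string, string> {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
  if (noFallback) {
    // Tell the gateway to fail fast instead of retrying other providers
    headers["X-No-Fallback"] = "true";
  }
  return headers;
}

// Usage with fetch:
// await fetch("https://api.llmgateway.io/v1/chat/completions", {
//   method: "POST",
//   headers: buildHeaders(apiKey, true),
//   body: JSON.stringify({
//     model: "gpt-4o",
//     messages: [{ role: "user", content: "Hello" }],
//   }),
// });
```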
Provider Configuration
LLM Gateway supports three project modes:
1. API Keys Mode
Use your own provider API keys:
```
# Configure in the dashboard or via API
POST /keys/provider
{
  "provider": "anthropic",
  "token": "sk-ant-...",
  "organizationId": "org_..."
}
```
2. Credits Mode
Use LLM Gateway’s provider keys with credit-based billing:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.llmgateway.io/v1",
    api_key="YOUR_LLMGATEWAY_API_KEY",
)

# Automatically uses gateway's provider keys
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
```
3. Hybrid Mode
Combine your own keys with gateway credits:
Uses your provider keys when available
Falls back to gateway credits when needed
Best for cost optimization and reliability
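The hybrid decision can be sketched as a simple preference order. The types and function name below are illustrative, not the gateway's internal API:

```typescript
interface KeySource {
  kind: "own-key" | "credits";
  provider: string;
}

// Prefer the organization's own provider key; otherwise fall back to
// gateway credits; otherwise the request cannot be served.
function resolveKeySource(
  provider: string,
  ownKeyProviders: Set<string>,
  creditsAvailable: boolean,
): KeySource | null {
  if (ownKeyProviders.has(provider)) {
    return { kind: "own-key", provider };
  }
  if (creditsAvailable) {
    return { kind: "credits", provider };
  }
  return null;
}
```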
Environment Variables
Providers can be configured via environment variables:
packages/models/src/providers.ts

```typescript
export interface ProviderDefinition {
  id: string;
  name: string;
  env: {
    required: {
      apiKey?: string;
      [key: string]: string | undefined;
    };
    optional?: Record<string, string>;
  };
}

// Example: OpenAI
{
  id: "openai",
  name: "OpenAI",
  env: {
    required: {
      apiKey: "LLM_OPENAI_API_KEY"
    }
  }
}

// Example: Google Vertex AI
{
  id: "google-vertex",
  name: "Google Vertex AI",
  env: {
    required: {
      apiKey: "LLM_GOOGLE_VERTEX_API_KEY",
      project: "LLM_GOOGLE_CLOUD_PROJECT"
    },
    optional: {
      region: "LLM_GOOGLE_VERTEX_REGION"
    }
  }
}
```
Custom Providers
Add custom OpenAI-compatible providers:
```
POST /keys/provider
{
  "provider": "custom",
  "providerName": "my-custom-provider",
  "baseUrl": "https://api.example.com/v1",
  "token": "your-api-key",
  "organizationId": "org_..."
}
```
Then use it:
```python
response = client.chat.completions.create(
    model="custom:my-custom-provider/model-name",
    messages=[{"role": "user", "content": "Hello"}],
)
```
Provider Metrics
Monitor provider performance in real-time:
packages/db/src/queries.ts

```typescript
export async function getProviderMetricsForCombinations(
  combinations: Array<{ modelId: string; providerId: string }>,
): Promise<Map<string, ProviderMetrics>> {
  // Returns uptime, latency, and throughput for each provider
  // Data is aggregated from the last 5 minutes of requests
}
```
Metrics include:
Uptime - Success rate (200 status codes)
Average Latency - Time to first token
Throughput - Tokens per second
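The three metrics above can be derived from a window of recent request logs. The aggregation below is an illustrative sketch; the real query runs in the database:

```typescript
interface RequestLog {
  statusCode: number;
  firstTokenMs: number;
  tokensPerSecond: number;
}

function aggregateMetrics(logs: RequestLog[]) {
  const ok = logs.filter((l) => l.statusCode === 200);
  const avg = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  return {
    uptime: (ok.length / logs.length) * 100, // success rate, percent
    avgLatency: avg(ok.map((l) => l.firstTokenMs)), // time to first token
    throughput: avg(ok.map((l) => l.tokensPerSecond)), // tokens per second
  };
}
```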
Provider Priority
Some providers have priority weights for routing:
packages/models/src/providers.ts
```typescript
{
  id: "google-ai-studio",
  priority: 0.8, // 20% lower priority (score × 1.25)
}

{
  id: "aws-bedrock",
  priority: 0.9, // 10% lower priority (score × 1.11)
}
```
Priority affects the routing score calculation. Lower priority providers are chosen less often unless they have significantly better cost or performance.
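One way to read the weights above: dividing a cost-like score by the priority inflates it by 1 / priority, so priority 0.8 yields the 1.25× factor quoted in the comments. This is an assumed interpretation for illustration, not the gateway's actual formula:

```typescript
// A provider with priority 0.8 looks 25% more expensive to the router,
// so it is chosen less often unless it is genuinely much cheaper.
function effectiveCost(pricePerToken: number, priority = 1): number {
  return pricePerToken / priority;
}
```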