Error Responses

Overview

The LLM Gateway API uses standard HTTP status codes to indicate the success or failure of requests. Error responses include a detail field with a human-readable message explaining the issue.

Error Response Schema

All error responses follow this structure:

detail (string, required): A human-readable message describing the error.
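
As a rough illustration of consuming this schema on the client side, the sketch below parses an error body into a small container. The `GatewayError` class and `parse_error` helper are hypothetical names introduced here, not part of the API, and the "Unknown error" fallback is an assumption for bodies that lack a usable `detail` field.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class GatewayError:
    """Hypothetical container for a parsed error response."""
    status_code: int
    detail: str


def parse_error(status_code: int, body: str) -> GatewayError:
    """Extract the `detail` string, falling back to a placeholder if absent."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        data = {}  # Body was not valid JSON (for example, an HTML error page)

    detail = data.get("detail") if isinstance(data, dict) else None
    if not isinstance(detail, str):
        detail = "Unknown error"  # Assumed fallback, not defined by the API
    return GatewayError(status_code=status_code, detail=detail)
```

Calling `parse_error(401, response_text)` on the 401 body shown later on this page would yield a `GatewayError` whose `detail` is the server's message.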

HTTP Status Codes

401 Unauthorized

Returned when the API key is invalid, missing, or not authorized to access the resource.

{
  "detail": "Invalid or missing API Key"
}

Common Causes

Missing X-API-Key header in the request
API key is invalid or has been revoked
API key does not have permission for the requested resource
Typo in the API key value

Always include your API key in the X-API-Key header of your requests. Keep your API keys secure and never expose them in client-side code.
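
One way to follow this advice is to read the key from the environment on the server side rather than hard-coding it. The sketch below assumes a hypothetical environment variable name, `LLM_GATEWAY_API_KEY`; only the `X-API-Key` header name comes from this page.

```python
import os


def build_headers(env_var: str = "LLM_GATEWAY_API_KEY") -> dict:
    """Build request headers from an environment variable (name is an assumption)."""
    # Stripping whitespace guards against a stray newline pasted with the key
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"X-API-Key": api_key}
```

Failing fast when the variable is unset surfaces a missing key locally, before the request would come back as a 401.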

429 Too Many Requests

Returned when you exceed the rate limit for your API key or IP address.

{
  "detail": "Too many requests. Please wait before trying again."
}

Rate Limiting Details

The LLM Gateway implements a token bucket rate limiting algorithm:

Capacity: Maximum number of requests in the bucket
Refill Rate: Rate at which request tokens are added back
Client Identification: Based on API key or IP address

Rate limits are enforced per client (API key or IP) and are backed by Redis for distributed rate limiting.
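
To make the capacity and refill-rate terms concrete, here is a minimal in-memory sketch of a token bucket. It is not the gateway's implementation: the real limiter is Redis-backed and keyed per client, while this `TokenBucket` class, its parameter names, and the injectable `clock` are assumptions chosen for illustration.

```python
import time
from typing import Callable


class TokenBucket:
    """Single-process token bucket; one instance would be kept per client."""

    def __init__(
        self,
        capacity: float,
        refill_rate: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity        # Maximum tokens the bucket can hold
        self.refill_rate = refill_rate  # Tokens added back per second
        self.tokens = capacity          # Start with a full bucket
        self.clock = clock
        self.last_refill = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when the bucket is empty."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill in proportion to elapsed time, never exceeding capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now

        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `capacity=2` and `refill_rate=1.0`, two back-to-back requests pass, a third is rejected, and one more is allowed after a second has elapsed. A rejected call here corresponds to the point where the gateway would respond with 429.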

When you receive a 429 error, implement exponential backoff in your retry logic. Wait a few seconds before retrying, and increase the wait time with each subsequent failure.
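
The growing wait can be computed with a small helper such as the sketch below. The cap and the random jitter are additions not described on this page: jitter is a common way to keep many clients from retrying at the same instant. The function name, defaults, and `rng` parameter are all assumptions.

```python
import random
from typing import Callable


def backoff_delay(
    attempt: int,
    base: float = 1.0,
    cap: float = 30.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Return a delay in seconds for a zero-based retry attempt.

    The ceiling doubles with each attempt up to `cap`; the result is drawn
    from the upper half of that range so the wait is never shorter than
    half the ceiling.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return ceiling / 2 + rng() * (ceiling / 2)
```

A caller would pass the current attempt number and sleep for the returned duration, for example `time.sleep(backoff_delay(attempt))`.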

500 Internal Server Error

Returned when an unexpected error occurs on the server side.

{
  "detail": "Internal server error occurred while processing your request"
}

Common Causes

Downstream LLM provider is unavailable
Database or Redis connection failure
Unexpected exception in request processing
Service configuration error
Resource exhaustion (memory, connections)

If you consistently receive 500 errors, check the API status page or contact support. These errors are typically temporary and resolved automatically.

Error Handling Best Practices

Retry Logic

Implement proper retry logic with exponential backoff:

import time
from typing import Optional

import requests


def chat_with_retry(
    messages: list,
    api_key: str,
    max_retries: int = 3,
) -> Optional[dict]:
    """POST a chat request, retrying 429 and 500 responses with exponential backoff."""
    url = "https://api.example.com/v1/chat"
    headers = {"X-API-Key": api_key}
    payload = {"messages": messages}

    for attempt in range(max_retries):
        wait_time = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s, ...

        try:
            # The 30s timeout is an example value; it keeps a stalled
            # connection from blocking forever
            response = requests.post(url, headers=headers, json=payload, timeout=30)
        except requests.RequestException as e:
            print(f"Request failed: {e}")
        else:
            if response.status_code == 200:
                return response.json()
            elif response.status_code == 401:
                # Retrying cannot fix a bad or missing key
                print("Authentication failed. Check your API key.")
                return None
            elif response.status_code == 429:
                print("Rate limited.")
            elif response.status_code == 500:
                print("Server error.")
            else:
                print(f"Unexpected error: {response.status_code}")
                return None

        # Sleep only if another attempt remains
        if attempt < max_retries - 1:
            print(f"Retrying in {wait_time}s...")
            time.sleep(wait_time)

    print("Max retries exceeded")
    return None

Error Response Validation

Always validate error responses before processing:

Python

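import logging

import requests

logger = logging.getLogger(__name__)

# Illustrative exception types; define them to suit your application
class AuthenticationError(Exception): pass
class RateLimitError(Exception): pass
class ServerError(Exception): pass

def raise_for_error(response: requests.Response) -> None:
    """Log an error response and raise a matching exception."""
    if response.ok:
        return
    try:
        error_message = response.json().get("detail", "Unknown error")
    except (ValueError, AttributeError):
        error_message = "Unknown error"  # Body was not a JSON object

    # Log error for monitoring
    logger.error("API Error %s: %s", response.status_code, error_message)

    # Handle specific error types
    if response.status_code == 401:
        raise AuthenticationError(error_message)
    elif response.status_code == 429:
        raise RateLimitError(error_message)
    elif response.status_code >= 500:
        raise ServerError(error_message)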

Error Response From Source

Based on the source code in chat.py, here are the specific error conditions:

Authentication Check (Lines 28-32)

Python

if api_key not in valid_keys:
    raise HTTPException(
        status_code=401,
        detail="Invalid or missing API Key"
    )

Rate Limit Check (Lines 35-40)

Python

if not rate_limiter.allow(key):
    RATE_LIMIT_BLOCKED.inc()
    raise HTTPException(
        status_code=429, 
        detail="Too many requests. Please wait before trying again."
    )

Summary

Status Code	Error Type	Retry?	Common Fix
401	Unauthorized	No	Check API key in `X-API-Key` header
429	Rate Limited	Yes	Implement exponential backoff
500	Server Error	Yes	Wait and retry, contact support if persistent
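
The retry column of the table can be encoded directly in client code, as in the sketch below. The `RETRY_POLICY` mapping and `should_retry` helper are hypothetical names, and treating any status code missing from the table as non-retryable is an assumption rather than documented behavior.

```python
# Retry guidance transcribed from the summary table
RETRY_POLICY = {
    401: False,  # Unauthorized: fix the API key instead of retrying
    429: True,   # Rate limited: retry with exponential backoff
    500: True,   # Server error: wait and retry
}


def should_retry(status_code: int) -> bool:
    """Return True only for status codes the table marks as retryable."""
    # Codes not listed in the table default to no retry (an assumption)
    return RETRY_POLICY.get(status_code, False)
```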

Related Documentation

ChatRequest - Request schema with validation rules
ChatResponse - Successful response schema
Authentication - API key setup and usage
