What is the AI Gateway?
LiteLLM’s AI Gateway (Proxy) is a production-ready server that provides a unified interface to 100+ LLM providers. It acts as a central gateway that handles authentication, load balancing, budgets, rate limits, and observability for all your AI model requests.
Key Features
Unified API Interface
- OpenAI-Compatible: All requests use the OpenAI API format, regardless of the underlying provider
- Multi-Provider Support: Access OpenAI, Anthropic, Azure, Bedrock, Vertex AI, Cohere, and 100+ providers through a single endpoint
- Model Routing: Automatically route requests to the best available model based on your configuration
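To illustrate the unified interface, here is a minimal sketch showing that the same OpenAI-format payload is reused across providers, with only the `model` name changing. The model names are illustrative, not tied to a real deployment.

```python
def chat_request(model: str, user_message: str) -> dict:
    """Build an OpenAI-format /chat/completions payload for any provider."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

# Identical payload shape whether the underlying provider is OpenAI or Anthropic:
openai_req = chat_request("gpt-4o", "Hello")
claude_req = chat_request("claude-3-5-sonnet", "Hello")

assert openai_req.keys() == claude_req.keys()
```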
Authentication & Authorization
- Virtual Keys: Generate API keys with custom budgets, rate limits, and model access
- Team Management: Organize users into teams with shared budgets and permissions
- JWT Authentication: Support for custom JWT authentication flows
- Master Key: Secure admin access to all proxy management endpoints
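A minimal sketch of how virtual-key authorization can work, assuming an in-memory key store; real deployments persist keys in a database, and the field names here (`max_budget`, `allowed_models`) are illustrative.

```python
# Illustrative in-memory key store: budget, current spend, and model access.
KEYS = {
    "sk-team-a": {"max_budget": 10.0, "spend": 2.5, "allowed_models": {"gpt-4o"}},
}

def authorize(key: str, model: str) -> bool:
    """Reject unknown keys, exhausted budgets, and disallowed models."""
    info = KEYS.get(key)
    if info is None:
        return False              # unknown key
    if info["spend"] >= info["max_budget"]:
        return False              # budget exhausted
    return model in info["allowed_models"]

assert authorize("sk-team-a", "gpt-4o")
assert not authorize("sk-team-a", "claude-3")
```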
Cost Control & Budgets
- Budget Tracking: Set budgets per key, user, or team
- Soft Budgets: Send alerts before hitting budget limits
- Budget Alerts: Webhook notifications when budgets are exceeded
- Spend Tracking: Real-time spend tracking across all requests
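The soft/hard budget distinction above can be sketched as a single check; the numbers and status names are illustrative.

```python
def budget_status(spend: float, soft_budget: float, max_budget: float) -> str:
    """Classify current spend against soft (alert) and hard (block) limits."""
    if spend >= max_budget:
        return "exceeded"     # request blocked, webhook notification fired
    if spend >= soft_budget:
        return "soft_alert"   # request allowed, alert sent before the hard cap
    return "ok"

assert budget_status(2.0, 8.0, 10.0) == "ok"
assert budget_status(9.0, 8.0, 10.0) == "soft_alert"
assert budget_status(10.0, 8.0, 10.0) == "exceeded"
```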
Load Balancing & Reliability
- Automatic Fallbacks: Retry failed requests on alternative deployments
- Usage-Based Routing: Distribute load across multiple deployments based on usage
- Health Checks: Continuous monitoring of model availability
- Rate Limiting: Prevent abuse with configurable rate limits
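Usage-based routing and automatic fallbacks can be sketched together: order healthy deployments by load, then retry down the list on failure. The deployment records and names below are illustrative.

```python
def pick_order(deployments):
    """Healthy deployments first, fewest in-flight requests first."""
    healthy = [d for d in deployments if d["healthy"]]
    return sorted(healthy, key=lambda d: d["in_flight"])

def call_with_fallback(deployments, call):
    """Try deployments in routing order; fall back to the next on failure."""
    last_err = None
    for dep in pick_order(deployments):
        try:
            return call(dep)
        except Exception as err:
            last_err = err        # retry on the next deployment
    raise RuntimeError("all deployments failed") from last_err

deployments = [
    {"name": "azure-gpt4", "healthy": True, "in_flight": 3},
    {"name": "openai-gpt4", "healthy": True, "in_flight": 1},
    {"name": "bedrock-claude", "healthy": False, "in_flight": 0},
]

def flaky_call(dep):
    if dep["name"] == "openai-gpt4":
        raise TimeoutError("simulated outage")
    return dep["name"]

# openai-gpt4 is tried first (fewest in-flight) but fails; azure-gpt4 answers.
assert call_with_fallback(deployments, flaky_call) == "azure-gpt4"
```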
Observability
- Logging Callbacks: Send logs to Langfuse, Lunary, Helicone, Weights & Biases, and more
- Prometheus Metrics: Export metrics for monitoring and alerting
- Request Tracing: Track requests across the entire lifecycle
- Admin Dashboard: Web UI for managing keys, users, and viewing analytics
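Conceptually, logging callbacks fan the same request metadata out to every configured sink after a request completes. A minimal sketch, with sink behavior and metadata fields as placeholders for integrations like Langfuse or a Prometheus exporter:

```python
class CallbackLogger:
    """Fan request metadata out to every registered logging sink."""

    def __init__(self):
        self.callbacks = []

    def register(self, callback):
        self.callbacks.append(callback)

    def log_success(self, metadata: dict):
        for cb in self.callbacks:
            cb(metadata)          # each sink receives the same metadata

logger = CallbackLogger()
seen = []
logger.register(seen.append)
logger.log_success({"model": "gpt-4o", "latency_ms": 420, "cost": 0.002})

assert seen[0]["model"] == "gpt-4o"
```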
Architecture
Common Use Cases
Enterprise AI Gateway
- Centralized control over all AI model access
- Budget management and cost allocation
- Compliance and audit logging
- Team-based access control
Development Platform
- Provide AI capabilities to multiple applications
- Manage API keys for different environments
- Track usage across projects
- Test different models without code changes
Production Applications
- High availability with automatic fallbacks
- Load balancing across deployments
- Rate limiting and abuse prevention
- Observability and monitoring
How It Works
Request Flow
- Authentication: Client sends request with virtual key
- Authorization: The proxy validates the key and checks its budgets and permissions
- Routing: Request is routed to the appropriate model deployment
- Execution: Provider API is called with transformed request
- Response: Response is transformed to OpenAI format and returned
- Logging: Request metadata is logged to configured callbacks
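The six steps above can be sketched end to end as one function; the key store, provider stub, and field names are illustrative, not the proxy's real internals.

```python
# Illustrative state: one virtual key and a log sink.
KEYS = {"sk-demo": {"allowed_models": {"gpt-4o"}, "spend": 0.0, "max_budget": 5.0}}
LOGS = []

def handle_request(api_key: str, payload: dict) -> dict:
    # 1-2. Authentication & authorization
    key = KEYS.get(api_key)
    if key is None or payload["model"] not in key["allowed_models"]:
        raise PermissionError("invalid key or model not allowed")
    if key["spend"] >= key["max_budget"]:
        raise PermissionError("budget exceeded")
    # 3-4. Routing & execution (provider call stubbed out here)
    provider_response = {"text": "hi there", "cost": 0.001}
    # 5. Transform to OpenAI response format
    response = {"choices": [{"message": {"role": "assistant",
                                         "content": provider_response["text"]}}]}
    # 6. Logging & spend tracking
    key["spend"] += provider_response["cost"]
    LOGS.append({"key": api_key, "model": payload["model"]})
    return response

resp = handle_request("sk-demo", {"model": "gpt-4o",
                                  "messages": [{"role": "user", "content": "hi"}]})
assert resp["choices"][0]["message"]["content"] == "hi there"
assert LOGS[0]["model"] == "gpt-4o"
```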
Core Components
Proxy Server
The main FastAPI server (proxy_server.py) that handles all incoming requests and routes them to the appropriate handlers.
Router
Load balancing component that distributes requests across multiple model deployments based on health, usage, and configuration.
Database
Optional PostgreSQL database for storing:
- Virtual keys and their configurations
- User and team information
- Spend tracking and budget data
- Request logs
Cache Layer
Redis-based caching for:
- API key validation
- User/team data
- Response caching (optional)
- Health check coordination
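A caching layer like this can be sketched as a TTL cache standing in for Redis: validated keys are served from memory until their entry expires, forcing a fresh database check. The `ttl_seconds` value and API are illustrative.

```python
import time

class TTLCache:
    """Cache values with a time-to-live, e.g. validated API keys."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: caller must re-check the database
            return None
        return value

cache = TTLCache(ttl_seconds=60)
cache.set("sk-demo", {"valid": True})
assert cache.get("sk-demo") == {"valid": True}
assert cache.get("sk-missing") is None
```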
Next Steps
Quick Start
Get started with the AI Gateway in 5 minutes
Docker Deployment
Deploy with Docker and Docker Compose
Configuration
Learn about all configuration options
Virtual Keys
Manage API keys and authentication