What is the AI Gateway?
LiteLLM’s AI Gateway (Proxy) is a production-ready server that provides a unified interface to 100+ LLM providers. It acts as a central gateway that handles authentication, load balancing, budgets, rate limits, and observability for all your AI model requests.
Key Features
Unified API Interface
- OpenAI-Compatible: All requests use the OpenAI API format, regardless of the underlying provider
- Multi-Provider Support: Access OpenAI, Anthropic, Azure, Bedrock, Vertex AI, Cohere, and 100+ providers through a single endpoint
- Model Routing: Automatically route requests to the best available model based on your configuration
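To illustrate the unified interface, here is a minimal sketch showing that the same OpenAI-format payload is reused across providers, with only the `model` name changing. The model names are illustrative, not tied to a real deployment.

```python
def chat_request(model: str, user_message: str) -> dict:
    """Build an OpenAI-format /chat/completions payload for any provider."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

# Identical payload shape whether the underlying provider is OpenAI or Anthropic:
openai_req = chat_request("gpt-4o", "Hello")
claude_req = chat_request("claude-3-5-sonnet", "Hello")

assert openai_req.keys() == claude_req.keys()
```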
Authentication & Authorization
- Virtual Keys: Generate API keys with custom budgets, rate limits, and model access
- Team Management: Organize users into teams with shared budgets and permissions
- JWT Authentication: Support for custom JWT authentication flows
- Master Key: Secure admin access to all proxy management endpoints
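A minimal sketch of how virtual-key authorization can work, assuming an in-memory key store; real deployments persist keys in a database, and the field names here (`max_budget`, `allowed_models`) are illustrative.

```python
# Illustrative in-memory key store: budget, current spend, and model access.
KEYS = {
    "sk-team-a": {"max_budget": 10.0, "spend": 2.5, "allowed_models": {"gpt-4o"}},
}

def authorize(key: str, model: str) -> bool:
    """Reject unknown keys, exhausted budgets, and disallowed models."""
    info = KEYS.get(key)
    if info is None:
        return False              # unknown key
    if info["spend"] >= info["max_budget"]:
        return False              # budget exhausted
    return model in info["allowed_models"]

assert authorize("sk-team-a", "gpt-4o")
assert not authorize("sk-team-a", "claude-3")
```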
Cost Control & Budgets
- Budget Tracking: Set budgets per key, user, or team
- Soft Budgets: Send alerts before hitting budget limits
- Budget Alerts: Webhook notifications when budgets are exceeded
- Spend Tracking: Real-time spend tracking across all requests
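The soft/hard budget distinction above can be sketched as a single check; the numbers and status names are illustrative.

```python
def budget_status(spend: float, soft_budget: float, max_budget: float) -> str:
    """Classify current spend against soft (alert) and hard (block) limits."""
    if spend >= max_budget:
        return "exceeded"     # request blocked, webhook notification fired
    if spend >= soft_budget:
        return "soft_alert"   # request allowed, alert sent before the hard cap
    return "ok"

assert budget_status(2.0, 8.0, 10.0) == "ok"
assert budget_status(9.0, 8.0, 10.0) == "soft_alert"
assert budget_status(10.0, 8.0, 10.0) == "exceeded"
```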
Load Balancing & Reliability
- Automatic Fallbacks: Retry failed requests on alternative deployments
- Usage-Based Routing: Distribute load across multiple deployments based on usage
- Health Checks: Continuous monitoring of model availability
- Rate Limiting: Prevent abuse with configurable rate limits
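Usage-based routing and automatic fallbacks can be sketched together: order healthy deployments by load, then retry down the list on failure. The deployment records and names below are illustrative.

```python
def pick_order(deployments):
    """Healthy deployments first, fewest in-flight requests first."""
    healthy = [d for d in deployments if d["healthy"]]
    return sorted(healthy, key=lambda d: d["in_flight"])

def call_with_fallback(deployments, call):
    """Try deployments in routing order; fall back to the next on failure."""
    last_err = None
    for dep in pick_order(deployments):
        try:
            return call(dep)
        except Exception as err:
            last_err = err        # retry on the next deployment
    raise RuntimeError("all deployments failed") from last_err

deployments = [
    {"name": "azure-gpt4", "healthy": True, "in_flight": 3},
    {"name": "openai-gpt4", "healthy": True, "in_flight": 1},
    {"name": "bedrock-claude", "healthy": False, "in_flight": 0},
]

def flaky_call(dep):
    if dep["name"] == "openai-gpt4":
        raise TimeoutError("simulated outage")
    return dep["name"]

# openai-gpt4 is tried first (fewest in-flight) but fails; azure-gpt4 answers.
assert call_with_fallback(deployments, flaky_call) == "azure-gpt4"
```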
Observability
- Logging Callbacks: Send logs to Langfuse, Lunary, Helicone, Weights & Biases, and more
- Prometheus Metrics: Export metrics for monitoring and alerting
- Request Tracing: Track requests across the entire lifecycle
- Admin Dashboard: Web UI for managing keys, users, and viewing analytics
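Conceptually, logging callbacks fan the same request metadata out to every configured sink after a request completes. A minimal sketch, with sink behavior and metadata fields as placeholders for integrations like Langfuse or a Prometheus exporter:

```python
class CallbackLogger:
    """Fan request metadata out to every registered logging sink."""

    def __init__(self):
        self.callbacks = []

    def register(self, callback):
        self.callbacks.append(callback)

    def log_success(self, metadata: dict):
        for cb in self.callbacks:
            cb(metadata)          # each sink receives the same metadata

logger = CallbackLogger()
seen = []
logger.register(seen.append)
logger.log_success({"model": "gpt-4o", "latency_ms": 420, "cost": 0.002})

assert seen[0]["model"] == "gpt-4o"
```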
Architecture
Common Use Cases
Enterprise AI Gateway
- Centralized control over all AI model access
- Budget management and cost allocation
- Compliance and audit logging
- Team-based access control
Development Platform
- Provide AI capabilities to multiple applications
- Manage API keys for different environments
- Track usage across projects
- Test different models without code changes
Production Applications
- High availability with automatic fallbacks
- Load balancing across deployments
- Rate limiting and abuse prevention
- Observability and monitoring
How It Works
Request Flow
- Authentication: Client sends request with virtual key
- Authorization: The proxy validates the key and checks its budgets and permissions
- Routing: Request is routed to the appropriate model deployment
- Execution: Provider API is called with transformed request
- Response: Response is transformed to OpenAI format and returned
- Logging: Request metadata is logged to configured callbacks
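The six steps above can be sketched end to end as one function; the key store, provider stub, and field names are illustrative, not the proxy's real internals.

```python
# Illustrative state: one virtual key and a log sink.
KEYS = {"sk-demo": {"allowed_models": {"gpt-4o"}, "spend": 0.0, "max_budget": 5.0}}
LOGS = []

def handle_request(api_key: str, payload: dict) -> dict:
    # 1-2. Authentication & authorization
    key = KEYS.get(api_key)
    if key is None or payload["model"] not in key["allowed_models"]:
        raise PermissionError("invalid key or model not allowed")
    if key["spend"] >= key["max_budget"]:
        raise PermissionError("budget exceeded")
    # 3-4. Routing & execution (provider call stubbed out here)
    provider_response = {"text": "hi there", "cost": 0.001}
    # 5. Transform to OpenAI response format
    response = {"choices": [{"message": {"role": "assistant",
                                         "content": provider_response["text"]}}]}
    # 6. Logging & spend tracking
    key["spend"] += provider_response["cost"]
    LOGS.append({"key": api_key, "model": payload["model"]})
    return response

resp = handle_request("sk-demo", {"model": "gpt-4o",
                                  "messages": [{"role": "user", "content": "hi"}]})
assert resp["choices"][0]["message"]["content"] == "hi there"
assert LOGS[0]["model"] == "gpt-4o"
```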
Core Components
Proxy Server
The main FastAPI server (proxy_server.py) that handles all incoming requests and routes them to the appropriate handlers.
Router
Load balancing component that distributes requests across multiple model deployments based on health, usage, and configuration.
Database
Optional PostgreSQL database for storing:
- Virtual keys and their configurations
- User and team information
- Spend tracking and budget data
- Request logs
Cache Layer
Redis-based caching for:
- API key validation
- User/team data
- Response caching (optional)
- Health check coordination
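A caching layer like this can be sketched as a TTL cache standing in for Redis: validated keys are served from memory until their entry expires, forcing a fresh database check. The `ttl_seconds` value and API are illustrative.

```python
import time

class TTLCache:
    """Cache values with a time-to-live, e.g. validated API keys."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: caller must re-check the database
            return None
        return value

cache = TTLCache(ttl_seconds=60)
cache.set("sk-demo", {"valid": True})
assert cache.get("sk-demo") == {"valid": True}
assert cache.get("sk-missing") is None
```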
Next Steps
Quick Start
Get started with the AI Gateway in 5 minutes
Docker Deployment
Deploy with Docker and Docker Compose
Configuration
Learn about all configuration options
Virtual Keys
Manage API keys and authentication