Overview
LiteLLM provides full support for OpenAI’s models, including GPT-4o, O1, O3-mini, and more. You can use all OpenAI features, including streaming, function calling, vision, audio, and batch processing.

Quick Start
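A minimal first request can be sketched as follows (the model name and prompt are illustrative; the live call assumes OPENAI_API_KEY is set):

```python
import os

# Model name and prompt are illustrative.
MODEL = "gpt-4o-mini"
MESSAGES = [{"role": "user", "content": "Hello, how are you?"}]

if __name__ == "__main__" and os.getenv("OPENAI_API_KEY"):
    from litellm import completion

    response = completion(model=MODEL, messages=MESSAGES)
    print(response.choices[0].message.content)
```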
Supported Models
- GPT-4o: latest and most capable GPT-4 models with optimized performance
- O-Series (Reasoning)
- GPT-4 Turbo
- GPT-3.5
Authentication
- Environment Variable
- Direct Parameter
- Custom Base URL
- Organization ID
Set your OpenAI API key as an environment variable:
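For example (the key string is a placeholder; swap in your real key):

```python
import os

# A placeholder key; substitute your real key.
os.environ.setdefault("OPENAI_API_KEY", "sk-...")

# The key can also be passed per call, along with an optional custom
# endpoint or organization, via litellm.completion keyword arguments:
#   completion(..., api_key="sk-...", api_base="https://...", organization="org-...")
```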
Streaming
Get real-time responses as they’re generated.

Async Streaming
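A sketch of async streaming with litellm.acompletion (the synchronous version is the same shape with litellm.completion and a plain for loop; model and prompt are illustrative):

```python
import asyncio
import os

MESSAGES = [{"role": "user", "content": "Write a haiku about the sea."}]

async def stream_chat() -> str:
    from litellm import acompletion  # requires OPENAI_API_KEY

    collected = []
    response = await acompletion(model="gpt-4o-mini", messages=MESSAGES, stream=True)
    async for chunk in response:
        # Each chunk carries an incremental delta; content may be None at the end.
        delta = chunk.choices[0].delta.content or ""
        collected.append(delta)
        print(delta, end="", flush=True)
    return "".join(collected)

if __name__ == "__main__" and os.getenv("OPENAI_API_KEY"):
    asyncio.run(stream_chat())
```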
Function Calling
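A minimal tool-calling sketch in the OpenAI tools format (the get_weather schema is illustrative, not part of LiteLLM):

```python
import os

# An illustrative tool schema in the OpenAI function-calling format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

if __name__ == "__main__" and os.getenv("OPENAI_API_KEY"):
    from litellm import completion

    response = completion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Weather in Paris?"}],
        tools=TOOLS,
        tool_choice="auto",
    )
    # The model's requested calls arrive on message.tool_calls.
    print(response.choices[0].message.tool_calls)
```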
OpenAI models support sophisticated function/tool calling.

Vision (Multimodal)
GPT-4o and GPT-4 Turbo support image inputs:

- Image URL
- Base64 Image
- Multiple Images
- Image Detail Level
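The variants above all come down to the shape of the image_url content part; a sketch with placeholder URLs (the "detail" field controls resolution):

```python
import base64
import os

# An image part referencing a URL (placeholder); "detail" is "low"/"high"/"auto".
URL_PART = {
    "type": "image_url",
    "image_url": {"url": "https://example.com/photo.jpg", "detail": "auto"},
}

def b64_part(image_bytes: bytes, mime: str = "image/jpeg") -> dict:
    """Build a base64 data-URL image part from raw bytes."""
    encoded = base64.b64encode(image_bytes).decode()
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime};base64,{encoded}"},
    }

if __name__ == "__main__" and os.getenv("OPENAI_API_KEY"):
    from litellm import completion

    response = completion(
        model="gpt-4o",
        messages=[{
            "role": "user",
            # Multiple image parts may be mixed with text parts in one message.
            "content": [{"type": "text", "text": "Describe this image."}, URL_PART],
        }],
    )
    print(response.choices[0].message.content)
```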
JSON Mode
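A sketch using the response_format parameter (the prompt is illustrative; note OpenAI's JSON mode requires the word "JSON" to appear in the prompt):

```python
import json
import os

MESSAGES = [
    # JSON mode requires the word "JSON" somewhere in the messages.
    {"role": "user", "content": "List three colors as a JSON array under key 'colors'."}
]

if __name__ == "__main__" and os.getenv("OPENAI_API_KEY"):
    from litellm import completion

    response = completion(
        model="gpt-4o-mini",
        messages=MESSAGES,
        response_format={"type": "json_object"},
    )
    # The content is guaranteed to parse as JSON in this mode.
    data = json.loads(response.choices[0].message.content)
    print(data)
```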
Force models to return valid JSON via the response_format parameter.

Advanced Features
Seed for Reproducibility
Logprobs
Max Tokens and Stop Sequences
Temperature and Top P
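The four knobs above map directly onto completion keyword arguments; illustrative values:

```python
import os

# Illustrative values for the advanced parameters above.
PARAMS = {
    "seed": 42,              # best-effort reproducibility across runs
    "logprobs": True,        # return token log probabilities
    "top_logprobs": 3,       # top alternatives per token (requires logprobs)
    "max_tokens": 256,       # hard cap on generated tokens
    "stop": ["\n\n"],        # stop sequences
    "temperature": 0.2,      # lower = more deterministic
    "top_p": 0.9,            # nucleus sampling
}

if __name__ == "__main__" and os.getenv("OPENAI_API_KEY"):
    from litellm import completion

    response = completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Name a prime number."}],
        **PARAMS,
    )
    print(response.choices[0].message.content)
```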
Embeddings
Generate text embeddings for semantic search and clustering.

Available Embedding Models
| Model | Dimensions | Use Case |
|---|---|---|
| text-embedding-3-large | 3072 (default) | Best performance |
| text-embedding-3-small | 1536 (default) | Good balance |
| text-embedding-ada-002 | 1536 | Legacy model |
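The models above are called through litellm.embedding; a sketch (the dimensions parameter applies to the -3 family only, which supports shortened vectors):

```python
import os

TEXTS = ["semantic search", "vector clustering"]

if __name__ == "__main__" and os.getenv("OPENAI_API_KEY"):
    from litellm import embedding

    response = embedding(
        model="text-embedding-3-small",
        input=TEXTS,
        # The -3 models accept `dimensions` to shorten the returned vectors.
        dimensions=512,
    )
    vectors = [item["embedding"] for item in response.data]
    print(len(vectors), len(vectors[0]))
```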
Batch Processing
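A sketch of the flow, assuming LiteLLM's OpenAI-compatible create_file/create_batch helpers (the JSONL line below follows the OpenAI batch input format; IDs and filenames are illustrative):

```python
import json
import os

# One JSONL line per request, in the OpenAI batch input format.
BATCH_LINES = [
    {
        "custom_id": "req-1",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": "Hello"}],
        },
    }
]

if __name__ == "__main__" and os.getenv("OPENAI_API_KEY"):
    import litellm

    with open("batch_input.jsonl", "w") as f:
        for line in BATCH_LINES:
            f.write(json.dumps(line) + "\n")

    # Upload the input file, then create the batch job against it.
    file_obj = litellm.create_file(
        file=open("batch_input.jsonl", "rb"),
        purpose="batch",
        custom_llm_provider="openai",
    )
    batch = litellm.create_batch(
        completion_window="24h",
        endpoint="/v1/chat/completions",
        input_file_id=file_obj.id,
        custom_llm_provider="openai",
    )
    print(batch.id)
```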
Process large volumes of requests asynchronously.

Error Handling
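One way to sketch error handling, retrying LiteLLM's OpenAI-style RateLimitError with exponential backoff (retry counts and delays are illustrative):

```python
import os
import time

def backoff_delay(attempt: int, base: float = 1.0) -> float:
    """Delay before retry `attempt`: doubles each time (1s, 2s, 4s, ...)."""
    return base * (2 ** attempt)

def with_backoff(call, max_retries: int = 5):
    """Retry `call` on rate limits with exponential backoff."""
    import litellm  # imported lazily so the helpers above are import-safe

    for attempt in range(max_retries):
        try:
            return call()
        except litellm.RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(backoff_delay(attempt))

if __name__ == "__main__" and os.getenv("OPENAI_API_KEY"):
    from litellm import completion

    response = with_backoff(lambda: completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hi"}],
    ))
    print(response.choices[0].message.content)
```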
Cost Tracking
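LiteLLM can compute the dollar cost of a response from its per-token price map via completion_cost; a minimal sketch:

```python
import os

MODEL = "gpt-4o-mini"

if __name__ == "__main__" and os.getenv("OPENAI_API_KEY"):
    from litellm import completion, completion_cost

    response = completion(
        model=MODEL,
        messages=[{"role": "user", "content": "Hi"}],
    )
    # Cost in USD for this specific call.
    print(f"${completion_cost(completion_response=response):.6f}")
```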
Best Practices
Use GPT-4o-mini First
Start with gpt-4o-mini for testing - it’s fast and cost-effective. Upgrade to gpt-4o when you need maximum quality.

Set Max Tokens
Always set max_tokens to prevent unexpectedly long (and expensive) responses.

Use Streaming
Enable streaming for better user experience in interactive applications.
Handle Rate Limits
Implement exponential backoff when handling RateLimitError exceptions.

Related Documentation
Streaming
Learn more about streaming responses
Function Calling
Deep dive into function calling
Vision
Working with images and vision models
Embeddings
Guide to embeddings and semantic search