Overview
Batch processing allows you to send asynchronous requests at significantly reduced cost. LiteLLM supports batch APIs across providers, including OpenAI and Anthropic.

Quick Start
When to Use Batching
Cost Savings
- 50% lower cost than the synchronous API
- Ideal for non-urgent, high-volume tasks
- Best for offline processing
Use Cases
- Data classification and labeling
- Content generation for datasets
- Evaluation and testing
- Embedding large corpora
- Bulk data transformation
When Not to Use
- Real-time applications
- User-facing features
- Time-sensitive tasks
- Interactive workflows
OpenAI Batch API
- Create Batch
- Check Status
- Retrieve Results
- Cancel Batch
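The four steps above can be sketched end to end. This is a minimal sketch of the create-and-submit half, assuming LiteLLM's OpenAI-compatible batch helpers (`litellm.create_file`, `litellm.create_batch`); verify the helper names and signatures against your installed LiteLLM version.

```python
import json


def build_batch_line(custom_id: str, model: str, messages: list) -> str:
    # Each line of the input .jsonl file is one OpenAI-style request
    # envelope, keyed by a unique custom_id for matching results later.
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {"model": model, "messages": messages},
    })


def submit_batch(jsonl_path: str) -> str:
    # litellm is imported lazily so the pure helper above stays
    # dependency-free; these helper names are assumptions to verify.
    import litellm

    f = litellm.create_file(
        file=open(jsonl_path, "rb"),
        purpose="batch",
        custom_llm_provider="openai",
    )
    batch = litellm.create_batch(
        input_file_id=f.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
        custom_llm_provider="openai",
    )
    return batch.id
```

Checking status, retrieving results, and cancelling follow the same pattern with `litellm.retrieve_batch(batch_id=...)` and `litellm.cancel_batch(batch_id=...)`.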
Anthropic Batch API
Complete Workflow
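A sketch of the complete workflow against Anthropic through LiteLLM's unified batch interface. The `custom_llm_provider="anthropic"` routing and the `litellm.file_content` helper are assumptions here, not confirmed API; check them against your LiteLLM version before relying on this.

```python
import json
import time


def parse_results(jsonl_text: str) -> dict:
    # Pure helper: map custom_id -> response body for successful lines.
    out = {}
    for line in jsonl_text.splitlines():
        if line.strip():
            rec = json.loads(line)
            if not rec.get("error"):
                out[rec["custom_id"]] = rec["response"]["body"]
    return out


def run_anthropic_batch(jsonl_path: str) -> dict:
    import litellm  # lazy import; Anthropic batch routing is an assumption

    f = litellm.create_file(file=open(jsonl_path, "rb"), purpose="batch",
                            custom_llm_provider="anthropic")
    batch = litellm.create_batch(input_file_id=f.id,
                                 endpoint="/v1/chat/completions",
                                 completion_window="24h",
                                 custom_llm_provider="anthropic")
    # Poll until the batch reaches a terminal state.
    while batch.status not in ("completed", "failed", "expired", "cancelled"):
        time.sleep(60)
        batch = litellm.retrieve_batch(batch_id=batch.id,
                                       custom_llm_provider="anthropic")
    # Download the output file and parse one JSON object per line.
    content = litellm.file_content(file_id=batch.output_file_id,
                                   custom_llm_provider="anthropic")
    return parse_results(content.text)
```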
List Batches
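Listing recent batches might look like the following; `litellm.list_batches` and the `.data` attribute on its response mirror the OpenAI list shape, but are assumptions to verify.

```python
def summarize_batches(batches: list) -> list:
    # Pure helper: reduce batch records to (id, status) pairs for display.
    return [(b["id"], b["status"]) for b in batches]


def list_recent_batches(limit: int = 10) -> list:
    import litellm  # lazy import; helper name is an assumption to verify

    resp = litellm.list_batches(limit=limit, custom_llm_provider="openai")
    # resp.data is assumed to hold pydantic batch objects, as in the
    # OpenAI SDK; model_dump() turns each into a plain dict.
    return summarize_batches([b.model_dump() for b in resp.data])
```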
Batch with Different Request Types
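Every line in one batch must target a single endpoint, but the per-line bodies can still vary freely. A sketch of mixing request types within one chat-completions batch (the models and prompts here are illustrative, not from the source):

```python
import json

# Hypothetical mix: a deterministic classification request and a
# creative generation request in the same batch file.
REQUESTS = [
    {"custom_id": "classify-1", "model": "gpt-4o-mini",
     "messages": [{"role": "user", "content": "Label the sentiment: 'great product!'"}],
     "temperature": 0.0},
    {"custom_id": "generate-1", "model": "gpt-4o",
     "messages": [{"role": "user", "content": "Write a one-line product blurb."}],
     "temperature": 0.8},
]


def to_batch_jsonl(requests: list) -> str:
    # All lines share the endpoint URL; model, temperature, max_tokens,
    # etc. may differ per line inside "body".
    lines = []
    for r in requests:
        body = {k: v for k, v in r.items() if k != "custom_id"}
        lines.append(json.dumps({
            "custom_id": r["custom_id"],
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": body,
        }))
    return "\n".join(lines)
```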
Error Handling
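Batch output files contain one JSON object per request, and individual requests can fail even when the batch as a whole completes. A sketch of separating successes from failures, assuming the OpenAI-style output line shape (`custom_id`, `response.status_code`, optional `error`):

```python
import json


def split_results(output_jsonl: str):
    # A request failed if its "error" field is set or its response
    # status code is not 200; everything else is a success.
    ok, failed = {}, {}
    for line in output_jsonl.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        cid = rec["custom_id"]
        resp = rec.get("response") or {}
        if rec.get("error") or resp.get("status_code") != 200:
            failed[cid] = rec.get("error") or resp
        else:
            ok[cid] = resp["body"]
    return ok, failed
```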
Monitoring Progress
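Progress can be tracked from the batch's `request_counts` field while polling. A sketch, assuming `litellm.retrieve_batch` and a pydantic-style `request_counts` object as in the OpenAI SDK:

```python
import time


def progress(counts: dict) -> float:
    # Fraction of requests finished (completed + failed) out of total.
    total = counts.get("total", 0)
    if not total:
        return 0.0
    return (counts.get("completed", 0) + counts.get("failed", 0)) / total


def wait_for_batch(batch_id: str, poll_seconds: int = 60):
    import litellm  # lazy import; helper name is an assumption to verify

    while True:
        b = litellm.retrieve_batch(batch_id=batch_id,
                                   custom_llm_provider="openai")
        rc = b.request_counts
        # request_counts is assumed to be a pydantic model or a dict.
        counts = rc if isinstance(rc, dict) else rc.model_dump()
        print(f"{b.status}: {progress(counts):.0%} done")
        if b.status in ("completed", "failed", "expired", "cancelled"):
            return b
        time.sleep(poll_seconds)
```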
Cost Calculation
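Since batch pricing is typically 50% of the synchronous per-token price, the expected cost is simple arithmetic. A sketch (the per-million-token prices in the example are hypothetical placeholders, not real rates):

```python
def batch_cost(input_tokens: int, output_tokens: int,
               price_in_per_m: float, price_out_per_m: float,
               discount: float = 0.5) -> float:
    # Synchronous cost at the listed per-million-token prices,
    # then the batch discount applied on top.
    sync = (input_tokens / 1e6 * price_in_per_m
            + output_tokens / 1e6 * price_out_per_m)
    return sync * (1 - discount)


# Hypothetical prices: $10/M input, $30/M output tokens.
batch_cost(1_000_000, 0, 10.0, 30.0)  # → 5.0 (vs 10.0 synchronous)
```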
Best Practices
Request Preparation
- Use descriptive `custom_id` values for tracking
- Validate requests before creating a batch
- Keep batch size reasonable (1000-50000 requests)
- Include metadata for organization
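The preparation checks above can be automated before upload. A sketch of a pre-submission validator for the checks listed (unique `custom_id`s, valid JSON, a model in each body, a sane request count):

```python
import json


def validate_batch_lines(jsonl_text: str, max_requests: int = 50_000) -> list:
    # Returns a list of problem descriptions; an empty list means the
    # file passed these basic checks.
    problems, seen = [], set()
    lines = [l for l in jsonl_text.splitlines() if l.strip()]
    if not 1 <= len(lines) <= max_requests:
        problems.append(f"request count {len(lines)} outside 1..{max_requests}")
    for i, line in enumerate(lines):
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {i}: invalid JSON")
            continue
        cid = rec.get("custom_id")
        if not cid:
            problems.append(f"line {i}: missing custom_id")
        elif cid in seen:
            problems.append(f"line {i}: duplicate custom_id {cid!r}")
        seen.add(cid)
        if "model" not in rec.get("body", {}):
            problems.append(f"line {i}: body.model is required")
    return problems
```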
Monitoring
- Poll status periodically (every 1-5 minutes)
- Set up alerts for completion/failure
- Monitor request counts for progress
- Log batch IDs for tracking
Error Handling
- Check for errors in individual results
- Implement retry logic for failed requests
- Save partial results before processing
- Have fallback for batch failures
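One way to implement the retry advice above: keep the original input file, collect the `custom_id`s that failed, and build a smaller retry batch from just those lines. A sketch of the pure filtering step:

```python
import json


def build_retry_jsonl(input_jsonl: str, failed_ids: set) -> str:
    # Re-submit only the requests whose custom_ids failed last run;
    # the surviving lines are reused verbatim from the original input.
    retry = []
    for line in input_jsonl.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        if rec["custom_id"] in failed_ids:
            retry.append(line)
    return "\n".join(retry)
```

The resulting `.jsonl` text is uploaded as a fresh batch; after a couple of rounds, any still-failing requests can be routed to the synchronous API as a fallback.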
Cost Optimization
- Use batching for all non-urgent tasks
- Combine similar requests into batches
- Use cheaper models when appropriate
- Monitor and optimize batch sizes
Limitations
Supported Features
| Feature | OpenAI | Anthropic |
|---|---|---|
| Chat Completions | ✅ | ✅ |
| Embeddings | ✅ | ❌ |
| Function Calling | ✅ | ✅ |
| Streaming | ❌ | ❌ |
| Vision | ✅ | ✅ |
| JSON Mode | ✅ | ✅ |
| Max Requests | 50,000 | Varies |
| Completion Window | 24h | Varies |