## Overview
DeepInfra provides access to 100+ open-source and proprietary AI models with cost-effective, serverless inference and pay-as-you-go pricing, making it a good fit for developers who need affordable AI at scale.

Base URL: `https://api.deepinfra.com/v1/openai`
## Supported Features
- ✅ Chat Completions
- ✅ Streaming
- ✅ Vision (select models)
- ✅ Function Calling (select models)
- ❌ Embeddings (available via a separate API)
- ❌ Image Generation (available via a separate API)
- ❌ Fine-tuning
## Quick Start
### Chat Completions
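The original code sample appears to have been stripped; below is a minimal, dependency-free sketch using only the Python standard library (the official `openai` SDK also works if you point `base_url` at the endpoint above). The helper names `build_chat_request` and `chat` are illustrative, not part of any DeepInfra SDK.

```python
import json
import urllib.request

API_URL = "https://api.deepinfra.com/v1/openai/chat/completions"

def build_chat_request(messages, api_key,
                       model="meta-llama/Meta-Llama-3.1-8B-Instruct", **params):
    """Build an OpenAI-style chat completions request for DeepInfra's endpoint."""
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

def chat(messages, api_key, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(messages, api_key, **kwargs)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Usage: `chat([{"role": "user", "content": "Hello"}], api_key=os.environ["DEEPINFRA_API_KEY"])`.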
### Streaming
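A sketch of server-sent-event streaming against the same endpoint, assuming the standard `data: {...}` / `data: [DONE]` framing used by OpenAI-compatible APIs. The helper names `parse_sse_line` and `stream_chat` are illustrative.

```python
import json
import urllib.request

API_URL = "https://api.deepinfra.com/v1/openai/chat/completions"

def parse_sse_line(line: str):
    """Return the content delta from one 'data: {...}' SSE line, else None."""
    line = line.strip()
    if not line.startswith("data: ") or line == "data: [DONE]":
        return None
    choices = json.loads(line[len("data: "):]).get("choices") or []
    return choices[0].get("delta", {}).get("content") if choices else None

def stream_chat(messages, api_key, model="meta-llama/Meta-Llama-3.1-8B-Instruct"):
    """Yield reply fragments as they arrive ("stream": True in the body)."""
    body = json.dumps({"model": model, "messages": messages, "stream": True}).encode()
    req = urllib.request.Request(
        API_URL, data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            piece = parse_sse_line(raw.decode("utf-8"))
            if piece is not None:
                yield piece
```

Printing each yielded piece as it arrives gives the familiar token-by-token output.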
## Popular Models

### Meta Llama
| Model | Context | Price Tier | Description |
|---|---|---|---|
| meta-llama/Meta-Llama-3.1-405B-Instruct | 128K | Premium | Largest Llama |
| meta-llama/Meta-Llama-3.1-70B-Instruct | 128K | Mid | Balanced |
| meta-llama/Meta-Llama-3.1-8B-Instruct | 128K | Budget | Fast, cheap |
| meta-llama/Llama-3.2-90B-Vision-Instruct | 128K | Premium | Vision |
### Mistral & Mixtral
| Model | Context | Price Tier |
|---|---|---|
| mistralai/Mixtral-8x22B-Instruct-v0.1 | 64K | Mid |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | 32K | Budget |
| mistralai/Mistral-7B-Instruct-v0.3 | 32K | Budget |
### Qwen
| Model | Context | Description |
|---|---|---|
| Qwen/Qwen2.5-72B-Instruct | 32K | Latest Qwen |
| Qwen/Qwen2.5-7B-Instruct | 32K | Efficient |
| Qwen/QwQ-32B-Preview | 32K | Reasoning |
### Specialized Models
| Model | Type | Use Case |
|---|---|---|
| microsoft/WizardLM-2-8x22B | Code/Chat | Coding tasks |
| cognitivecomputations/dolphin-2.6-mixtral-8x7b | Chat | Uncensored |
| lizpreciatior/lzlv_70b_fp16_hf | Roleplay | Creative |
DeepInfra excels at:
- Cost-effectiveness - Up to 10x cheaper than alternatives
- Model variety - 100+ models available
- Serverless - No infrastructure management
- Pay-as-you-go - No minimum commitment
- Fast deployment - Instant access to models
## Configuration Options
| Header | Description | Required |
|---|---|---|
| Authorization | Bearer token containing your DeepInfra API key (`Bearer YOUR_API_KEY`) | Yes |
## Advanced Features
### System Messages
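A system message steers the model's tone and behavior for the whole conversation. A minimal sketch (the helper name is ours, not an API call):

```python
def with_persona(system_prompt: str, user_prompt: str):
    """Prepend a system message that steers tone and behavior for the chat."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

# Pass the resulting list as `messages` in a chat completions request.
messages = with_persona(
    "You are a concise assistant. Answer in one sentence.",
    "What is DeepInfra?",
)
```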
### Temperature and Sampling
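Sampling is controlled through the standard OpenAI-style parameters. The split below is an illustrative heuristic, not an API rule:

```python
# Standard OpenAI-style sampling parameters: lower temperature = more deterministic.
DETERMINISTIC = {"temperature": 0.1, "top_p": 1.0}   # extraction, factual Q&A
CREATIVE = {"temperature": 0.9, "top_p": 0.95}       # brainstorming, fiction

def sampling_params(task: str) -> dict:
    """Pick sampling settings by task type (illustrative heuristic)."""
    return dict(CREATIVE if task in {"brainstorm", "story"} else DETERMINISTIC)
```

Merge the returned dict into the request body alongside `model` and `messages`.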
### Vision Models
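Vision-capable models such as meta-llama/Llama-3.2-90B-Vision-Instruct accept the OpenAI-style multimodal message format, with the image supplied as an `image_url` content part. A sketch (the helper name and example URL are placeholders):

```python
def vision_messages(question: str, image_url: str):
    """Build an OpenAI-style multimodal message: a text part plus an image part."""
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }]

# Send with model="meta-llama/Llama-3.2-90B-Vision-Instruct".
msgs = vision_messages("What is in this image?", "https://example.com/photo.jpg")
```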
### Multi-turn Conversations
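The API is stateless, so multi-turn chat means resending the full message history on every call. A minimal history keeper (class name is ours):

```python
class Conversation:
    """Accumulate message history; the full list is resent on every API call."""

    def __init__(self, system=None):
        self.messages = [{"role": "system", "content": system}] if system else []

    def add_user(self, text):
        self.messages.append({"role": "user", "content": text})
        return self.messages  # pass this whole list as `messages` each turn

    def add_assistant(self, text):
        self.messages.append({"role": "assistant", "content": text})

convo = Conversation(system="You are helpful.")
convo.add_user("Hi!")
convo.add_assistant("Hello! How can I help?")
convo.add_user("What did I just say?")  # answerable only because history is included
```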
## Cost Optimization
### Choose the Right Model
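One way to put this into practice is a small routing table keyed on task complexity, built from the price tiers listed above (the mapping itself is an illustrative heuristic):

```python
# Hypothetical routing table built from the price tiers in the model tables above.
MODELS = {
    "budget": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "mid": "meta-llama/Meta-Llama-3.1-70B-Instruct",
    "premium": "meta-llama/Meta-Llama-3.1-405B-Instruct",
}

def pick_model(task_complexity: str) -> str:
    """Map a rough complexity label (simple/moderate/complex) to a model."""
    tier = {"simple": "budget", "moderate": "mid", "complex": "premium"}[task_complexity]
    return MODELS[tier]
```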
### Set Token Limits
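Capping `max_tokens` bounds the worst-case cost of each call, since output tokens are billed. A sketch (helper name is ours):

```python
def capped_request(messages, model, max_tokens=256):
    """Request body with an explicit completion cap; output tokens are billed,
    so max_tokens bounds the worst-case cost of a single call."""
    return {"model": model, "messages": messages, "max_tokens": max_tokens}

body = capped_request(
    [{"role": "user", "content": "Summarize this article in 3 bullets."}],
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    max_tokens=150,
)
```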
## Fallback Configuration
Fall back to OpenAI or another provider if DeepInfra is unavailable.

## Load Balancing

Route requests across models to balance cost against quality.

## Error Handling
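A generic retry-with-exponential-backoff sketch for rate limits (HTTP 429) and transient server errors. The helper is illustrative and works with any callable that raises errors carrying a `.code` attribute, as `urllib.error.HTTPError` does:

```python
import time

def with_backoff(call, retryable=(429, 500, 502, 503), max_retries=5, base_delay=1.0):
    """Run call() and retry on retryable HTTP status codes with exponential backoff.

    `call` should raise an exception carrying a `.code` attribute on HTTP
    errors, as urllib.error.HTTPError does.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as err:
            code = getattr(err, "code", None)
            if code in retryable and attempt < max_retries - 1:
                time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
                continue
            raise  # non-retryable (bad request, auth) or out of retries
```

Usage: `with_backoff(lambda: chat(messages, api_key))`.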
## Best Practices
- Start with smaller models - Test with 8B before using 70B
- Set max_tokens - Control costs
- Use streaming - Better UX
- Cache responses - Reduce API calls
- Monitor costs - DeepInfra has usage dashboard
- Choose right model - Balance cost vs quality
- Batch similar requests - More efficient
- Handle rate limits - Implement backoff
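The "cache responses" practice above can be sketched as a simple in-memory memoizer keyed on the request contents (illustrative only; a production cache would add eviction and TTLs):

```python
import hashlib
import json

_cache = {}

def cached_chat(chat_fn, messages, model):
    """Call chat_fn only on a cache miss; identical (model, messages) pairs are free."""
    key = hashlib.sha256(
        json.dumps([model, messages], sort_keys=True).encode("utf-8")
    ).hexdigest()
    if key not in _cache:
        _cache[key] = chat_fn(messages, model)
    return _cache[key]
```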
## Use Cases

### Budget-Conscious Development

### High-Volume Applications

### A/B Testing Models
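A common way to A/B test models is deterministic assignment: hash a stable user ID so each user always hits the same arm. A sketch (helper name and arm choices are ours):

```python
import hashlib

def ab_assign(user_id: str, models):
    """Deterministically assign a user to one model arm via a stable hash."""
    digest = int(hashlib.sha256(user_id.encode("utf-8")).hexdigest(), 16)
    return models[digest % len(models)]

ARMS = ["meta-llama/Meta-Llama-3.1-8B-Instruct", "Qwen/Qwen2.5-7B-Instruct"]
```

Log the assigned model alongside quality metrics to compare arms over time.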
## Rate Limits
- Generous free tier for testing
- Pay-as-you-go with no minimums
- Rate limits based on tier
- Contact DeepInfra for enterprise needs
## Pricing Advantages

DeepInfra typically offers:

- Pricing 50-90% lower than major providers
- No minimum spend requirement
- Free credits for new users
- Transparent pricing per token
### DeepInfra Pricing

View detailed pricing for all DeepInfra models.
## Getting Started
- Sign up at DeepInfra
- Get your API key
- Start with free credits
- Scale as needed
## Related Resources

- Together AI - Alternative open models platform
- Cost Optimization - Reduce AI costs
- Load Balancing - Balance cost vs quality
- Caching - Cache for cost savings