Provider Categories
Frontier Labs (3)
Frontier Labs (3)
Leading AI research labs with state-of-the-art models:
- Anthropic: Claude Opus, Sonnet, Haiku
- OpenAI: GPT-4o, GPT-4.1, o3, o4-mini
- Google: Gemini 2.5 Flash, Gemini 2.5 Pro
Cloud Platforms (1)
Cloud Platforms (1)
Enterprise cloud AI services:
- AWS Bedrock: Claude, Nova, Titan, Llama (via AWS)
Fast Inference (7)
Fast Inference (7)
Optimized for low-latency, high-throughput inference:
- Groq: Llama 3.3 70B, Mixtral
- DeepSeek: V3, R1 (reasoning model)
- Cerebras: Llama 3.3 70B (ultra-fast)
- SambaNova: Llama 3.1 405B, Llama 3.3 70B
- Fireworks: Llama 3.3 70B
- Together: Open-source models
- vLLM: Self-hosted fast inference
API Aggregators (2)
API Aggregators (2)
Multi-provider routing and access:
- OpenRouter: Access 100+ models through one API
- HuggingFace: Inference API for open models
Specialized Providers (6)
Specialized Providers (6)
Domain-specific or regional models:
- Perplexity: Sonar (search-augmented)
- Cohere: Command A, Command R+
- xAI: Grok-2, Grok-3
- Mistral: Large, Medium, Small
- Replicate: Open-source model hosting
- AI21: Jamba 1.5
Chinese Providers (5)
Chinese Providers (5)
Leading Chinese language models:
- Qwen (Alibaba): Qwen Max, Plus, Turbo
- Minimax: ABAB 7 Chat
- Zhipu AI: GLM-4, GLM-4 Plus
- Moonshot: Kimi (128K context)
- Baidu: ERNIE 4.0, ERNIE 3.5
Local Deployment (3)
Local Deployment (3)
Self-hosted, privacy-first options:
- Ollama: Run Llama, Qwen, Mistral locally
- LM Studio: Desktop GUI for local models
- vLLM: Production self-hosting
Provider Details
Anthropic
Anthropic Claude
Base URL:
API Key:
Driver: Native Anthropic SDKModels:
https://api.anthropic.comAPI Key:
ANTHROPIC_API_KEYDriver: Native Anthropic SDKModels:
claude-opus-4-6- Frontier reasoning (75 per 1M tokens)claude-sonnet-4-6- Smart general purpose (15 per 1M tokens)claude-haiku-4-5- Fast responses (4 per 1M tokens)
OpenAI
OpenAI GPT
Base URL:
API Key:
Driver: OpenAI-compatibleModels:
https://api.openai.com/v1API Key:
OPENAI_API_KEYDriver: OpenAI-compatibleModels:
gpt-4o- Multimodal flagship (10 per 1M tokens)gpt-4.1- 1M context window (8 per 1M tokens)o3- Advanced reasoning (40 per 1M tokens)o4-mini- Fast reasoning (4.4 per 1M tokens)gpt-4o-mini- Cost-effective (0.6 per 1M tokens)
Google Gemini
Google Gemini
Base URL:
API Key:
Driver: Gemini-specificModels:
https://generativelanguage.googleapis.comAPI Key:
GEMINI_API_KEYDriver: Gemini-specificModels:
gemini-2.5-pro- Frontier multimodal (10 per 1M tokens)gemini-2.5-flash- Ultra-fast (0.6 per 1M tokens)
AWS Bedrock
AWS Bedrock
Base URL:
API Key:
Driver: Bedrock SDKModels:
https://bedrock-runtime.us-east-1.amazonaws.comAPI Key:
AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEYDriver: Bedrock SDKModels:
bedrock-claude-sonnet- Claude Sonnet 4 via AWSbedrock-nova-pro- Amazon Nova Pro (300K context)bedrock-llama-3.3-70b- Llama 3.3 70B
DeepSeek
DeepSeek
Base URL:
API Key:
Driver: OpenAI-compatibleModels:
https://api.deepseek.com/v1API Key:
DEEPSEEK_API_KEYDriver: OpenAI-compatibleModels:
deepseek-chat- Balanced performance (0.28 per 1M tokens)deepseek-reasoner- R1 reasoning model (2.19 per 1M tokens)
Groq
Groq
Base URL:
API Key:
Driver: OpenAI-compatibleModels:
https://api.groq.com/openai/v1API Key:
GROQ_API_KEYDriver: OpenAI-compatibleModels:
llama-3.3-70b- Ultra-fast Llama inference (0.79 per 1M tokens)
Mistral AI
Mistral AI
Base URL:
API Key:
Driver: OpenAI-compatibleModels:
https://api.mistral.ai/v1API Key:
MISTRAL_API_KEYDriver: OpenAI-compatibleModels:
mistral-large- Flagship European model (6 per 1M tokens)
Together AI
Together AI
Base URL:
API Key:
Driver: OpenAI-compatibleModels:
https://api.together.xyz/v1API Key:
TOGETHER_API_KEYDriver: OpenAI-compatibleModels:
together-llama-3.3-70b- Open-source Llama (0.88 per 1M tokens)
Fireworks AI
Fireworks AI
Base URL:
API Key:
Driver: OpenAI-compatibleModels:
https://api.fireworks.ai/inference/v1API Key:
FIREWORKS_API_KEYDriver: OpenAI-compatibleModels:
fireworks-llama-3.3-70b- Fast Llama hosting (0.9 per 1M tokens)
Cohere
Cohere
Base URL:
API Key:
Driver: OpenAI-compatibleModels:
https://api.cohere.ai/v1API Key:
COHERE_API_KEYDriver: OpenAI-compatibleModels:
command-a- Latest flagship (10 per 1M tokens)command-r-plus- RAG-optimized (15 per 1M tokens)command-r- Balanced (1.5 per 1M tokens)
Perplexity
Perplexity AI
Base URL:
API Key:
Driver: OpenAI-compatibleModels:
https://api.perplexity.aiAPI Key:
PERPLEXITY_API_KEYDriver: OpenAI-compatibleModels:
sonar-pro- Search-augmented answers (15 per 1M tokens)sonar- Balanced search (1 per 1M tokens)
xAI
xAI Grok
Base URL:
API Key:
Driver: OpenAI-compatibleModels:
https://api.x.ai/v1API Key:
XAI_API_KEYDriver: OpenAI-compatibleModels:
grok-3- Frontier reasoning (15 per 1M tokens)grok-2- Smart general purpose (10 per 1M tokens)grok-3-mini- Fast responses (0.5 per 1M tokens)
Replicate
Replicate
Base URL:
API Key:
Driver: OpenAI-compatibleModels:
https://api.replicate.com/v1API Key:
REPLICATE_API_TOKENDriver: OpenAI-compatibleModels:
replicate-llama-3.3-70b- Llama 3.3 70B Instruct
Ollama (Local)
Ollama
Base URL:
API Key: Not required
Driver: OpenAI-compatibleModels: Any model from ollama.ai/library
http://localhost:11434/v1API Key: Not required
Driver: OpenAI-compatibleModels: Any model from ollama.ai/library
llama3.3,qwen2.5,deepseek-r1,mistral,phi4, etc.
vLLM (Local)
vLLM
Base URL:
API Key: Not required
Driver: OpenAI-compatibleFeatures: Production self-hosting, GPU optimization, PagedAttention
http://localhost:8000/v1API Key: Not required
Driver: OpenAI-compatibleFeatures: Production self-hosting, GPU optimization, PagedAttention
LM Studio (Local)
LM Studio
Base URL:
API Key: Not required
Driver: OpenAI-compatibleFeatures: Desktop GUI, one-click setup, model library
http://localhost:1234/v1API Key: Not required
Driver: OpenAI-compatibleFeatures: Desktop GUI, one-click setup, model library
OpenRouter
OpenRouter
Base URL:
API Key:
Driver: OpenAI-compatibleModels:
https://openrouter.ai/api/v1API Key:
OPENROUTER_API_KEYDriver: OpenAI-compatibleModels:
openrouter-auto- Automatic routing across 100+ models
HuggingFace
HuggingFace
Base URL:
API Key:
Driver: OpenAI-compatibleModels:
https://api-inference.huggingface.coAPI Key:
HF_API_KEYDriver: OpenAI-compatibleModels:
hf-llama-3.3-70b- Llama 3.3 70B Instructhf-mistral-7b- Mistral 7B (free tier)
AI21 Labs
AI21 Labs
Base URL:
API Key:
Driver: OpenAI-compatibleModels:
https://api.ai21.com/studio/v1API Key:
AI21_API_KEYDriver: OpenAI-compatibleModels:
jamba-1.5-large- 256K context (8 per 1M tokens)jamba-1.5-mini- Fast variant (0.4 per 1M tokens)
Cerebras
Cerebras
Base URL:
API Key:
Driver: OpenAI-compatibleModels:
https://api.cerebras.ai/v1API Key:
CEREBRAS_API_KEYDriver: OpenAI-compatibleModels:
cerebras-llama-3.3-70b- Ultra-fast Llama (0.6 per 1M tokens)
SambaNova
SambaNova
Base URL:
API Key:
Driver: OpenAI-compatibleModels:
https://api.sambanova.ai/v1API Key:
SAMBANOVA_API_KEYDriver: OpenAI-compatibleModels:
samba-llama-3.1-405b- Largest Llama (10 per 1M tokens)samba-llama-3.3-70b- Balanced (0.6 per 1M tokens)
Qwen (Alibaba)
Qwen
Base URL:
API Key:
Driver: OpenAI-compatibleModels:
https://dashscope.aliyuncs.com/compatible-mode/v1API Key:
DASHSCOPE_API_KEYDriver: OpenAI-compatibleModels:
qwen-max- Flagship model (9.6 per 1M tokens)qwen-plus- Balanced (1.5 per 1M tokens)qwen-turbo- Fast, 1M context (0.15 per 1M tokens)
MiniMax
MiniMax
Base URL:
API Key:
Driver: OpenAI-compatibleModels:
https://api.minimax.chat/v1API Key:
MINIMAX_API_KEYDriver: OpenAI-compatibleModels:
abab7-chat- ABAB 7 (1 per 1M tokens)
Zhipu AI
Zhipu AI (GLM)
Base URL:
API Key:
Driver: OpenAI-compatibleModels:
https://open.bigmodel.cn/api/paas/v4API Key:
ZHIPU_API_KEYDriver: OpenAI-compatibleModels:
glm-4-plus- Advanced (7 per 1M tokens)glm-4- Balanced (1.4 per 1M tokens)
Moonshot (Kimi)
Moonshot
Base URL:
API Key:
Driver: OpenAI-compatibleModels:
https://api.moonshot.cn/v1API Key:
MOONSHOT_API_KEYDriver: OpenAI-compatibleModels:
moonshot-v1-128k- 128K context (8.5 per 1M tokens)moonshot-v1-32k- 32K context (3.3 per 1M tokens)
Baidu Qianfan
Baidu ERNIE
Base URL:
API Key:
Driver: OpenAI-compatibleModels:
https://aip.baidubce.com/rpc/2.0API Key:
QIANFAN_API_KEYDriver: OpenAI-compatibleModels:
ernie-4.0-turbo- Advanced (8.4 per 1M tokens)ernie-3.5-turbo- Balanced (1.12 per 1M tokens)
GitHub Copilot
GitHub Copilot
Base URL:
API Key:
Driver: OpenAI-compatibleModels:
https://api.githubcopilot.comAPI Key:
GITHUB_TOKENDriver: OpenAI-compatibleModels:
copilot-gpt-4o- GPT-4o via Copilot subscription
Provider Selection
Choose providers based on:- Cost: Local (free) → Fast inference → Cloud APIs → Frontier labs
- Latency: Groq, Cerebras → vLLM → Standard APIs
- Privacy: Ollama, vLLM, LM Studio (100% local)
- Compliance: AWS Bedrock (SOC2, HIPAA, FedRAMP)
- Language: Chinese models for Chinese content
- Features: Tool use, vision, long context
Testing Providers
Next Steps
Model Catalog
Browse all 47 models with pricing
Routing Logic
Learn complexity-based selection