AgentOS supports 27 LLM providers spanning frontier labs, cloud platforms, specialized inference providers, and local deployment options.

Provider Categories

Frontier Labs

Leading AI research labs with state-of-the-art models:
  • Anthropic: Claude Opus, Sonnet, Haiku
  • OpenAI: GPT-4o, GPT-4.1, o3, o4-mini
  • Google: Gemini 2.5 Flash, Gemini 2.5 Pro

Cloud Platforms

Enterprise cloud AI services:
  • AWS Bedrock: Claude, Nova, Titan, Llama (via AWS)

Fast Inference

Optimized for low-latency, high-throughput inference:
  • Groq: Llama 3.3 70B, Mixtral
  • DeepSeek: V3, R1 (reasoning model)
  • Cerebras: Llama 3.3 70B (ultra-fast)
  • SambaNova: Llama 3.1 405B, Llama 3.3 70B
  • Fireworks: Llama 3.3 70B
  • Together: Open-source models
  • vLLM: Self-hosted fast inference

Aggregators

Multi-provider routing and access:
  • OpenRouter: Access 100+ models through one API
  • HuggingFace: Inference API for open models

Specialized

Domain-specific or regional models:
  • Perplexity: Sonar (search-augmented)
  • Cohere: Command A, Command R+
  • xAI: Grok-2, Grok-3
  • Mistral: Large, Medium, Small
  • Replicate: Open-source model hosting
  • AI21: Jamba 1.5
  • GitHub Copilot: GPT-4o via Copilot subscription

Chinese Models

Leading Chinese language models:
  • Qwen (Alibaba): Qwen Max, Plus, Turbo
  • MiniMax: ABAB 7 Chat
  • Zhipu AI: GLM-4, GLM-4 Plus
  • Moonshot: Kimi (128K context)
  • Baidu: ERNIE 4.0, ERNIE 3.5

Local

Self-hosted, privacy-first options:
  • Ollama: Run Llama, Qwen, Mistral locally
  • LM Studio: Desktop GUI for local models
  • vLLM: Production self-hosting

Provider Details

Each entry lists the provider's base URL, the environment variable that holds its API key, the driver AgentOS uses, and its available models with prices per 1M tokens (USD).

Anthropic

Base URL: https://api.anthropic.com
API Key: ANTHROPIC_API_KEY
Driver: Native Anthropic SDK
Models:
  • claude-opus-4-6 - Frontier reasoning (15/15/75 per 1M tokens)
  • claude-sonnet-4-6 - Smart general purpose (3/3/15 per 1M tokens)
  • claude-haiku-4-5 - Fast responses (0.8/0.8/4 per 1M tokens)
Features: Tool use, vision, 200K context, streaming
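
The prices above are quoted per 1M tokens, so per-request cost is a straight proration of input and output token counts. A minimal sketch (the helper name and hard-coded rates are illustrative, not part of AgentOS):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Return the USD cost of one request, given per-1M-token prices."""
    return (input_tokens / 1_000_000) * input_price + \
           (output_tokens / 1_000_000) * output_price

# e.g. claude-haiku-4-5 at $0.80 input / $4 output per 1M tokens:
cost = estimate_cost(10_000, 2_000, 0.8, 4.0)  # 0.008 + 0.008 = 0.016
```

The same arithmetic applies to every provider below; only the two rates change.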

OpenAI

Base URL: https://api.openai.com/v1
API Key: OPENAI_API_KEY
Driver: OpenAI-compatible
Models:
  • gpt-4o - Multimodal flagship (2.5/2.5/10 per 1M tokens)
  • gpt-4.1 - 1M context window (2/2/8 per 1M tokens)
  • o3 - Advanced reasoning (10/10/40 per 1M tokens)
  • o4-mini - Fast reasoning (1.1/1.1/4.4 per 1M tokens)
  • gpt-4o-mini - Cost-effective (0.15/0.15/0.6 per 1M tokens)
Features: Tool use, vision, JSON mode, function calling
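
Most providers on this page use the OpenAI-compatible driver, which means the same chat-completions request shape works against any of their base URLs; only the base URL and key variable change. A stdlib-only sketch of assembling that request (build only, nothing is sent; the helper is illustrative, not AgentOS API):

```python
import json
import os

def build_chat_request(base_url: str, api_key_env: str, model: str, prompt: str):
    """Build the URL, headers, and JSON body for an OpenAI-compatible
    /chat/completions call. The same shape works for OpenAI, Groq,
    DeepSeek, Mistral, and the other OpenAI-compatible providers below."""
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {os.environ.get(api_key_env, '')}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body

url, headers, body = build_chat_request(
    "https://api.openai.com/v1", "OPENAI_API_KEY", "gpt-4o-mini", "Hello")
```

Swapping in `https://api.groq.com/openai/v1` and `GROQ_API_KEY` (or any other pair below) yields a valid request for that provider.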

Google Gemini

Base URL: https://generativelanguage.googleapis.com
API Key: GEMINI_API_KEY
Driver: Gemini-specific
Models:
  • gemini-2.5-pro - Frontier multimodal (1.25/1.25/10 per 1M tokens)
  • gemini-2.5-flash - Ultra-fast (0.15/0.15/0.6 per 1M tokens)
Features: 1M context, vision, code execution, grounding

AWS Bedrock

Base URL: https://bedrock-runtime.us-east-1.amazonaws.com
API Key: AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY
Driver: Bedrock SDK
Models:
  • bedrock-claude-sonnet - Claude Sonnet 4 via AWS
  • bedrock-nova-pro - Amazon Nova Pro (300K context)
  • bedrock-llama-3.3-70b - Llama 3.3 70B
Features: Enterprise compliance, VPC isolation, AWS integration

DeepSeek

Base URL: https://api.deepseek.com/v1
API Key: DEEPSEEK_API_KEY
Driver: OpenAI-compatible
Models:
  • deepseek-chat - Balanced performance (0.14/0.14/0.28 per 1M tokens)
  • deepseek-reasoner - R1 reasoning model (0.55/0.55/2.19 per 1M tokens)
Features: 128K context, tool use, competitive pricing

Groq

Base URL: https://api.groq.com/openai/v1
API Key: GROQ_API_KEY
Driver: OpenAI-compatible
Models:
  • llama-3.3-70b - Ultra-fast Llama inference (0.59/0.59/0.79 per 1M tokens)
Features: 131K context, 500+ tok/sec, low latency

Mistral AI

Base URL: https://api.mistral.ai/v1
API Key: MISTRAL_API_KEY
Driver: OpenAI-compatible
Models:
  • mistral-large - Flagship European model (2/2/6 per 1M tokens)
Features: 128K context, tool use, European data residency

Together AI

Base URL: https://api.together.xyz/v1
API Key: TOGETHER_API_KEY
Driver: OpenAI-compatible
Models:
  • together-llama-3.3-70b - Open-source Llama (0.88/0.88/0.88 per 1M tokens)
Features: Open models, custom fine-tuning, fast inference

Fireworks AI

Base URL: https://api.fireworks.ai/inference/v1
API Key: FIREWORKS_API_KEY
Driver: OpenAI-compatible
Models:
  • fireworks-llama-3.3-70b - Fast Llama hosting (0.9/0.9/0.9 per 1M tokens)
Features: Sub-second latency, function calling

Cohere

Base URL: https://api.cohere.ai/v1
API Key: COHERE_API_KEY
Driver: OpenAI-compatible
Models:
  • command-a - Latest flagship (2.5/2.5/10 per 1M tokens)
  • command-r-plus - RAG-optimized (3/3/15 per 1M tokens)
  • command-r - Balanced (0.5/0.5/1.5 per 1M tokens)
Features: 256K context, RAG, grounded generation

Perplexity

Base URL: https://api.perplexity.ai
API Key: PERPLEXITY_API_KEY
Driver: OpenAI-compatible
Models:
  • sonar-pro - Search-augmented answers (3/3/15 per 1M tokens)
  • sonar - Balanced search (1/1/1 per 1M tokens)
Features: Real-time search, citations, 200K context

xAI

Base URL: https://api.x.ai/v1
API Key: XAI_API_KEY
Driver: OpenAI-compatible
Models:
  • grok-3 - Frontier reasoning (3/3/15 per 1M tokens)
  • grok-2 - Smart general purpose (2/2/10 per 1M tokens)
  • grok-3-mini - Fast responses (0.3/0.3/0.5 per 1M tokens)
Features: 131K context, tool use, X integration

Replicate

Base URL: https://api.replicate.com/v1
API Key: REPLICATE_API_TOKEN
Driver: OpenAI-compatible
Models:
  • replicate-llama-3.3-70b - Llama 3.3 70B Instruct
Features: Run any open model, custom deployments

Ollama (Local)

Base URL: http://localhost:11434/v1
API Key: Not required
Driver: OpenAI-compatible
Models: Any model from ollama.ai/library
  • llama3.3, qwen2.5, deepseek-r1, mistral, phi4, etc.
Features: Fully local, no API costs, privacy-first, offline
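
Because Ollama exposes an OpenAI-compatible `/v1` surface with no API key, listing the locally installed models is a single GET to `/v1/models`. A sketch (the `fetch` hook is injectable so the example runs without a live server; the default path uses urllib against localhost):

```python
import json
from urllib.request import urlopen

def list_local_models(fetch=None, base_url="http://localhost:11434/v1"):
    """Return model ids from an OpenAI-compatible /models endpoint.
    `fetch` takes a URL and returns parsed JSON; the default hits the
    local Ollama server, which needs no Authorization header."""
    if fetch is None:
        fetch = lambda url: json.load(urlopen(url))
    data = fetch(base_url + "/models")
    return [m["id"] for m in data.get("data", [])]

# With a stubbed response (shape matches the OpenAI-compatible reply):
stub = lambda url: {"data": [{"id": "llama3.3"}, {"id": "qwen2.5"}]}
models = list_local_models(fetch=stub)
```

The same call works against vLLM and LM Studio below by changing only `base_url`.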

vLLM (Local)

Base URL: http://localhost:8000/v1
API Key: Not required
Driver: OpenAI-compatible
Features: Production self-hosting, GPU optimization, PagedAttention

LM Studio (Local)

Base URL: http://localhost:1234/v1
API Key: Not required
Driver: OpenAI-compatible
Features: Desktop GUI, one-click setup, model library

OpenRouter

Base URL: https://openrouter.ai/api/v1
API Key: OPENROUTER_API_KEY
Driver: OpenAI-compatible
Models:
  • openrouter-auto - Automatic routing across 100+ models
Features: Unified API, cost optimization, fallback routing
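
Fallback routing boils down to trying providers in preference order and returning the first success. A sketch of that idea (the provider names and `call` hook are illustrative; this is not OpenRouter's or AgentOS's actual API):

```python
def route_with_fallback(providers, call):
    """Try each provider in order; return (provider, result) from the
    first call that succeeds, or raise once every provider has failed."""
    errors = {}
    for name in providers:
        try:
            return name, call(name)
        except Exception as exc:  # record the failure and keep trying
            errors[name] = exc
    raise RuntimeError(f"all providers failed: {errors}")

# Simulated: the first provider is down, the second answers.
def fake_call(name):
    if name == "groq":
        raise TimeoutError("unreachable")
    return f"answer from {name}"

provider, result = route_with_fallback(["groq", "openrouter-auto"], fake_call)
```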

HuggingFace

Base URL: https://api-inference.huggingface.co
API Key: HF_API_KEY
Driver: OpenAI-compatible
Models:
  • hf-llama-3.3-70b - Llama 3.3 70B Instruct
  • hf-mistral-7b - Mistral 7B (free tier)
Features: Free tier, 1000+ models, serverless inference

AI21 Labs

Base URL: https://api.ai21.com/studio/v1
API Key: AI21_API_KEY
Driver: OpenAI-compatible
Models:
  • jamba-1.5-large - 256K context (2/2/8 per 1M tokens)
  • jamba-1.5-mini - Fast variant (0.2/0.2/0.4 per 1M tokens)
Features: 256K context, structured outputs

Cerebras

Base URL: https://api.cerebras.ai/v1
API Key: CEREBRAS_API_KEY
Driver: OpenAI-compatible
Models:
  • cerebras-llama-3.3-70b - Ultra-fast Llama (0.6/0.6/0.6 per 1M tokens)
Features: 1800+ tok/sec, wafer-scale engine

SambaNova

Base URL: https://api.sambanova.ai/v1
API Key: SAMBANOVA_API_KEY
Driver: OpenAI-compatible
Models:
  • samba-llama-3.1-405b - Largest Llama (5/5/10 per 1M tokens)
  • samba-llama-3.3-70b - Balanced (0.6/0.6/0.6 per 1M tokens)
Features: Enterprise hardware, tool use

Qwen (Alibaba)

Base URL: https://dashscope.aliyuncs.com/compatible-mode/v1
API Key: DASHSCOPE_API_KEY
Driver: OpenAI-compatible
Models:
  • qwen-max - Flagship model (2.4/2.4/9.6 per 1M tokens)
  • qwen-plus - Balanced (0.5/0.5/1.5 per 1M tokens)
  • qwen-turbo - Fast, 1M context (0.05/0.05/0.15 per 1M tokens)
Features: 1M context, multilingual, code generation

MiniMax

Base URL: https://api.minimax.chat/v1
API Key: MINIMAX_API_KEY
Driver: OpenAI-compatible
Models:
  • abab7-chat - ABAB 7 (1/1/1 per 1M tokens)
Features: 245K context, Chinese language

Zhipu AI (GLM)

Base URL: https://open.bigmodel.cn/api/paas/v4
API Key: ZHIPU_API_KEY
Driver: OpenAI-compatible
Models:
  • glm-4-plus - Advanced (7/7/7 per 1M tokens)
  • glm-4 - Balanced (1.4/1.4/1.4 per 1M tokens)
Features: 128K context, Chinese/English bilingual

Moonshot (Kimi)

Base URL: https://api.moonshot.cn/v1
API Key: MOONSHOT_API_KEY
Driver: OpenAI-compatible
Models:
  • moonshot-v1-128k - 128K context (8.5/8.5/8.5 per 1M tokens)
  • moonshot-v1-32k - 32K context (3.3/3.3/3.3 per 1M tokens)
Features: Long context, Chinese language

Baidu Qianfan (ERNIE)

Base URL: https://aip.baidubce.com/rpc/2.0
API Key: QIANFAN_API_KEY
Driver: OpenAI-compatible
Models:
  • ernie-4.0-turbo - Advanced (4.2/4.2/8.4 per 1M tokens)
  • ernie-3.5-turbo - Balanced (0.56/0.56/1.12 per 1M tokens)
Features: 128K context, Chinese language, Baidu ecosystem

GitHub Copilot

Base URL: https://api.githubcopilot.com
API Key: GITHUB_TOKEN
Driver: OpenAI-compatible
Models:
  • copilot-gpt-4o - GPT-4o via Copilot subscription
Features: Included with Copilot, code-optimized

Provider Selection

Choose providers based on:
  • Cost: Local (free) → Fast inference → Cloud APIs → Frontier labs
  • Latency: Groq, Cerebras → vLLM → Standard APIs
  • Privacy: Ollama, vLLM, LM Studio (100% local)
  • Compliance: AWS Bedrock (SOC2, HIPAA, FedRAMP)
  • Language: Chinese models for Chinese content
  • Features: Tool use, vision, long context
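
The criteria above can be sketched as a simple first-match chooser (the requirement-to-provider table is illustrative, following the guidance in this list; it is not AgentOS's real routing logic, which is covered under Routing Logic):

```python
# Hypothetical defaults, one per criterion above; not AgentOS's routing table.
DEFAULTS = {
    "privacy": "ollama",        # 100% local
    "latency": "groq",          # 500+ tok/sec
    "compliance": "bedrock",    # SOC2 / HIPAA / FedRAMP
    "cost": "deepseek",         # cheapest hosted option listed
}

def pick_provider(requirements, defaults=DEFAULTS, fallback="anthropic"):
    """Return the default for the first recognized requirement,
    scanning in the caller's priority order."""
    for need in requirements:
        if need in defaults:
            return defaults[need]
    return fallback

choice = pick_provider(["privacy", "latency"])  # privacy wins: "ollama"
```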

Testing Providers

# Test provider reachability
agentos models providers

# Check API key configuration
echo $ANTHROPIC_API_KEY

# Test with CLI
agentos message default "Hello" --model claude-haiku-4-5
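
The key check can also be done programmatically. A sketch that reports which expected environment variables are unset (the key list is a sample drawn from the tables above; `env` is injectable so the example runs without touching your shell):

```python
import os

EXPECTED_KEYS = ["ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GROQ_API_KEY"]

def missing_keys(keys=EXPECTED_KEYS, env=None):
    """Return the subset of `keys` that are unset or empty."""
    env = os.environ if env is None else env
    return [k for k in keys if not env.get(k)]

# With an injected environment, only the unset key is reported:
gaps = missing_keys(env={"ANTHROPIC_API_KEY": "sk-ant-...",
                         "OPENAI_API_KEY": "sk-..."})
```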

Next Steps

Model Catalog

Browse all 47 models with pricing

Routing Logic

Learn complexity-based selection
