Quick Setup

The fastest path from zero to running:

Provider Reference
Native Drivers
- Anthropic
- Google Gemini
- OpenAI
Configuration fields:

- Provider ID: set to `anthropic`
- API key: environment variable holding the API key
- API endpoint (optional override)

Models:

- `claude-opus-4-20250514` (Frontier) — 200K context, $75 per million tokens
- `claude-sonnet-4-20250514` (Smart) — 200K context, $15 per million tokens
- `claude-haiku-4-5-20251001` (Fast) — 200K context, $1.25 per million tokens
- Sign up at console.anthropic.com
- Create an API key under Settings > API Keys
export ANTHROPIC_API_KEY="sk-ant-..."
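As a quick sanity check before launching, a small shell sketch can confirm the key is exported without ever printing the secret itself (it reports only the length):

```shell
# Check that the key from the step above is present, without echoing it.
if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
  echo "ANTHROPIC_API_KEY detected (${#ANTHROPIC_API_KEY} chars)"
else
  echo "ANTHROPIC_API_KEY is not set" >&2
fi
```

The same pattern works for any of the provider keys listed on this page.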
Cloud Providers (OpenAI-Compatible)
DeepSeek

- Provider ID: `deepseek`
- API key env var: `DEEPSEEK_API_KEY`

Models:

- `deepseek-chat` (Smart) — DeepSeek V3, $1.10 per million tokens
- `deepseek-reasoner` (Smart) — DeepSeek R1, no tool support, $2.19 per million tokens
Groq

- Provider ID: `groq`
- API key env var: `GROQ_API_KEY`

Models:

- `llama-3.3-70b-versatile` (Balanced) — 128K context, $0.079 per million tokens
- `mixtral-8x7b-32768` (Balanced) — 32K context, $0.024 per million tokens
- `llama-3.1-8b-instant` (Fast) — 128K context, $0.08 per million tokens
- `gemma2-9b-it` (Fast) — 8K context, $0.02 per million tokens
Groq runs open-source models on custom LPU hardware. Extremely fast inference.
Cerebras

- Provider ID: `cerebras`
- API key env var: `CEREBRAS_API_KEY`

Models:

- `cerebras/llama3.3-70b` (Balanced) — 128K context, $0.06 per million tokens
- `cerebras/llama3.1-8b` (Fast) — 128K context, $0.01 per million tokens
Cerebras runs inference on wafer-scale chips. Ultra-fast and ultra-cheap.
OpenRouter

- Provider ID: `openrouter`
- API key env var: `OPENROUTER_API_KEY`

Models:

- `openrouter/auto` (Smart) — auto-selects best model, $3 per million tokens
- `openrouter/optimus` (Balanced) — cost-optimized, $1.50 per million tokens
- `openrouter/nitro` (Fast) — speed-optimized, $0.60 per million tokens
OpenRouter is a unified gateway to 200+ models. The three built-in entries are smart-routing endpoints. You can use any OpenRouter model ID directly.
Mistral AI
More Providers
OpenFang supports 27 total providers:

- Together AI — `TOGETHER_API_KEY`
- Fireworks AI — `FIREWORKS_API_KEY`
- Perplexity AI — `PERPLEXITY_API_KEY` (built-in web search)
- Cohere — `COHERE_API_KEY`
- AI21 Labs — `AI21_API_KEY`
- SambaNova — `SAMBANOVA_API_KEY`
- Hugging Face — `HF_API_KEY`
- xAI (Grok) — `XAI_API_KEY`
- Replicate — `REPLICATE_API_TOKEN`
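To see at a glance which of these keys are present in the current shell, a small audit loop (variable names taken from the list above; `printenv` keeps it POSIX-friendly) can help:

```shell
# Report which provider keys are exported, without echoing any secret value.
for var in TOGETHER_API_KEY FIREWORKS_API_KEY PERPLEXITY_API_KEY \
           COHERE_API_KEY AI21_API_KEY SAMBANOVA_API_KEY \
           HF_API_KEY XAI_API_KEY REPLICATE_API_TOKEN; do
  if [ -n "$(printenv "$var")" ]; then
    echo "$var: set"
  else
    echo "$var: missing"
  fi
done
```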
Local Providers (No API Key)
- Ollama
- vLLM
- LM Studio
- Provider ID: `ollama`
- API endpoint: Ollama server endpoint
- API key: not required

Models:

- `llama3.2` (Local) — 128K context, free
- `mistral:latest` (Local) — 32K context, free
- `phi3` (Local) — 128K context, free
- Install Ollama from ollama.com
- Pull a model: `ollama pull llama3.2`
- Start the server: `ollama serve`
- No env var needed — Ollama is always available
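Before relying on auto-discovery, it can be worth probing the server. This sketch assumes Ollama's default port (11434) and its standard `/api/tags` model-listing endpoint:

```shell
# Probe the local Ollama server; print a hint if it is not running.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
if curl -fsS "$OLLAMA_URL/api/tags" >/dev/null 2>&1; then
  echo "ollama: reachable at $OLLAMA_URL"
else
  echo "ollama: not reachable (start it with 'ollama serve')"
fi
```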
OpenFang auto-discovers models from a running Ollama instance and merges them into the catalog with the Local tier and zero cost. Any model you pull becomes usable immediately.

Configuration in config.toml
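The provider fields described above — a provider ID, an env var holding the key, an optional endpoint override — might be laid out along these lines. This is a hypothetical sketch only: the section and key names are assumptions, not confirmed syntax.

```toml
# Hypothetical config.toml fragment; names are illustrative,
# based on the three provider fields these docs describe.
[providers.anthropic]
provider = "anthropic"              # provider ID ("set to anthropic")
api_key_env = "ANTHROPIC_API_KEY"   # env var holding the API key
# endpoint = "https://api.anthropic.com"  # optional override
```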
Per-Agent Model Overrides
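A hedged sketch of such an override — `pinned_model` and the model ID come from these docs, while the manifest layout and the other field name are assumptions:

```toml
# Hypothetical agent manifest fragment; only `pinned_model` is documented.
name = "code-reviewer"                      # illustrative agent name
pinned_model = "claude-sonnet-4-20250514"   # always use this model, bypassing routing
```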
Each agent can specify its own model in its manifest. When `pinned_model` is set on an agent manifest, that agent always uses the specified model regardless of routing configuration. This is used in Stabilization mode for production reliability.

Model Aliases
All 23 aliases are case-insensitive and resolve to canonical model IDs:

| Alias | Resolves To |
|---|---|
| sonnet | claude-sonnet-4-20250514 |
| haiku | claude-haiku-4-5-20251001 |
| opus | claude-opus-4-20250514 |
| gpt4 | gpt-4o |
| flash | gemini-2.5-flash |
| gemini-pro | gemini-2.5-pro |
| deepseek | deepseek-chat |
| llama | llama-3.3-70b-versatile |
| mistral | mistral-large-latest |
| codestral | codestral-latest |
| grok | grok-2 |
| sonar | sonar-pro |
| jamba | jamba-1.5-large |
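As a toy illustration of case-insensitive resolution (a shell sketch covering a subset of the table, not OpenFang's actual resolver):

```shell
# Lowercase the input, then map known aliases; unknown names pass through.
resolve() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    sonnet) echo "claude-sonnet-4-20250514" ;;
    haiku)  echo "claude-haiku-4-5-20251001" ;;
    opus)   echo "claude-opus-4-20250514" ;;
    gpt4)   echo "gpt-4o" ;;
    *)      echo "$1" ;;
  esac
}

resolve SONNET   # → claude-sonnet-4-20250514
resolve gpt4     # → gpt-4o
```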
Model Catalog Overview
Model Tiers
| Tier | Description | Typical Use |
|---|---|---|
| Frontier | Most capable, highest cost | Orchestration, architecture, security audits |
| Smart | Strong reasoning, moderate cost | Coding, code review, research, analysis |
| Balanced | Good cost/quality tradeoff | Planning, writing, DevOps, day-to-day tasks |
| Fast | Cheapest cloud inference | Ops, translation, simple Q&A, health checks |
| Local | Self-hosted, zero cost | Privacy-first, offline, development |
Cost Tracking
OpenFang tracks the cost of every LLM call. A display setting controls the usage info appended to responses:

- `off` — No usage information shown
- `tokens` — Show token counts only
- `cost` — Show estimated cost only
- `full` — Show both token counts and estimated cost (default)
Environment Variables
- `ANTHROPIC_API_KEY` — Anthropic API key (Claude models)
- `GEMINI_API_KEY` — Google Gemini API key (alias: `GOOGLE_API_KEY`)
- `OPENAI_API_KEY` — OpenAI API key
- `GROQ_API_KEY` — Groq API key (fast Llama inference)
- `DEEPSEEK_API_KEY` — DeepSeek API key
- `OPENROUTER_API_KEY` — OpenRouter API key
- `TOGETHER_API_KEY` — Together AI API key
- `MISTRAL_API_KEY` — Mistral AI API key
- `FIREWORKS_API_KEY` — Fireworks AI API key
- `PERPLEXITY_API_KEY` — Perplexity API key (also used for web search)
- `COHERE_API_KEY` — Cohere API key
- `AI21_API_KEY` — AI21 Labs API key
- `CEREBRAS_API_KEY` — Cerebras API key
- `SAMBANOVA_API_KEY` — SambaNova API key
- `HF_API_KEY` — Hugging Face Inference API key
- `XAI_API_KEY` — xAI (Grok) API key
- `REPLICATE_API_TOKEN` — Replicate API token
Security Notes
- All API keys are stored as `Zeroizing<String>` — key material is automatically overwritten with zeros when dropped from memory
- Auth detection only checks for env var presence — never reads or logs the actual secret value
- Provider API keys set via REST API follow the same zeroization policy
- All config structs implement `Debug` with secret redaction — API keys are printed as `"***"` in logs
