OpenFang ships with a comprehensive model catalog covering 27 supported providers, 51 built-in models, and 23 aliases. Every provider uses one of three battle-tested drivers: the native Anthropic driver, the native Gemini driver, or the universal OpenAI-compatible driver.

Quick Setup

The fastest path from zero to running:
# Pick ONE provider — set its env var — done.
export GEMINI_API_KEY="your-key"        # Free tier available
# OR
export GROQ_API_KEY="your-key"          # Free tier available
# OR
export ANTHROPIC_API_KEY="your-key"
# OR
export OPENAI_API_KEY="your-key"
OpenFang auto-detects which providers have API keys configured at boot. Any model whose provider is authenticated becomes immediately available. Local providers (Ollama, vLLM, LM Studio) require no key at all.
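The boot-time detection can be sketched as a presence-only shell check (an illustration, not OpenFang's actual code; the variable names come from the Environment Variables reference below):

```shell
# Presence-only detection: check whether each key is set without ever
# printing or logging the secret value itself.
configured=""
for var in ANTHROPIC_API_KEY GEMINI_API_KEY OPENAI_API_KEY GROQ_API_KEY; do
  if printenv "$var" > /dev/null; then
    configured="$configured $var"
  fi
done
echo "Providers with keys:${configured:- (none)}"
```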

Provider Reference

Native Drivers

  • provider (string, required): anthropic
  • api_key_env (string, default "ANTHROPIC_API_KEY"): environment variable holding the API key
  • base_url (string, default "https://api.anthropic.com"): API endpoint (optional override)
Available Models:
  • claude-opus-4-20250514 (Frontier) — 200K context, 15/15/75 per million tokens
  • claude-sonnet-4-20250514 (Smart) — 200K context, 3/3/15 per million tokens
  • claude-haiku-4-5-20251001 (Fast) — 200K context, 0.25/0.25/1.25 per million tokens
Setup:
  1. Sign up at console.anthropic.com
  2. Create an API key under Settings > API Keys
  3. export ANTHROPIC_API_KEY="sk-ant-..."
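As an optional sanity check (not part of OpenFang), you can confirm the exported key at least carries the expected sk-ant- prefix:

```shell
# Anthropic API keys begin with "sk-ant-"; flag anything else.
case "${ANTHROPIC_API_KEY:-}" in
  sk-ant-*) echo "ANTHROPIC_API_KEY format looks right" ;;
  *)        echo "ANTHROPIC_API_KEY is missing or has an unexpected format" ;;
esac
```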

Cloud Providers (OpenAI-Compatible)

  • provider (string): deepseek
  • api_key_env (string): DEEPSEEK_API_KEY
Models:
  • deepseek-chat (Smart) — DeepSeek V3, 0.27/0.27/1.10 per million tokens
  • deepseek-reasoner (Smart) — DeepSeek R1, no tool support, 0.55/0.55/2.19 per million tokens
Setup: platform.deepseek.com
  • provider (string): groq
  • api_key_env (string): GROQ_API_KEY
Models:
  • llama-3.3-70b-versatile (Balanced) — 128K context, 0.059/0.059/0.079 per million tokens
  • mixtral-8x7b-32768 (Balanced) — 32K context, 0.024/0.024/0.024 per million tokens
  • llama-3.1-8b-instant (Fast) — 128K context, 0.05/0.05/0.08 per million tokens
  • gemma2-9b-it (Fast) — 8K context, 0.02/0.02/0.02 per million tokens
Free Tier: Yes (rate-limited)
Setup: console.groq.com
Groq runs open-source models on custom LPU hardware. Extremely fast inference.
  • provider (string): cerebras
  • api_key_env (string): CEREBRAS_API_KEY
Models:
  • cerebras/llama3.3-70b (Balanced) — 128K context, 0.06/0.06/0.06 per million tokens
  • cerebras/llama3.1-8b (Fast) — 128K context, 0.01/0.01/0.01 per million tokens
Free Tier: Yes (generous)
Setup: cloud.cerebras.ai
Cerebras runs inference on wafer-scale chips. Ultra-fast and ultra-cheap.
  • provider (string): openrouter
  • api_key_env (string): OPENROUTER_API_KEY
Models:
  • openrouter/auto (Smart) — auto-selects best model, 1/1/3 per million tokens
  • openrouter/optimus (Balanced) — cost-optimized, 0.50/0.50/1.50 per million tokens
  • openrouter/nitro (Fast) — speed-optimized, 0.20/0.20/0.60 per million tokens
Setup: openrouter.ai
OpenRouter is a unified gateway to 200+ models. The three built-in entries are smart-routing endpoints. You can use any OpenRouter model ID directly.
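For example, a config.toml default pinned to an arbitrary OpenRouter model ID (the ID shown is illustrative; any ID listed on openrouter.ai should work):

```toml
[default_model]
provider = "openrouter"
model = "meta-llama/llama-3.1-405b-instruct"  # any OpenRouter model ID
api_key_env = "OPENROUTER_API_KEY"
```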
  • provider (string): mistral
  • api_key_env (string): MISTRAL_API_KEY
Models:
  • mistral-large-latest (Smart) — 2/2/6 per million tokens
  • codestral-latest (Smart) — 0.30/0.30/0.90 per million tokens
  • mistral-small-latest (Fast) — 0.10/0.10/0.30 per million tokens
Beyond those detailed above, OpenFang supports 27 providers in total, including:
  • Together AI: TOGETHER_API_KEY
  • Fireworks AI: FIREWORKS_API_KEY
  • Perplexity AI: PERPLEXITY_API_KEY (built-in web search)
  • Cohere: COHERE_API_KEY
  • AI21 Labs: AI21_API_KEY
  • SambaNova: SAMBANOVA_API_KEY
  • Hugging Face: HF_API_KEY
  • xAI (Grok): XAI_API_KEY
  • Replicate: REPLICATE_API_TOKEN
See the full provider reference in the source docs.

Local Providers (No API Key)

  • provider (string): ollama
  • base_url (string, default "http://localhost:11434/v1"): Ollama server endpoint
  • api_key_env (string): not required
Available Models (built-in):
  • llama3.2 (Local) — 128K context, free
  • mistral:latest (Local) — 32K context, free
  • phi3 (Local) — 128K context, free
Setup:
  1. Install Ollama from ollama.com
  2. Pull a model: ollama pull llama3.2
  3. Start the server: ollama serve
  4. No env var needed — Ollama is always available
OpenFang auto-discovers models from a running Ollama instance and merges them into the catalog with Local tier and zero cost. Any model you pull becomes usable immediately.
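Putting the steps together, a local-only default using the config.toml schema from the Configuration section (a sketch; the model must already be pulled):

```toml
[default_model]
provider = "ollama"
model = "llama3.2"
# base_url = "http://localhost:11434/v1"  # default; override for a remote Ollama host
```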

Configuration in config.toml

[default_model]
provider = "anthropic"
model = "claude-sonnet-4-20250514"
api_key_env = "ANTHROPIC_API_KEY"
# base_url = "https://api.anthropic.com"  # Optional override
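The same three fields cover any OpenAI-compatible provider. For example, a hypothetical Groq default (model ID and env var taken from the Groq entry above):

```toml
[default_model]
provider = "groq"
model = "llama-3.3-70b-versatile"
api_key_env = "GROQ_API_KEY"
```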

Per-Agent Model Overrides

Each agent can specify its own model in its manifest:
# Global default
[agents.defaults]
model = "claude-sonnet-4-20250514"

# Per-agent override using alias
[[agents]]
name = "orchestrator"
model = "opus"  # alias for claude-opus-4-20250514

[[agents]]
name = "ops"
model = "llama-3.3-70b-versatile"  # cheap Groq model

[[agents]]
name = "coder"
model = "gemini-2.5-flash"  # fast + cheap + 1M context

[[agents]]
name = "researcher"
model = "sonar-pro"  # Perplexity with built-in web search
When pinned_model is set in an agent manifest, that agent always uses the specified model regardless of routing configuration. This is used in Stabilization mode for production reliability.
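A manifest sketch (field placement assumed from the per-agent examples above):

```toml
[[agents]]
name = "orchestrator"
pinned_model = "claude-opus-4-20250514"  # always used, regardless of routing
```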

Model Aliases

All 23 aliases are case-insensitive and resolve to canonical model IDs. The most commonly used:
  • sonnet → claude-sonnet-4-20250514
  • haiku → claude-haiku-4-5-20251001
  • opus → claude-opus-4-20250514
  • gpt4 → gpt-4o
  • flash → gemini-2.5-flash
  • gemini-pro → gemini-2.5-pro
  • deepseek → deepseek-chat
  • llama → llama-3.3-70b-versatile
  • mistral → mistral-large-latest
  • codestral → codestral-latest
  • grok → grok-2
  • sonar → sonar-pro
  • jamba → jamba-1.5-large

Model Catalog Overview

Model Tiers

  • Frontier: most capable, highest cost. Typical use: orchestration, architecture, security audits.
  • Smart: strong reasoning, moderate cost. Typical use: coding, code review, research, analysis.
  • Balanced: good cost/quality tradeoff. Typical use: planning, writing, DevOps, day-to-day tasks.
  • Fast: cheapest cloud inference. Typical use: ops, translation, simple Q&A, health checks.
  • Local: self-hosted, zero cost. Typical use: privacy-first, offline, development.

Cost Tracking

OpenFang tracks the cost of every LLM call:
cost = (input_tokens / 1,000,000) × input_rate + (output_tokens / 1,000,000) × output_rate
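A worked example for 1,200 input and 340 output tokens, assuming the first and last figures of claude-sonnet-4's 3/3/15 listing are the input and output rates:

```shell
# (1200 / 1e6) * 3 + (340 / 1e6) * 15 = 0.0036 + 0.0051 = 0.0087
awk 'BEGIN { printf "%.4f\n", (1200 / 1000000) * 3 + (340 / 1000000) * 15 }'
# prints 0.0087
```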
The usage footer (when enabled) appends cost information to each response:
> Cost: $0.0042 | Tokens: 1,200 in / 340 out | Model: claude-sonnet-4-20250514
The usage footer setting controls the usage info appended to responses:
  • off — No usage information shown
  • tokens — Show token counts only
  • cost — Show estimated cost only
  • full — Show both token counts and estimated cost (default)
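As a config sketch (the usage_footer table and mode key names are assumptions; only the four mode values come from the list above):

```toml
[usage_footer]
mode = "full"  # off | tokens | cost | full
```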

Environment Variables

  • ANTHROPIC_API_KEY (string): Anthropic API key (Claude models)
  • GEMINI_API_KEY (string): Google Gemini API key (alias: GOOGLE_API_KEY)
  • OPENAI_API_KEY (string): OpenAI API key
  • GROQ_API_KEY (string): Groq API key (fast Llama inference)
  • DEEPSEEK_API_KEY (string): DeepSeek API key
  • OPENROUTER_API_KEY (string): OpenRouter API key
  • TOGETHER_API_KEY (string): Together AI API key
  • MISTRAL_API_KEY (string): Mistral AI API key
  • FIREWORKS_API_KEY (string): Fireworks AI API key
  • PERPLEXITY_API_KEY (string): Perplexity API key (also used for web search)
  • COHERE_API_KEY (string): Cohere API key
  • AI21_API_KEY (string): AI21 Labs API key
  • CEREBRAS_API_KEY (string): Cerebras API key
  • SAMBANOVA_API_KEY (string): SambaNova API key
  • HF_API_KEY (string): Hugging Face Inference API key
  • XAI_API_KEY (string): xAI (Grok) API key
  • REPLICATE_API_TOKEN (string): Replicate API token

Security Notes

  • All API keys are stored as Zeroizing<String> — key material is automatically overwritten with zeros when dropped from memory
  • Auth detection only checks for env var presence — never reads or logs the actual secret value
  • Provider API keys set via REST API follow the same zeroization policy
  • All config structs implement Debug with secret redaction — API keys are printed as "***" in logs
