Overview
AXON includes four production-ready backends:

| Backend | Provider | Models | Default Model |
|---|---|---|---|
| anthropic | Anthropic | Claude 3, 3.5 | claude-3-5-sonnet-20241022 |
| openai | OpenAI | GPT-4, GPT-4 Turbo, GPT-4o | gpt-4-turbo-preview |
| gemini | Google | Gemini 1.5 Pro, 1.5 Flash | gemini-1.5-pro |
| ollama | Local (Ollama) | Llama, Mistral, custom | llama3.1 |
Selection
Command-Line
Specify the backend using the `--backend` or `-b` flag. If no backend is specified, the default is `anthropic`.
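For example (the program filename `pipeline.axon` is illustrative, not from this page):

```shell
# Run with the default backend (anthropic)
axon run pipeline.axon

# Select a backend explicitly, long or short form
axon run --backend openai pipeline.axon
axon run -b ollama pipeline.axon
```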
Supported Commands
Backends affect:
- `axon compile` — optimizes IR for the target backend
- `axon run` — executes using the specified backend
The `axon check` command is backend-agnostic (no compilation occurs).

Anthropic Backend
Overview
Provider: Anthropic (Claude)
Backend Class: AnthropicBackend
Default Model: claude-3-5-sonnet-20241022
Configuration
Required Environment Variable:

Usage
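A minimal sketch, assuming AXON follows the Anthropic SDK convention of reading the `ANTHROPIC_API_KEY` environment variable (the variable name and program filename are assumptions, not confirmed by this page):

```shell
# Assumed variable name, per the Anthropic SDK convention
export ANTHROPIC_API_KEY="sk-ant-..."

# Run a program on the Anthropic backend
axon run --backend anthropic pipeline.axon
```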
Features
- Large context: optimized for Claude’s 200K+ token context windows; AXON IR is compiled to leverage the full context for complex flows.
- System prompts: persona definitions are fused into Claude’s system prompt for a consistent cognitive identity throughout execution.
- Reasoning: AXON `reason` blocks map directly to Claude’s native chain-of-thought capabilities, with explicit step markers.
- Safety: anchor constraints are integrated with Claude’s safety mechanisms for aligned outputs.
Model Selection
Available Models:
- claude-3-5-sonnet-20241022 (default) — best balance of speed and capability
- claude-3-opus-20240229 — maximum capability; slower and more expensive
- claude-3-haiku-20240307 — fast and cost-effective
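If AXON exposes a per-run model override (a `--model` flag is assumed here purely for illustration; this page does not document one), selecting a non-default Claude model might look like:

```shell
# Hypothetical --model flag: pick Opus for maximum capability
axon run --backend anthropic --model claude-3-opus-20240229 pipeline.axon
```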
Pricing (March 2026)
| Model | Input | Output |
|---|---|---|
| Claude 3.5 Sonnet | $3/MTok | $15/MTok |
| Claude 3 Opus | $15/MTok | $75/MTok |
| Claude 3 Haiku | $0.25/MTok | $1.25/MTok |
Best For
- Complex reasoning tasks
- Multi-step flows
- Large context requirements
- Safety-critical applications
- Research and analysis
OpenAI Backend
Overview
Provider: OpenAI (GPT)
Backend Class: OpenAIBackend
Default Model: gpt-4-turbo-preview
Configuration
Required Environment Variable:

Usage
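A minimal sketch, assuming AXON follows the OpenAI SDK convention of reading the `OPENAI_API_KEY` environment variable (the variable name and program filename are assumptions):

```shell
# Assumed variable name, per the OpenAI SDK convention
export OPENAI_API_KEY="sk-..."

# Run a program on the OpenAI backend
axon run --backend openai pipeline.axon
```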
Features
- Structured outputs: AXON type declarations compile to OpenAI’s JSON Schema format for guaranteed structured outputs.
- Function calling: tool declarations map to OpenAI’s function calling API for reliable tool invocation.
- JSON mode: AXON flows with structured return types automatically enable JSON mode for validated outputs.
- Multimodal (planned): future support for multimodal inputs in AXON programs.
Model Selection
Available Models:
- gpt-4-turbo-preview — large context, faster than GPT-4
- gpt-4o — multimodal, balanced performance
- gpt-4 — original, most tested
Pricing (March 2026)
| Model | Input | Output |
|---|---|---|
| GPT-4 Turbo | $10/MTok | $30/MTok |
| GPT-4o | $5/MTok | $15/MTok |
| GPT-4 | $30/MTok | $60/MTok |
Best For
- Structured data extraction
- Tool-heavy workflows
- JSON output requirements
- Function calling scenarios
- Cost-sensitive deployments (GPT-4o)
Gemini Backend
Overview
Provider: Google (Gemini)
Backend Class: GeminiBackend
Default Model: gemini-1.5-pro
Configuration
Required Environment Variable:

Usage
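A minimal sketch; Google's SDKs commonly read `GEMINI_API_KEY` or `GOOGLE_API_KEY`, but the exact variable name AXON expects is an assumption, as is the program filename:

```shell
# Assumed variable name, per Google's SDK convention
export GEMINI_API_KEY="..."

# Run a program on the Gemini backend
axon run --backend gemini pipeline.axon
```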
Features
- Multimodal (roadmap): native support for text, images, audio, and video in AXON programs.
- Grounding: integration with Google Search for factual grounding of outputs.
- Safety: anchor constraints map to Gemini’s built-in safety categories and thresholds.
- Massive context: Gemini 1.5 Pro supports up to 2M tokens — ideal for entire codebases or document collections.
Model Selection
Available Models:
- gemini-1.5-pro — massive context, multimodal
- gemini-1.5-flash — faster, cost-effective
Pricing (March 2026)
| Model | Input | Output |
|---|---|---|
| Gemini 1.5 Pro | $3.50/MTok | $10.50/MTok |
| Gemini 1.5 Flash | $0.35/MTok | $1.05/MTok |
Best For
- Multimodal applications
- Extremely long context (1M+ tokens)
- Factual accuracy (with grounding)
- Cost-sensitive high-volume workloads (Flash)
- Integration with Google ecosystem
Ollama Backend
Overview
Provider: Ollama (Local)
Backend Class: OllamaBackend
Default Model: llama3.1
Configuration
Required Setup:

1. Install Ollama
2. Start the Ollama server
3. Pull a model
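The three setup steps can be sketched as follows (the install one-liner is Ollama's official Linux/macOS script):

```shell
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# 2. Start the Ollama server (listens on localhost:11434 by default)
ollama serve

# 3. Pull a model
ollama pull llama3.1
```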
Usage
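With the server running, no API key is required; a sketch (program filename assumed):

```shell
# Runs locally against the Ollama server
axon run --backend ollama pipeline.axon
```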
Features
- Privacy: no data leaves your machine — perfect for privacy-sensitive applications.
- Flexibility: use any Ollama-compatible model, including fine-tuned variants.
- Cost: unlimited execution with no per-request charges.
- Offline: run AXON programs without internet connectivity.
Model Selection
Popular Models:
- llama3.1 (8B, 70B, 405B) — Meta’s latest open model
- mistral — fast and efficient
- codellama — optimized for code
- phi-2 — compact but capable
Hardware Requirements
| Model Size | RAM Required | GPU Recommended |
|---|---|---|
| 7B | 8GB | Optional |
| 13B | 16GB | Yes |
| 70B | 32GB+ | Yes (24GB+ VRAM) |
| 405B | 256GB+ | Multiple GPUs |
Pricing
Free — only hardware costs (electricity, depreciation).

Best For
- Privacy-critical applications
- Offline environments
- High-volume execution (avoid API costs)
- Custom model fine-tuning
- Research and experimentation
- Air-gapped deployments
Backend Comparison
Feature Matrix
| Feature | Anthropic | OpenAI | Gemini | Ollama |
|---|---|---|---|---|
| Context Length | 200K | 128K | 2M | Varies |
| Structured Output | Partial | ✓ | Partial | Varies |
| Function Calling | ✓ | ✓ | ✓ | Varies |
| Multimodal | ✗ | ✓ (GPT-4o) | ✓ | Varies |
| Grounding | ✗ | ✗ | ✓ | ✗ |
| Offline | ✗ | ✗ | ✗ | ✓ |
| Free Tier | $5 credit | $5 credit | 15 req/min | ✓ Unlimited |
Performance Comparison
Latency (typical request):
- Anthropic: 2-5 seconds
- OpenAI: 1-4 seconds
- Gemini: 2-6 seconds
- Ollama: 0.5-10 seconds (hardware-dependent)

Rate Limits (requests/minute):
- Anthropic: ~50-100 (tier-dependent)
- OpenAI: ~90-500 (tier-dependent)
- Gemini: 15-300 (tier-dependent)
- Ollama: Unlimited (hardware-limited)
Cost Comparison
Example: 1M input tokens + 500K output tokens

| Backend | Cost |
|---|---|
| Anthropic (Sonnet) | $10.50 |
| OpenAI (GPT-4 Turbo) | $25.00 |
| OpenAI (GPT-4o) | $12.50 |
| Gemini (1.5 Pro) | $8.75 |
| Gemini (1.5 Flash) | $0.875 |
| Ollama | $0 |
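The table's figures follow directly from the per-million-token prices listed above; a small awk helper reproduces them:

```shell
# cost MTOK_IN MTOK_OUT PRICE_IN PRICE_OUT -> total in USD
cost() {
  awk -v i="$1" -v o="$2" -v pi="$3" -v po="$4" \
    'BEGIN { printf "%.3f\n", i * pi + o * po }'
}

cost 1 0.5 3 15        # Anthropic Sonnet  -> 10.500
cost 1 0.5 10 30       # GPT-4 Turbo       -> 25.000
cost 1 0.5 5 15        # GPT-4o            -> 12.500
cost 1 0.5 3.50 10.50  # Gemini 1.5 Pro    -> 8.750
cost 1 0.5 0.35 1.05   # Gemini 1.5 Flash  -> 0.875
```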
Backend Selection Guide
Choose Anthropic When:
- ✓ Need strong reasoning capabilities
- ✓ Complex multi-step flows
- ✓ Safety is paramount
- ✓ Large context windows
- ✓ Research-oriented tasks
Choose OpenAI When:
- ✓ Structured outputs required
- ✓ Heavy tool usage
- ✓ JSON schema validation
- ✓ Mature ecosystem integrations
- ✓ Function calling workflows
Choose Gemini When:
- ✓ Extremely long context (>200K tokens)
- ✓ Multimodal inputs
- ✓ Need factual grounding
- ✓ Cost optimization (Flash)
- ✓ Google Cloud integration
Choose Ollama When:
- ✓ Privacy requirements
- ✓ Offline operation
- ✓ High-volume cost optimization
- ✓ Custom model fine-tuning
- ✓ Rapid iteration (no API limits)
Examples
Multi-Backend Testing
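One way to compare backends on the same program (filename assumed):

```shell
# Run the same program against every backend and compare outputs
for backend in anthropic openai gemini ollama; do
  echo "=== $backend ==="
  axon run --backend "$backend" pipeline.axon
done
```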
Backend-Specific Compilation
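Since `axon compile` optimizes IR per target, compiling the same program for different backends might look like (filename assumed):

```shell
# Compile the same source for two different targets
axon compile --backend anthropic pipeline.axon
axon compile --backend ollama pipeline.axon
```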
Fallback Strategy
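A simple shell-level fallback chain: try the primary backend and fall back to cheaper or local backends on failure (this assumes `axon run` exits non-zero on error; filename assumed):

```shell
# Primary -> secondary -> local fallback
axon run --backend anthropic pipeline.axon \
  || axon run --backend openai pipeline.axon \
  || axon run --backend ollama pipeline.axon
```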
Troubleshooting
Backend Not Found
API Key Not Set
Ollama Connection Failed
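To check whether the Ollama server is reachable, query its API (it listens on port 11434 by default):

```shell
# Lists locally pulled models if the server is up
curl -s http://localhost:11434/api/tags

# If the request fails, start the server:
ollama serve
```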
Model Not Available
Ollama: pull the missing model first (e.g. `ollama pull llama3.1`).

Related Documentation
- API Keys: configure backend authentication
- run Command: execute with backends
