AXON supports multiple LLM backends, each optimized for a specific provider. Backends compile AXON IR into provider-native prompt structures and manage API communication.

Overview

AXON includes four production-ready backends:
Backend     Provider    Models                       Default Model
anthropic   Anthropic   Claude 3, 3.5                claude-3-5-sonnet-20241022
openai      OpenAI      GPT-4, GPT-4 Turbo, GPT-4o   gpt-4-turbo-preview
gemini      Google      Gemini 1.5                   gemini-1.5-pro
ollama      Local       Llama, Mistral, custom       llama3.1

Selection

Command-Line

Specify the backend using the --backend or -b flag:
axon run program.axon --backend anthropic
axon compile program.axon -b openai
Default: anthropic (if not specified)
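
The selection logic above can be sketched as a small resolver. This is illustrative only; `resolve_backend` and `SUPPORTED_BACKENDS` are hypothetical names, not part of AXON's actual API:

```python
# Illustrative sketch of backend selection: the four supported backends
# and their default models, with anthropic as the fallback when no
# --backend flag is given. Names here are invented for this example.
SUPPORTED_BACKENDS = {
    "anthropic": "claude-3-5-sonnet-20241022",
    "openai": "gpt-4-turbo-preview",
    "gemini": "gemini-1.5-pro",
    "ollama": "llama3.1",
}

def resolve_backend(flag_value=None):
    """Return (backend, default_model); fall back to anthropic when unset."""
    backend = flag_value or "anthropic"
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"Unknown backend '{backend}'")
    return backend, SUPPORTED_BACKENDS[backend]
```

An unknown name raises immediately, mirroring the "Backend Not Found" error shown in Troubleshooting.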

Supported Commands

Backends affect:
  • axon compile — Optimizes IR for target backend
  • axon run — Executes using specified backend
The axon check command is backend-agnostic (no compilation occurs).

Anthropic Backend

Overview

Provider: Anthropic (Claude)
Backend Class: AnthropicBackend
Default Model: claude-3-5-sonnet-20241022

Configuration

Required Environment Variable:
export ANTHROPIC_API_KEY="sk-ant-api03-..."
Get your API key: console.anthropic.com

Usage

# Compile for Claude
axon compile program.axon --backend anthropic

# Execute with Claude
axon run program.axon --backend anthropic

Features

Extended Context
Optimized for Claude's 200K+ token context windows. AXON IR is compiled to leverage the full context for complex flows.

System Prompts
Persona definitions are fused into Claude's system prompt for a consistent cognitive identity throughout execution.

Chain-of-Thought
AXON reason blocks map directly to Claude's native chain-of-thought capabilities with explicit step markers.

Constitutional AI
Anchor constraints are integrated with Claude's safety mechanisms for aligned outputs.

Model Selection

Available Models:
  • claude-3-5-sonnet-20241022 (default) — Best balance of speed and capability
  • claude-3-opus-20240229 — Maximum capability, slower, more expensive
  • claude-3-haiku-20240307 — Fast and cost-effective
Configure the model in the backend (currently requires code modification; config-file support is planned):
from axon.backends import AnthropicBackend
backend = AnthropicBackend(model="claude-3-opus-20240229")

Pricing (March 2026)

Model               Input        Output
Claude 3.5 Sonnet   $3/MTok      $15/MTok
Claude 3 Opus       $15/MTok     $75/MTok
Claude 3 Haiku      $0.25/MTok   $1.25/MTok
Free Tier: $5 credit for new accounts

Best For

  • Complex reasoning tasks
  • Multi-step flows
  • Large context requirements
  • Safety-critical applications
  • Research and analysis

OpenAI Backend

Overview

Provider: OpenAI (GPT)
Backend Class: OpenAIBackend
Default Model: gpt-4-turbo-preview

Configuration

Required Environment Variable:
export OPENAI_API_KEY="sk-proj-..."
Get your API key: platform.openai.com

Usage

# Compile for GPT
axon compile program.axon --backend openai

# Execute with GPT
axon run program.axon --backend openai

Features

Structured Outputs
AXON type declarations compile to OpenAI's JSON schema format for guaranteed structured outputs.

Function Calling
Tool declarations map to OpenAI's function calling API for reliable tool invocation.

JSON Mode
AXON flows with structured return types automatically enable JSON mode for validated outputs.

GPT-4 Vision
Future support for multimodal inputs in AXON programs (planned).
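
As an illustration of the structured-outputs mapping, here is the kind of JSON schema a structured AXON return type might compile to, wrapped in OpenAI's `response_format` envelope. The field names (`summary`, `confidence`, `tags`) and the schema name are invented for this sketch; only the envelope shape follows OpenAI's structured-outputs API:

```python
# Hypothetical compiled output for an AXON flow returning a structured
# record. The schema content is invented; the response_format envelope
# is the shape OpenAI's structured-outputs API expects.
schema = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "confidence": {"type": "number"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["summary", "confidence", "tags"],
    "additionalProperties": False,
}

response_format = {
    "type": "json_schema",
    "json_schema": {"name": "flow_output", "strict": True, "schema": schema},
}
```

With `strict: True`, the provider validates the model's output against the schema before returning it.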

Model Selection

Available Models:
  • gpt-4-turbo-preview — Large context, faster than GPT-4
  • gpt-4o — Multimodal, balanced performance
  • gpt-4 — Original, most tested

Pricing (March 2026)

Model         Input       Output
GPT-4 Turbo   $10/MTok    $30/MTok
GPT-4o        $5/MTok     $15/MTok
GPT-4         $30/MTok    $60/MTok
Free Tier: $5 credit (expires after 3 months)

Best For

  • Structured data extraction
  • Tool-heavy workflows
  • JSON output requirements
  • Function calling scenarios
  • Cost-sensitive deployments (GPT-4o)

Gemini Backend

Overview

Provider: Google (Gemini)
Backend Class: GeminiBackend
Default Model: gemini-1.5-pro

Configuration

Required Environment Variable:
export API_KEY_GEMINI="AIza..."
Get your API key: aistudio.google.com

Usage

# Compile for Gemini
axon compile program.axon --backend gemini

# Execute with Gemini
axon run program.axon --backend gemini

Features

Multimodal Inputs
Native support for text, images, audio, and video in AXON programs (roadmap).

Grounding
Integration with Google Search for factual grounding of outputs.

Safety Settings
Anchor constraints map to Gemini's built-in safety categories and thresholds.

Long Context
Gemini 1.5 Pro supports up to 2M tokens — ideal for entire codebases or document collections.

Model Selection

Available Models:
  • gemini-1.5-pro — Massive context, multimodal
  • gemini-1.5-flash — Faster, cost-effective

Pricing (March 2026)

Model              Input        Output
Gemini 1.5 Pro     $3.50/MTok   $10.50/MTok
Gemini 1.5 Flash   $0.35/MTok   $1.05/MTok
Free Tier: 15 requests/minute with generous token limits

Best For

  • Multimodal applications
  • Extremely long context (1M+ tokens)
  • Factual accuracy (with grounding)
  • Cost-sensitive high-volume workloads (Flash)
  • Integration with Google ecosystem

Ollama Backend

Overview

Provider: Ollama (Local)
Backend Class: OllamaBackend
Default Model: llama3.1

Configuration

Required Setup:
  1. Install Ollama:
    curl -fsSL https://ollama.com/install.sh | sh
    
  2. Start Ollama server:
    ollama serve
    
  3. Pull a model:
    ollama pull llama3.1
    
No API key required — runs entirely locally.
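
Before running programs against Ollama, it can help to confirm the server is actually listening on its default port (11434). A minimal stdlib-only check — the function name is ours, not part of AXON:

```python
import urllib.request
import urllib.error

def ollama_reachable(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an Ollama server answers at base_url.

    A running Ollama server responds to GET / with HTTP 200
    (a plain "Ollama is running" body).
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

If this returns False, start the server with `ollama serve` before invoking `axon run`.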

Usage

# Compile for Ollama
axon compile program.axon --backend ollama

# Execute with Ollama
axon run program.axon --backend ollama

Features

Local Execution
No data leaves your machine — perfect for privacy-sensitive applications.

Custom Models
Use any Ollama-compatible model, including fine-tuned variants.

No API Costs
Unlimited execution with no per-request charges.

Offline Capable
Run AXON programs without internet connectivity.

Model Selection

Popular Models:
  • llama3.1 (8B, 70B, 405B) — Meta’s latest open model
  • mistral — Fast and efficient
  • codellama — Optimized for code
  • phi-2 — Compact but capable
Pull models:
ollama pull llama3.1:8b
ollama pull mistral
ollama pull codellama:13b
List installed models:
ollama list

Hardware Requirements

Model Size   RAM Required   GPU Recommended
7B           8GB            Optional
13B          16GB           Yes
70B          32GB+          Yes (24GB+ VRAM)
405B         256GB+         Multiple GPUs
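
When scripting model selection, the table above can be re-encoded as a quick lookup helper. The helper and its keys are illustrative, not part of AXON:

```python
# Minimum host RAM (GB) suggested per model size, copied from the
# hardware table above. Function and dict names are ours.
MIN_RAM_GB = {"7B": 8, "13B": 16, "70B": 32, "405B": 256}

def min_ram_gb(model_size):
    """Look up the suggested minimum RAM for an Ollama model size."""
    try:
        return MIN_RAM_GB[model_size]
    except KeyError:
        raise ValueError(f"No guidance for model size '{model_size}'") from None
```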

Pricing

Free — Only hardware costs (electricity, depreciation)

Best For

  • Privacy-critical applications
  • Offline environments
  • High-volume execution (avoid API costs)
  • Custom model fine-tuning
  • Research and experimentation
  • Air-gapped deployments

Backend Comparison

Feature Matrix

Feature             Anthropic   OpenAI       Gemini       Ollama
Context Length      200K        128K         2M           Varies
Structured Output   Partial     ✓            Partial      Varies
Function Calling    ✓           ✓            ✓            Varies
Multimodal          ✗           ✓ (GPT-4o)   ✓            Varies
Grounding           ✗           ✗            ✓            ✗
Offline             ✗           ✗            ✗            ✓
Free Tier           $5 credit   $5 credit    15 req/min   ✓ Unlimited

Performance Comparison

Latency (typical request):
  • Anthropic: 2-5 seconds
  • OpenAI: 1-4 seconds
  • Gemini: 2-6 seconds
  • Ollama: 0.5-10 seconds (hardware-dependent)
Throughput (requests/minute):
  • Anthropic: ~50-100 (tier-dependent)
  • OpenAI: ~90-500 (tier-dependent)
  • Gemini: 15-300 (tier-dependent)
  • Ollama: Unlimited (hardware-limited)
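
These throughput figures translate directly into batch wall-clock estimates. A back-of-envelope helper (the function name is ours, and the rates are the approximate lower bounds listed above):

```python
def batch_minutes(n_requests, requests_per_minute):
    """Rough wall-clock estimate (minutes) for a batch at a given rate,
    assuming requests are throttled to the provider's rate limit."""
    return n_requests / requests_per_minute

# e.g. 1,000 requests at Anthropic's ~50 req/min lower bound takes
# about batch_minutes(1000, 50) == 20.0 minutes; at Gemini's free-tier
# 15 req/min the same batch takes over an hour.
```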

Cost Comparison

Example: 1M input tokens + 500K output tokens
Backend                Cost
Anthropic (Sonnet)     $10.50
OpenAI (GPT-4 Turbo)   $25.00
OpenAI (GPT-4o)        $12.50
Gemini (1.5 Pro)       $8.75
Gemini (1.5 Flash)     $0.875
Ollama                 $0
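
The figures in this table follow directly from the per-MTok prices listed in each backend's pricing section. A small script can reproduce them; the price map below copies those tables and is not an AXON API:

```python
# ($/MTok input, $/MTok output), copied from the pricing tables above.
PRICES = {
    "anthropic-sonnet": (3.00, 15.00),
    "openai-gpt4-turbo": (10.00, 30.00),
    "openai-gpt4o": (5.00, 15.00),
    "gemini-1.5-pro": (3.50, 10.50),
    "gemini-1.5-flash": (0.35, 1.05),
    "ollama": (0.0, 0.0),
}

def estimate_cost(model, input_mtok, output_mtok):
    """Dollar cost for a given number of millions of tokens in and out."""
    in_price, out_price = PRICES[model]
    return input_mtok * in_price + output_mtok * out_price

# The example workload above: 1M input + 500K output tokens
# estimate_cost("anthropic-sonnet", 1.0, 0.5) -> 10.5
```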

Backend Selection Guide

Choose Anthropic When:

  • ✓ Need strong reasoning capabilities
  • ✓ Complex multi-step flows
  • ✓ Safety is paramount
  • ✓ Large context windows
  • ✓ Research-oriented tasks

Choose OpenAI When:

  • ✓ Structured outputs required
  • ✓ Heavy tool usage
  • ✓ JSON schema validation
  • ✓ Mature ecosystem integrations
  • ✓ Function calling workflows

Choose Gemini When:

  • ✓ Extremely long context (>200K tokens)
  • ✓ Multimodal inputs
  • ✓ Need factual grounding
  • ✓ Cost optimization (Flash)
  • ✓ Google Cloud integration

Choose Ollama When:

  • ✓ Privacy requirements
  • ✓ Offline operation
  • ✓ High-volume cost optimization
  • ✓ Custom model fine-tuning
  • ✓ Rapid iteration (no API limits)

Examples

Multi-Backend Testing

#!/bin/bash
# Test program across all backends

mkdir -p traces
for backend in anthropic openai gemini ollama; do
  echo "Testing $backend..."
  axon run program.axon --backend $backend --trace
  mv program.trace.json traces/program.$backend.trace.json
done

echo "Compare results:"
for file in traces/*.trace.json; do
  echo "$file:"
  jq '._meta.duration_ms' "$file"
done

Backend-Specific Compilation

# Compile for multiple backends
for backend in anthropic openai gemini; do
  axon compile program.axon \
    --backend $backend \
    --output dist/program.$backend.ir.json
done

Fallback Strategy

#!/bin/bash
# Try backends in order of preference

if [ -n "$ANTHROPIC_API_KEY" ]; then
  BACKEND="anthropic"
elif [ -n "$OPENAI_API_KEY" ]; then
  BACKEND="openai"
elif [ -n "$API_KEY_GEMINI" ]; then
  BACKEND="gemini"
else
  BACKEND="ollama"
fi

echo "Using backend: $BACKEND"
axon run program.axon --backend $BACKEND

Troubleshooting

Backend Not Found

 Backend error: Unknown backend 'invalid'
Solution: Use a supported backend name:
axon run program.axon --backend anthropic  # ✓ Valid

API Key Not Set

 Backend error: ANTHROPIC_API_KEY not set
Solution: Set the required environment variable:
export ANTHROPIC_API_KEY="sk-ant-api03-..."
See API Keys for details.

Ollama Connection Failed

 Backend error: Could not connect to Ollama at localhost:11434
Solution: Ensure Ollama is running:
ollama serve

Model Not Available

Ollama:
ollama pull llama3.1
Other Providers: Model selection is handled by the backend implementation (may require code changes).
