Overview
AXON includes four production-ready backends:

| Backend | Provider | Models | Default Model |
|---|---|---|---|
| anthropic | Anthropic | Claude 3, 3.5 | claude-3-5-sonnet-20241022 |
| openai | OpenAI | GPT-4, GPT-4 Turbo, GPT-4o | gpt-4-turbo-preview |
| gemini | Google | Gemini 1.5 Pro, 1.5 Flash | gemini-1.5-pro |
| ollama | Local (Ollama) | Llama, Mistral, custom | llama3.1 |
Selection
Command-Line
Specify the backend using the `--backend` or `-b` flag. If no backend is specified, the default is `anthropic`.
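For example (the program filename `pipeline.axon` is illustrative, not from this page):

```shell
# Run with the default backend (anthropic)
axon run pipeline.axon

# Select a backend explicitly, long or short form
axon run --backend openai pipeline.axon
axon run -b ollama pipeline.axon
```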
Supported Commands
Backends affect:
- `axon compile` — optimizes IR for the target backend
- `axon run` — executes using the specified backend
The `axon check` command is backend-agnostic (no compilation occurs).

Anthropic Backend
Overview
Provider: Anthropic (Claude)
Backend Class: AnthropicBackend
Default Model: claude-3-5-sonnet-20241022
Configuration
Required Environment Variable:

Usage
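A minimal sketch, assuming AXON follows the Anthropic SDK convention of reading the `ANTHROPIC_API_KEY` environment variable (the variable name and program filename are assumptions, not confirmed by this page):

```shell
# Assumed variable name, per the Anthropic SDK convention
export ANTHROPIC_API_KEY="sk-ant-..."

# Run a program on the Anthropic backend
axon run --backend anthropic pipeline.axon
```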
Features
- Large context: optimized for Claude’s 200K+ token context windows; AXON IR is compiled to leverage the full context for complex flows.
- System prompts: persona definitions are fused into Claude’s system prompt for a consistent cognitive identity throughout execution.
- Reasoning: AXON `reason` blocks map directly to Claude’s native chain-of-thought capabilities, with explicit step markers.
- Safety: anchor constraints are integrated with Claude’s safety mechanisms for aligned outputs.
Model Selection
Available Models:
- claude-3-5-sonnet-20241022 (default) — best balance of speed and capability
- claude-3-opus-20240229 — maximum capability; slower and more expensive
- claude-3-haiku-20240307 — fast and cost-effective
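If AXON exposes a per-run model override (a `--model` flag is assumed here purely for illustration; this page does not document one), selecting a non-default Claude model might look like:

```shell
# Hypothetical --model flag: pick Opus for maximum capability
axon run --backend anthropic --model claude-3-opus-20240229 pipeline.axon
```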
Pricing (March 2026)
| Model | Input | Output |
|---|---|---|
| Claude 3.5 Sonnet | $3/MTok | $15/MTok |
| Claude 3 Opus | $15/MTok | $75/MTok |
| Claude 3 Haiku | $0.25/MTok | $1.25/MTok |
Best For
- Complex reasoning tasks
- Multi-step flows
- Large context requirements
- Safety-critical applications
- Research and analysis
OpenAI Backend
Overview
Provider: OpenAI (GPT)
Backend Class: OpenAIBackend
Default Model: gpt-4-turbo-preview
Configuration
Required Environment Variable:

Usage
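A minimal sketch, assuming AXON follows the OpenAI SDK convention of reading the `OPENAI_API_KEY` environment variable (the variable name and program filename are assumptions):

```shell
# Assumed variable name, per the OpenAI SDK convention
export OPENAI_API_KEY="sk-..."

# Run a program on the OpenAI backend
axon run --backend openai pipeline.axon
```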
Features
- Structured outputs: AXON type declarations compile to OpenAI’s JSON Schema format for guaranteed structured outputs.
- Function calling: tool declarations map to OpenAI’s function calling API for reliable tool invocation.
- JSON mode: AXON flows with structured return types automatically enable JSON mode for validated outputs.
- Multimodal (planned): future support for multimodal inputs in AXON programs.
Model Selection
Available Models:
- gpt-4-turbo-preview — large context, faster than GPT-4
- gpt-4o — multimodal, balanced performance
- gpt-4 — original, most tested
Pricing (March 2026)
| Model | Input | Output |
|---|---|---|
| GPT-4 Turbo | $10/MTok | $30/MTok |
| GPT-4o | $5/MTok | $15/MTok |
| GPT-4 | $30/MTok | $60/MTok |
Best For
- Structured data extraction
- Tool-heavy workflows
- JSON output requirements
- Function calling scenarios
- Cost-sensitive deployments (GPT-4o)
Gemini Backend
Overview
Provider: Google (Gemini)
Backend Class: GeminiBackend
Default Model: gemini-1.5-pro
Configuration
Required Environment Variable:

Usage
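A minimal sketch; Google's SDKs commonly read `GEMINI_API_KEY` or `GOOGLE_API_KEY`, but the exact variable name AXON expects is an assumption, as is the program filename:

```shell
# Assumed variable name, per Google's SDK convention
export GEMINI_API_KEY="..."

# Run a program on the Gemini backend
axon run --backend gemini pipeline.axon
```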
Features
- Multimodal (roadmap): native support for text, images, audio, and video in AXON programs.
- Grounding: integration with Google Search for factual grounding of outputs.
- Safety: anchor constraints map to Gemini’s built-in safety categories and thresholds.
- Massive context: Gemini 1.5 Pro supports up to 2M tokens — ideal for entire codebases or document collections.
Model Selection
Available Models:
- gemini-1.5-pro — massive context, multimodal
- gemini-1.5-flash — faster, cost-effective
Pricing (March 2026)
| Model | Input | Output |
|---|---|---|
| Gemini 1.5 Pro | $3.50/MTok | $10.50/MTok |
| Gemini 1.5 Flash | $0.35/MTok | $1.05/MTok |
Best For
- Multimodal applications
- Extremely long context (1M+ tokens)
- Factual accuracy (with grounding)
- Cost-sensitive high-volume workloads (Flash)
- Integration with Google ecosystem
Ollama Backend
Overview
Provider: Ollama (Local)
Backend Class: OllamaBackend
Default Model: llama3.1
Configuration
Required Setup:

1. Install Ollama
2. Start the Ollama server
3. Pull a model
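The three setup steps can be sketched as follows (the install one-liner is Ollama's official Linux/macOS script):

```shell
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# 2. Start the Ollama server (listens on localhost:11434 by default)
ollama serve

# 3. Pull a model
ollama pull llama3.1
```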
Usage
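With the server running, no API key is required; a sketch (program filename assumed):

```shell
# Runs locally against the Ollama server
axon run --backend ollama pipeline.axon
```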
Features
- Privacy: no data leaves your machine — perfect for privacy-sensitive applications.
- Flexibility: use any Ollama-compatible model, including fine-tuned variants.
- Cost: unlimited execution with no per-request charges.
- Offline: run AXON programs without internet connectivity.
Model Selection
Popular Models:
- llama3.1 (8B, 70B, 405B) — Meta’s latest open model
- mistral — fast and efficient
- codellama — optimized for code
- phi-2 — compact but capable
Hardware Requirements
| Model Size | RAM Required | GPU Recommended |
|---|---|---|
| 7B | 8GB | Optional |
| 13B | 16GB | Yes |
| 70B | 32GB+ | Yes (24GB+ VRAM) |
| 405B | 256GB+ | Multiple GPUs |
Pricing
Free — only hardware costs (electricity, depreciation).

Best For
- Privacy-critical applications
- Offline environments
- High-volume execution (avoid API costs)
- Custom model fine-tuning
- Research and experimentation
- Air-gapped deployments
Backend Comparison
Feature Matrix
| Feature | Anthropic | OpenAI | Gemini | Ollama |
|---|---|---|---|---|
| Context Length | 200K | 128K | 2M | Varies |
| Structured Output | Partial | ✓ | Partial | Varies |
| Function Calling | ✓ | ✓ | ✓ | Varies |
| Multimodal | ✗ | ✓ (GPT-4o) | ✓ | Varies |
| Grounding | ✗ | ✗ | ✓ | ✗ |
| Offline | ✗ | ✗ | ✗ | ✓ |
| Free Tier | $5 credit | $5 credit | 15 req/min | ✓ Unlimited |
Performance Comparison
Latency (typical request):
- Anthropic: 2-5 seconds
- OpenAI: 1-4 seconds
- Gemini: 2-6 seconds
- Ollama: 0.5-10 seconds (hardware-dependent)

Rate Limits (requests/minute):
- Anthropic: ~50-100 (tier-dependent)
- OpenAI: ~90-500 (tier-dependent)
- Gemini: 15-300 (tier-dependent)
- Ollama: Unlimited (hardware-limited)
Cost Comparison
Example: 1M input tokens + 500K output tokens

| Backend | Cost |
|---|---|
| Anthropic (Sonnet) | $10.50 |
| OpenAI (GPT-4 Turbo) | $25.00 |
| OpenAI (GPT-4o) | $12.50 |
| Gemini (1.5 Pro) | $8.75 |
| Gemini (1.5 Flash) | $0.875 |
| Ollama | $0 |
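The table's figures follow directly from the per-million-token prices listed above; a small awk helper reproduces them:

```shell
# cost MTOK_IN MTOK_OUT PRICE_IN PRICE_OUT -> total in USD
cost() {
  awk -v i="$1" -v o="$2" -v pi="$3" -v po="$4" \
    'BEGIN { printf "%.3f\n", i * pi + o * po }'
}

cost 1 0.5 3 15        # Anthropic Sonnet  -> 10.500
cost 1 0.5 10 30       # GPT-4 Turbo       -> 25.000
cost 1 0.5 5 15        # GPT-4o            -> 12.500
cost 1 0.5 3.50 10.50  # Gemini 1.5 Pro    -> 8.750
cost 1 0.5 0.35 1.05   # Gemini 1.5 Flash  -> 0.875
```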
Backend Selection Guide
Choose Anthropic When:
- ✓ Need strong reasoning capabilities
- ✓ Complex multi-step flows
- ✓ Safety is paramount
- ✓ Large context windows
- ✓ Research-oriented tasks
Choose OpenAI When:
- ✓ Structured outputs required
- ✓ Heavy tool usage
- ✓ JSON schema validation
- ✓ Mature ecosystem integrations
- ✓ Function calling workflows
Choose Gemini When:
- ✓ Extremely long context (>200K tokens)
- ✓ Multimodal inputs
- ✓ Need factual grounding
- ✓ Cost optimization (Flash)
- ✓ Google Cloud integration
Choose Ollama When:
- ✓ Privacy requirements
- ✓ Offline operation
- ✓ High-volume cost optimization
- ✓ Custom model fine-tuning
- ✓ Rapid iteration (no API limits)
Examples
Multi-Backend Testing
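One way to compare backends on the same program (filename assumed):

```shell
# Run the same program against every backend and compare outputs
for backend in anthropic openai gemini ollama; do
  echo "=== $backend ==="
  axon run --backend "$backend" pipeline.axon
done
```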
Backend-Specific Compilation
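Since `axon compile` optimizes IR per target, compiling the same program for different backends might look like (filename assumed):

```shell
# Compile the same source for two different targets
axon compile --backend anthropic pipeline.axon
axon compile --backend ollama pipeline.axon
```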
Fallback Strategy
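A simple shell-level fallback chain: try the primary backend and fall back to cheaper or local backends on failure (this assumes `axon run` exits non-zero on error; filename assumed):

```shell
# Primary -> secondary -> local fallback
axon run --backend anthropic pipeline.axon \
  || axon run --backend openai pipeline.axon \
  || axon run --backend ollama pipeline.axon
```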
Troubleshooting
Backend Not Found
API Key Not Set
Ollama Connection Failed
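To check whether the Ollama server is reachable, query its API (it listens on port 11434 by default):

```shell
# Lists locally pulled models if the server is up
curl -s http://localhost:11434/api/tags

# If the request fails, start the server:
ollama serve
```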
Model Not Available
Ollama: pull the missing model first (e.g. `ollama pull llama3.1`).

Related Documentation
- API Keys: configure backend authentication
- run Command: execute with backends
