The routing configuration determines which LLM model each process type uses. You can set defaults in [defaults.routing] and override per agent in [agents.routing].

Process-Type Routing

channel
string
default:"anthropic/claude-sonnet-4"
Model used by channel processes (user-facing conversations).
[defaults.routing]
channel = "anthropic/claude-sonnet-4"
branch
string
default:"anthropic/claude-sonnet-4"
Model used by branch processes (forked thinking).
[defaults.routing]
branch = "anthropic/claude-sonnet-4"
worker
string
default:"anthropic/claude-sonnet-4"
Model used by worker processes (task execution).
[defaults.routing]
worker = "anthropic/claude-haiku-4.5"
compactor
string
default:"anthropic/claude-sonnet-4"
Model used by compactor processes (context summarization).
[defaults.routing]
compactor = "anthropic/claude-haiku-4.5"
cortex
string
default:"anthropic/claude-sonnet-4"
Model used by cortex processes (system observation, bulletin generation).
[defaults.routing]
cortex = "anthropic/claude-haiku-4.5"
voice
string
Model used for voice processing (if enabled).
[defaults.routing]
voice = "openai/gpt-4.1-mini"

Task-Type Overrides

task_overrides
object
Task-specific model overrides for workers and branches. When a worker or branch is spawned with a specific task type, the override model is used instead of the process-type default.
[defaults.routing.task_overrides]
coding = "anthropic/claude-sonnet-4"
research = "openai/gpt-4.1"
memory = "anthropic/claude-haiku-4.5"

Thinking Effort

channel_thinking_effort
string
default:"auto"
Thinking effort level for channel models. Controls extended thinking tokens. Valid values: auto, low, medium, high
[defaults.routing]
channel_thinking_effort = "medium"
branch_thinking_effort
string
default:"auto"
Thinking effort level for branch models.
[defaults.routing]
branch_thinking_effort = "high"
worker_thinking_effort
string
default:"auto"
Thinking effort level for worker models.
[defaults.routing]
worker_thinking_effort = "medium"
compactor_thinking_effort
string
default:"auto"
Thinking effort level for compactor models.
[defaults.routing]
compactor_thinking_effort = "low"
cortex_thinking_effort
string
default:"auto"
Thinking effort level for cortex models.
[defaults.routing]
cortex_thinking_effort = "medium"

Fallback Chains

fallbacks
object
Fallback chains for resilience. When a model fails with a retriable error (HTTP 429, 502, 503, or 504, or a rate limit), the next model in its chain is tried.
[defaults.routing.fallbacks]
"anthropic/claude-sonnet-4" = ["anthropic/claude-haiku-4.5", "openai/gpt-4.1"]
"openai/gpt-4.1" = ["openai/gpt-4.1-mini"]
rate_limit_cooldown_secs
integer
default:"60"
How long to deprioritize a rate-limited model (seconds). After a rate limit error, the model is avoided for this duration.
[defaults.routing]
rate_limit_cooldown_secs = 120

Model Name Format

Model names follow the format provider/model-id. The provider prefix must match a built-in provider or a custom provider defined under [llm.provider.<name>].

Built-in Providers

  • anthropic/ - Anthropic models (e.g., anthropic/claude-sonnet-4)
  • openai/ - OpenAI models (e.g., openai/gpt-4.1)
  • openrouter/ - OpenRouter models (e.g., openrouter/anthropic/claude-sonnet-4-20250514)
  • kilo/ - Kilo Gateway models (e.g., kilo/anthropic/claude-sonnet-4.5)
  • gemini/ - Google Gemini models (e.g., gemini/gemini-2.0-flash-exp)
  • groq/ - Groq models (e.g., groq/llama-3.3-70b-versatile)
  • deepseek/ - DeepSeek models (e.g., deepseek/deepseek-chat)
  • xai/ - xAI models (e.g., xai/grok-2-latest)
  • mistral/ - Mistral AI models (e.g., mistral/mistral-large-latest)
  • together/ - Together AI models
  • fireworks/ - Fireworks AI models
  • nvidia/ - NVIDIA NIM models
  • ollama/ - Ollama models (e.g., ollama/llama3.3)
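As the OpenRouter and Kilo examples above suggest, the model id portion can itself contain slashes, so the provider prefix is everything before the first slash. A minimal sketch of that split (`split_model_name` is a hypothetical helper, not a real API):

```python
def split_model_name(name: str) -> tuple[str, str]:
    """Split a model name into (provider, model-id) on the FIRST slash.

    Illustrative helper: the model id may itself contain slashes,
    as in OpenRouter-style names.
    """
    provider, _, model_id = name.partition("/")
    return provider, model_id
```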

Custom Providers

For custom providers defined in [llm.provider.<name>], use <name>/model-id:
[llm.provider.my_proxy]
api_type = "openai_completions"
base_url = "https://my-proxy.example.com/v1"
api_key = "env:MY_PROXY_KEY"

[defaults.routing]
channel = "my_proxy/gpt-4"

Examples

Single Provider (Anthropic)

[llm]
anthropic_key = "env:ANTHROPIC_API_KEY"

[defaults.routing]
channel = "anthropic/claude-sonnet-4"
branch = "anthropic/claude-sonnet-4"
worker = "anthropic/claude-haiku-4.5"
compactor = "anthropic/claude-haiku-4.5"
cortex = "anthropic/claude-haiku-4.5"

Mixed Providers with Task Overrides

[llm]
anthropic_key = "env:ANTHROPIC_API_KEY"
openai_key = "env:OPENAI_API_KEY"

[defaults.routing]
channel = "anthropic/claude-sonnet-4"
branch = "anthropic/claude-sonnet-4"
worker = "anthropic/claude-haiku-4.5"
compactor = "openai/gpt-4.1-mini"
cortex = "openai/gpt-4.1-mini"

[defaults.routing.task_overrides]
coding = "anthropic/claude-sonnet-4"
research = "openai/gpt-4.1"

OpenRouter with Fallbacks

[llm]
openrouter_key = "env:OPENROUTER_API_KEY"

[defaults.routing]
channel = "openrouter/anthropic/claude-sonnet-4-20250514"
branch = "openrouter/anthropic/claude-sonnet-4-20250514"
worker = "openrouter/anthropic/claude-haiku-4.5-20250514"
compactor = "openrouter/anthropic/claude-haiku-4.5-20250514"
cortex = "openrouter/anthropic/claude-haiku-4.5-20250514"

[defaults.routing.fallbacks]
"openrouter/anthropic/claude-sonnet-4-20250514" = [
  "openrouter/anthropic/claude-haiku-4.5-20250514",
  "openrouter/google/gemini-2.0-flash-exp"
]

Per-Agent Routing Override

[defaults.routing]
channel = "anthropic/claude-haiku-4.5"
worker = "anthropic/claude-haiku-4.5"

# Default agent uses fast, cheap models
[[agents]]
id = "support"
default = true

# Coding agent uses more powerful models
[[agents]]
id = "coding"

[agents.routing]
channel = "anthropic/claude-sonnet-4"
worker = "anthropic/claude-sonnet-4"

[agents.routing.task_overrides]
coding = "anthropic/claude-sonnet-4"

Thinking Effort Configuration

[defaults.routing]
channel = "anthropic/claude-sonnet-4"
branch = "anthropic/claude-sonnet-4"
worker = "anthropic/claude-sonnet-4"

# Use extended thinking for complex tasks
channel_thinking_effort = "medium"
branch_thinking_effort = "high"
worker_thinking_effort = "medium"

# Keep compaction and cortex lightweight
compactor_thinking_effort = "low"
cortex_thinking_effort = "low"

Retriable Errors

The following errors trigger fallback to the next model in the chain:
  • HTTP 429 (rate limit)
  • HTTP 502 (bad gateway)
  • HTTP 503 (service unavailable)
  • HTTP 504 (gateway timeout)
  • Connection errors
  • Timeouts
  • Empty responses
  • Malformed responses
Context overflow errors (token limit exceeded) do NOT trigger fallbacks; they trigger compaction instead.
