Models are defined under the top-level models key and referenced by name from agents. You can also reference models inline on an agent using provider/model-name format — no models section required.

Model reference formats

Reference a model directly on the agent using provider/model-name:
agents:
  root:
    model: openai/gpt-4o
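Alternatively, define the model under the top-level models key and reference it by name from the agent (the model name fast is illustrative):

```yaml
models:
  fast:
    provider: openai
    model: gpt-4o-mini
    temperature: 0.2

agents:
  root:
    model: fast   # refers to the named entry above
```

The named form is useful when several agents share one model configuration.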

Full schema

models:
  my_model:
    provider: string           # required
    model: string              # required
    temperature: 0.7           # optional
    max_tokens: 4096           # optional
    top_p: 1.0                 # optional
    frequency_penalty: 0.0     # optional
    presence_penalty: 0.0      # optional
    base_url: string           # optional
    token_key: string          # optional
    thinking_budget: string|number # optional
    parallel_tool_calls: true  # optional
    track_usage: false         # optional
    routing: []                # optional
    provider_opts: {}          # optional

Fields

provider
string
required
The model provider. Supported values: openai, anthropic, google, amazon-bedrock, dmr, mistral, xai, nebius, minimax. Can also be the name of a custom provider defined in the providers section.
model
string
required
The model identifier as understood by the provider (e.g., gpt-4o, claude-sonnet-4-0, gemini-2.5-flash).
temperature
number
Sampling temperature. 0.0 is deterministic, 1.0 is more creative. Range: 0.0–1.0.
max_tokens
number
Maximum number of tokens in the model’s response. Consult provider documentation for model-specific limits.
top_p
number
Nucleus sampling threshold. Only tokens comprising the top top_p probability mass are considered. Range: 0.0–1.0.
frequency_penalty
number
Penalizes tokens that have already appeared in the response, reducing repetition. Range: 0.0–2.0.
presence_penalty
number
Encourages the model to introduce new topics by penalizing tokens that have appeared at all. Range: 0.0–2.0.
base_url
string
Custom API endpoint URL. Useful for self-hosted models, proxies, or Azure OpenAI deployments.
token_key
string
Environment variable name containing the API token. Overrides the provider’s default key lookup.
thinking_budget
string | number
Reasoning effort control. Accepts a string effort level (none, low, medium, high, adaptive) or an integer token budget. Provider-specific — see Thinking budget below.
parallel_tool_calls
boolean
Allow the model to call multiple tools simultaneously in a single turn. Supported by most OpenAI and Anthropic models.
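For example, to force the model to make at most one tool call per turn, set the field to false (model name illustrative):

```yaml
models:
  gpt:
    provider: openai
    model: gpt-4o
    parallel_tool_calls: false  # tools are invoked one at a time
```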
track_usage
boolean
Default: false
Track and report token usage for this model.
routing
array
Routing rules. When set, this model acts as a router that forwards requests to different models based on message content. See Routing.
provider_opts
object
Provider-specific options passed directly to the provider. See individual provider pages for available options.

Examples by provider

models:
  # OpenAI
  gpt:
    provider: openai
    model: gpt-4o

  # Anthropic
  claude:
    provider: anthropic
    model: claude-sonnet-4-0
    max_tokens: 64000

  # Google Gemini
  gemini:
    provider: google
    model: gemini-2.5-flash
    temperature: 0.5

  # AWS Bedrock
  bedrock:
    provider: amazon-bedrock
    model: global.anthropic.claude-sonnet-4-5-20250929-v1:0
    provider_opts:
      region: us-east-1

  # Docker Model Runner (local)
  local:
    provider: dmr
    model: ai/qwen3
    max_tokens: 8192

Thinking budget

Control how much reasoning the model does before responding. Configuration varies by provider.
OpenAI uses string effort levels:
models:
  gpt:
    provider: openai
    model: o4-mini
    thinking_budget: medium  # none | low | medium | high
Anthropic uses an integer token budget (1024–32768), which must be less than max_tokens:
models:
  claude:
    provider: anthropic
    model: claude-sonnet-4-5
    max_tokens: 64000
    thinking_budget: 16384
Google Gemini 2.5 uses an integer token budget, where 0 disables thinking and -1 lets the model decide dynamically:
models:
  gemini:
    provider: google
    model: gemini-2.5-flash
    thinking_budget: -1  # -1 = dynamic (default)
Gemini 3 uses string effort levels like OpenAI:
models:
  gemini3:
    provider: google
    model: gemini-3-flash
    thinking_budget: medium  # none | low | medium | high
To disable thinking on any provider:
thinking_budget: none  # or 0

Interleaved thinking

For Anthropic and Bedrock Claude models, interleaved thinking allows tool calls during the model’s reasoning process. This is enabled by default.
models:
  claude:
    provider: anthropic
    model: claude-sonnet-4-5
    provider_opts:
      interleaved_thinking: false  # disable if needed

Custom endpoints

Use base_url and token_key to point to custom or self-hosted endpoints:
models:
  # Azure OpenAI
  azure_gpt:
    provider: openai
    model: gpt-4o
    base_url: https://my-resource.openai.azure.com/openai/deployments/gpt-4o
    token_key: AZURE_OPENAI_API_KEY

  # Self-hosted vLLM (OpenAI-compatible)
  local_llama:
    provider: openai
    model: meta-llama/Llama-3.2-3B-Instruct
    base_url: http://localhost:8000/v1

  # Internal proxy
  proxied:
    provider: openai
    model: gpt-4o
    base_url: https://proxy.internal.company.com/openai/v1
    token_key: INTERNAL_API_KEY
For reusable provider configurations, use the top-level providers section instead of repeating base_url and token_key on every model. See Configuration overview. See Local models and Custom providers for more details.
