Models are defined under the top-level models key and referenced by name from agents. You can also reference models inline on an agent using provider/model-name format — no models section required.

Model reference formats

Reference a model directly on the agent using provider/model-name:
agents:
  root:
    model: openai/gpt-4o
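Alternatively, define the model under the top-level models key and reference it by name from the agent (the model name fast is illustrative):

```yaml
models:
  fast:
    provider: openai
    model: gpt-4o-mini
    temperature: 0.2

agents:
  root:
    model: fast   # refers to the named entry above
```

The named form is useful when several agents share one model configuration.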

Full schema

models:
  my_model:
    provider: string           # required
    model: string              # required
    temperature: 0.7           # optional
    max_tokens: 4096           # optional
    top_p: 1.0                 # optional
    frequency_penalty: 0.0     # optional
    presence_penalty: 0.0      # optional
    base_url: string           # optional
    token_key: string          # optional
    thinking_budget: string|number # optional
    parallel_tool_calls: true  # optional
    track_usage: false         # optional
    routing: []                # optional
    provider_opts: {}          # optional

Fields

provider
string
required
The model provider. Supported values: openai, anthropic, google, amazon-bedrock, dmr, mistral, xai, nebius, minimax. Can also be the name of a custom provider defined in the providers section.
model
string
required
The model identifier as understood by the provider (e.g., gpt-4o, claude-sonnet-4-0, gemini-2.5-flash).
temperature
number
Sampling temperature. 0.0 is deterministic, 1.0 is more creative. Range: 0.0–1.0.
max_tokens
number
Maximum number of tokens in the model’s response. Consult provider documentation for model-specific limits.
top_p
number
Nucleus sampling threshold. Only tokens comprising the top top_p probability mass are considered. Range: 0.0–1.0.
frequency_penalty
number
Penalizes tokens that have already appeared in the response, reducing repetition. Range: 0.0–2.0.
presence_penalty
number
Encourages the model to introduce new topics by penalizing tokens that have appeared at all. Range: 0.0–2.0.
base_url
string
Custom API endpoint URL. Useful for self-hosted models, proxies, or Azure OpenAI deployments.
token_key
string
Environment variable name containing the API token. Overrides the provider’s default key lookup.
thinking_budget
string | number
Reasoning effort control. Accepts a string effort level (none, low, medium, high, adaptive) or an integer token budget. Provider-specific — see Thinking budget below.
parallel_tool_calls
boolean
Allow the model to call multiple tools simultaneously in a single turn. Supported by most OpenAI and Anthropic models.
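For example, to force the model to make at most one tool call per turn, set the field to false (model name illustrative):

```yaml
models:
  gpt:
    provider: openai
    model: gpt-4o
    parallel_tool_calls: false  # tools are invoked one at a time
```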
track_usage
boolean
Default: false
Track and report token usage for this model.
routing
array
Routing rules. When set, this model acts as a router that forwards requests to different models based on message content. See Routing.
provider_opts
object
Provider-specific options passed directly to the provider. See individual provider pages for available options.

Examples by provider

models:
  # OpenAI
  gpt:
    provider: openai
    model: gpt-4o

  # Anthropic
  claude:
    provider: anthropic
    model: claude-sonnet-4-0
    max_tokens: 64000

  # Google Gemini
  gemini:
    provider: google
    model: gemini-2.5-flash
    temperature: 0.5

  # AWS Bedrock
  bedrock:
    provider: amazon-bedrock
    model: global.anthropic.claude-sonnet-4-5-20250929-v1:0
    provider_opts:
      region: us-east-1

  # Docker Model Runner (local)
  local:
    provider: dmr
    model: ai/qwen3
    max_tokens: 8192

Thinking budget

Control how much reasoning the model does before responding. Configuration varies by provider.
OpenAI uses string effort levels:
models:
  gpt:
    provider: openai
    model: o4-mini
    thinking_budget: medium  # none | low | medium | high
Anthropic uses an integer token budget (1024–32768), which must be less than max_tokens:
models:
  claude:
    provider: anthropic
    model: claude-sonnet-4-5
    max_tokens: 64000
    thinking_budget: 16384
Google Gemini 2.5 uses an integer token budget, where 0 disables thinking and -1 lets the model decide dynamically:
models:
  gemini:
    provider: google
    model: gemini-2.5-flash
    thinking_budget: -1  # -1 = dynamic (default)
Gemini 3 uses string effort levels like OpenAI:
models:
  gemini3:
    provider: google
    model: gemini-3-flash
    thinking_budget: medium  # none | low | medium | high
To disable thinking on any provider:
thinking_budget: none  # or 0

Interleaved thinking

For Anthropic and Bedrock Claude models, interleaved thinking allows tool calls during the model’s reasoning process. This is enabled by default.
models:
  claude:
    provider: anthropic
    model: claude-sonnet-4-5
    provider_opts:
      interleaved_thinking: false  # disable if needed

Custom endpoints

Use base_url and token_key to point to custom or self-hosted endpoints:
models:
  # Azure OpenAI
  azure_gpt:
    provider: openai
    model: gpt-4o
    base_url: https://my-resource.openai.azure.com/openai/deployments/gpt-4o
    token_key: AZURE_OPENAI_API_KEY

  # Self-hosted vLLM (OpenAI-compatible)
  local_llama:
    provider: openai
    model: meta-llama/Llama-3.2-3B-Instruct
    base_url: http://localhost:8000/v1

  # Internal proxy
  proxied:
    provider: openai
    model: gpt-4o
    base_url: https://proxy.internal.company.com/openai/v1
    token_key: INTERNAL_API_KEY
For reusable provider configurations, use the top-level providers section instead of repeating base_url and token_key on every model. See Configuration overview. See Local models and Custom providers for more details.
