models key and referenced by name from agents. You can also reference models inline on an agent using provider/model-name format — no models section required.
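As a sketch of both styles (assuming a YAML config with top-level agents and models sections as described on this page; the exact agent field names are assumptions):

```yaml
# Named reference: the model is defined once and referenced by name.
agents:
  root:
    model: smart              # name of the entry under models
models:
  smart:
    provider: openai
    model: gpt-4o

# Inline reference: provider/model-name, no models section required.
# agents:
#   root:
#     model: openai/gpt-4o
```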
Model reference formats
- Inline: reference a model directly on the agent using the `provider/model-name` format.
- Named: define the model under the `models` key and reference it by name from the agent.

Full schema
Fields
- provider: The model provider. Supported values: openai, anthropic, google, amazon-bedrock, dmr, mistral, xai, nebius, minimax. Can also be the name of a custom provider defined in the providers section.
- model: The model identifier as understood by the provider (e.g., gpt-4o, claude-sonnet-4-0, gemini-2.5-flash).
- temperature: Sampling temperature. 0.0 is deterministic, 1.0 is more creative. Range: 0.0–1.0.
- max_tokens: Maximum number of tokens in the model’s response. Consult provider documentation for model-specific limits.
- top_p: Nucleus sampling threshold. Only tokens comprising the top top_p probability mass are considered. Range: 0.0–1.0.
- Frequency penalty: Penalize tokens that have already appeared in the response, reducing repetition. Range: 0.0–2.0.
- Presence penalty: Encourage the model to introduce new topics by penalizing tokens that have appeared at all. Range: 0.0–2.0.
- base_url: Custom API endpoint URL. Useful for self-hosted models, proxies, or Azure OpenAI deployments.
- token_key: Environment variable name containing the API token. Overrides the provider’s default key lookup.
- Thinking budget: Reasoning effort control. Accepts a string effort level (none, low, medium, high, adaptive) or an integer token budget. Provider-specific; see Thinking budget below.
- Parallel tool calls: Allow the model to call multiple tools simultaneously in a single turn. Supported by most OpenAI and Anthropic models.
- Usage tracking: Track and report token usage for this model.
- Routing rules: Rule-based routing rules. When set, this model acts as a router that forwards requests to different models based on message content. See Routing.
- Provider options: Provider-specific options passed directly to the provider. See individual provider pages for available options.
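Putting the fields together, a fully populated model entry might look like the sketch below. Only `top_p`, `max_tokens`, `base_url`, and `token_key` are named verbatim on this page; the other key spellings are assumptions based on the field descriptions above:

```yaml
models:
  tuned:
    provider: openai              # or anthropic, google, amazon-bedrock, dmr, ...
    model: gpt-4o                 # identifier as understood by the provider
    temperature: 0.7              # 0.0 deterministic .. 1.0 creative
    max_tokens: 4096              # response limit; check provider docs
    top_p: 0.9                    # nucleus sampling threshold
    frequency_penalty: 0.5        # assumed key: reduce repetition (0.0-2.0)
    presence_penalty: 0.2         # assumed key: encourage new topics (0.0-2.0)
    base_url: https://llm-proxy.example.com/v1   # hypothetical endpoint
    token_key: MY_API_KEY         # env var holding the API token
    parallel_tool_calls: true     # assumed key: simultaneous tool calls
    track_usage: true             # assumed key: token usage reporting
```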
Examples by provider
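Hedged sketches of typical entries, one per provider family. The model identifiers are the examples quoted earlier on this page; the dmr model name is a placeholder:

```yaml
models:
  fast:
    provider: openai
    model: gpt-4o
  careful:
    provider: anthropic
    model: claude-sonnet-4-0
    max_tokens: 8192
  cheap:
    provider: google
    model: gemini-2.5-flash
  local:
    provider: dmr
    model: ai/some-local-model    # placeholder identifier
```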
Thinking budget
Control how much reasoning the model does before responding. Configuration varies by provider.
OpenAI — effort levels
Uses string effort levels:
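For example (assuming the field is spelled `thinking_budget`, and using a hypothetical OpenAI reasoning model):

```yaml
models:
  reasoner:
    provider: openai
    model: o3-mini                # hypothetical reasoning-capable model
    thinking_budget: medium       # none | low | medium | high | adaptive
```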
Anthropic — token budget
Uses an integer token budget (1024–32768). Must be less than `max_tokens`:
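A sketch (again assuming the field is spelled `thinking_budget`):

```yaml
models:
  claude:
    provider: anthropic
    model: claude-sonnet-4-0
    max_tokens: 8192
    thinking_budget: 4096         # integer budget, must be < max_tokens
```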
Google Gemini 2.5 — token budget
Uses an integer token budget. `0` disables thinking, `-1` lets the model decide dynamically:
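For instance (the `thinking_budget` key name is an assumption):

```yaml
models:
  gemini:
    provider: google
    model: gemini-2.5-flash
    thinking_budget: 2048         # 0 disables thinking, -1 = dynamic
```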
Google Gemini 3 — effort levels
Uses string effort levels like OpenAI:
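A sketch (both the model identifier and the `thinking_budget` key name are assumptions):

```yaml
models:
  gemini3:
    provider: google
    model: gemini-3-flash         # hypothetical model identifier
    thinking_budget: high         # string effort level, as with OpenAI
```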
Interleaved thinking
For Anthropic and Bedrock Claude models, interleaved thinking allows tool calls during the model’s reasoning process. This is enabled by default.

Custom endpoints
Use `base_url` and `token_key` to point to custom or self-hosted endpoints:
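For instance (the endpoint URL and environment variable name are hypothetical):

```yaml
models:
  proxied:
    provider: openai
    model: gpt-4o
    base_url: https://llm-proxy.example.com/v1   # hypothetical proxy endpoint
    token_key: PROXY_API_KEY                     # env var with the token
```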
To avoid repeating `base_url` and `token_key` on every model, define a custom provider in the providers section instead. See Configuration overview.
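A hedged sketch of that pattern (the layout of a providers entry is an assumption):

```yaml
providers:
  myproxy:
    base_url: https://llm-proxy.example.com/v1
    token_key: PROXY_API_KEY
models:
  fast:
    provider: myproxy             # custom provider defined above
    model: gpt-4o-mini
```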
See Local models and Custom providers for more details.