The RequestParams class configures how Fast Agent interacts with language models during generation.
Overview
Core Parameters
Maximum number of tokens to generate in the response.

Model identifier to use for this request. When specified, overrides the agent’s default model and modelPreferences. Format: "provider.model_name", or with reasoning effort: "provider.model?reasoning=high". Note: can only be set during agent creation, not per-request.

Sampling temperature for response randomness. Higher values (e.g., 0.8) make output more random; lower values (e.g., 0.2) make it more focused and deterministic. Provider support varies.
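As a rough illustration of these core fields, here is a minimal sketch. The field names (maxTokens, model, temperature) and defaults are assumptions based on this page, not the authoritative fast-agent definitions — check the fast-agent source before relying on them:

```python
from dataclasses import dataclass
from typing import Optional

# Minimal sketch of the core fields described above. Field names and
# defaults are assumptions for illustration only.
@dataclass
class RequestParams:
    maxTokens: int = 2048        # cap on tokens generated in the response
    model: Optional[str] = None  # e.g. "provider.model_name?reasoning=high"
    temperature: float = 0.7     # higher = more random, lower = more focused

params = RequestParams(model="provider.model_name", temperature=0.2)
print(params.model, params.temperature)
```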
Conversation Control
Whether to maintain conversation history across turns. When True, the agent includes previous messages in the context. Note: does not include applied prompts.

Maximum number of tool call iterations allowed in a single conversation turn. Prevents infinite loops from runaway tool usage.
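A minimal sketch of the iteration cap described above — the loop shape is illustrative only, not fast-agent's actual implementation:

```python
# Illustrative sketch of a max_iterations guard on a tool-calling loop.
def run_turn(model_step, max_iterations=5):
    """Call model_step until it stops requesting tools or the cap is hit."""
    for iteration in range(max_iterations):
        wants_tool = model_step(iteration)
        if not wants_tool:
            return f"done after {iteration + 1} step(s)"
    return "stopped: max_iterations reached"

# A runaway model that always requests another tool call is cut off:
print(run_turn(lambda i: True, max_iterations=3))  # stopped: max_iterations reached
# A model that finishes after one tool call completes normally:
print(run_turn(lambda i: i < 1))                   # done after 2 step(s)
```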
Whether to allow the model to request multiple tool calls simultaneously. When True, improves efficiency for independent operations.

Tool Execution
Metadata dictionary to pass through to MCP tool calls via the _meta field. Useful for passing request context to tools.

Emit monotonic progress updates for the internal tool execution loop when supported by the provider.
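To show how the metadata dictionary might surface inside an MCP tools/call request, here is a hedged sketch (the exact wire shape your server sees may differ):

```python
import json

# Hedged sketch: a metadata dict passed through to an MCP tool call's
# _meta field. The surrounding request shape is illustrative.
request_metadata = {"request_id": "req-123", "user_tier": "pro"}
tool_call = {
    "method": "tools/call",
    "params": {
        "name": "search",
        "arguments": {"query": "weather"},
        "_meta": request_metadata,  # passed through from RequestParams
    },
}
print(json.dumps(tool_call["params"]["_meta"]))
```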
Skip post-tool LLM synthesis and return tool results directly as assistant output. Useful for debugging or when you want raw tool outputs.
Maximum time in seconds to wait for streaming completion. Set to None to disable the timeout.

Structured Output
Override the response format for structured output calls. Prefer Pydantic models instead of this parameter; only use it in exceptional circumstances where Pydantic models are not suitable.
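When a Pydantic model genuinely cannot express the shape you need, the override is typically a JSON-schema-style dictionary. A hedged sketch — the exact format your provider expects may differ:

```python
# Hedged sketch of a raw response-format override (illustrative shape).
# Prefer a typed Pydantic model whenever possible.
weather_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "weather_report",
        "schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "temp_c": {"type": "number"},
            },
            "required": ["city", "temp_c"],
        },
    },
}
print(weather_schema["json_schema"]["name"])
```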
Advanced Parameters
Sampling Parameters
These parameters control the sampling behavior of the language model. Provider support varies.

Nucleus sampling parameter. Only tokens with cumulative probability up to top_p are considered. Range: 0.0 to 1.0. Example: 0.9 means consider only the tokens making up the top 90% of the probability mass. Aliases: topP.

Top-k sampling parameter. Only the top K most likely tokens are considered. Example: top_k=50 considers only the 50 most likely next tokens. Aliases: topK.

Minimum probability threshold for sampling. Tokens with probability below this threshold are excluded. Range: 0.0 to 1.0. Aliases: minP.

Penalty Parameters

These parameters discourage repetitive or common tokens. Provider support varies.

Penalty for tokens that have already appeared in the conversation. Encourages topic diversity. Range: typically -2.0 to 2.0; positive values discourage repetition. Aliases: presencePenalty.

Penalty proportional to how often a token has appeared, so frequently repeated tokens are penalized more strongly. Range: typically -2.0 to 2.0; positive values discourage repetition. Aliases: frequencyPenalty.

General repetition penalty. An alternative to the presence/frequency penalties, used by some providers. Aliases: repetitionPenalty.

Service Tier

Service tier override for Responses-family models.

- "fast": priority processing with faster response times
- "flex": standard processing; latency may vary
Template Variables
Dictionary of template variables for dynamic prompt templates. Currently only supported by the TensorZero inference backend.
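Conceptually, template variables behave like standard string substitution. This stdlib sketch only illustrates the idea; TensorZero presumably performs the actual substitution in its backend, and the variable names here are arbitrary:

```python
from string import Template

# Conceptual illustration of template-variable substitution in a prompt
# template. Variable names are arbitrary examples.
prompt = Template("Summarize $topic in $style style.")
rendered = prompt.substitute({"topic": "MCP", "style": "concise"})
print(rendered)  # Summarize MCP in concise style.
```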
Internal Parameters
Internal per-request tool execution handler. Not sent to LLM providers. For internal use only; users should not need to set this parameter.
