The RequestParams class configures how Fast Agent interacts with language models during generation.

Overview

from fast_agent import RequestParams

params = RequestParams(
    maxTokens=4096,
    temperature=0.7,
    max_iterations=15,
    parallel_tool_calls=True,
    use_history=True
)

response = await agent.generate(
    "What's the weather in San Francisco?",
    params=params
)

Core Parameters

maxTokens
int
default:"2048"
Maximum number of tokens to generate in the response.
model
str | None
default:"None"
Model identifier to use for this request. When specified, overrides the agent's default model and modelPreferences.
Format: "provider.model_name", or with reasoning: "provider.model?reasoning=high".
Note: Can only be set during agent creation, not per-request.
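To make the model string format concrete, here is a small illustrative parser for the "provider.model_name?key=value" shape described above. This helper is purely a sketch for explanation; Fast Agent does its own parsing internally, and the provider/model names used below are examples, not an endorsement of specific identifiers:

```python
def parse_model_string(model: str) -> dict:
    """Split a "provider.model_name?key=value" string into its parts.

    Illustrative only: fast_agent performs its own parsing internally.
    """
    options = {}
    if "?" in model:
        model, query = model.split("?", 1)
        for pair in query.split("&"):
            key, _, value = pair.partition("=")
            options[key] = value
    provider, _, model_name = model.partition(".")
    return {"provider": provider, "model": model_name, "options": options}

# "provider.model?reasoning=high" splits into provider, model, and options
parsed = parse_model_string("openai.gpt-5-mini?reasoning=high")
```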
temperature
float | None
default:"None"
Sampling temperature for response randomness. Higher values (e.g., 0.8) make output more random; lower values (e.g., 0.2) make it more focused and deterministic.
Provider support varies.

Conversation Control

use_history
bool
default:"True"
Whether to maintain conversation history across turns. When True, the agent includes previous messages in the context.
Note: Does not include applied prompts.
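Conceptually, use_history=True means each completed turn is carried into the next request's context, while use_history=False sends each message in isolation. The toy class below sketches that behavior; it is not Fast Agent's internal implementation:

```python
class HistorySketch:
    """Toy illustration of history handling; not fast_agent internals."""

    def __init__(self, use_history: bool):
        self.use_history = use_history
        self.history: list[dict] = []

    def build_context(self, user_message: str) -> list[dict]:
        # With history on, previous turns are prepended to the new message
        context = list(self.history) if self.use_history else []
        context.append({"role": "user", "content": user_message})
        return context

    def record_turn(self, user_message: str, assistant_reply: str) -> None:
        if self.use_history:
            self.history += [
                {"role": "user", "content": user_message},
                {"role": "assistant", "content": assistant_reply},
            ]

chat = HistorySketch(use_history=True)
chat.record_turn("hi", "hello")
```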
max_iterations
int
default:"10"
Maximum number of tool call iterations allowed in a single conversation turn. Prevents infinite loops from runaway tool usage.
parallel_tool_calls
bool
default:"True"
Whether to allow the model to request multiple tool calls simultaneously. When True, independent operations can execute concurrently, improving efficiency.

Tool Execution

mcp_metadata
dict[str, Any] | None
default:"None"
Metadata dictionary to pass through to MCP tool calls via the _meta field. Useful for passing request context to tools.
params = RequestParams(
    mcp_metadata={
        "user_id": "12345",
        "session_id": "abc-def"
    }
)
emit_loop_progress
bool
default:"False"
Emit monotonic progress updates for the internal tool execution loop when supported by the provider.
tool_result_passthrough
bool
default:"False"
Skip post-tool LLM synthesis and return tool results directly as assistant output. Useful for debugging or when you want raw tool outputs.
streaming_timeout
float | None
default:"30.0"
Maximum time in seconds to wait for streaming completion. Set to None to disable timeout.

Structured Output

response_format
Any | None
default:"None"
Override the response format for structured output calls.
Prefer passing a Pydantic model via output_schema instead of this parameter. Only use it in exceptional circumstances where Pydantic models are not suitable.
from pydantic import BaseModel

class WeatherResponse(BaseModel):
    temperature: float
    condition: str

# Preferred approach
response = await agent.generate(
    "What's the weather?",
    output_schema=WeatherResponse
)

Advanced Parameters

Sampling Parameters

These parameters control the sampling behavior of the language model. Provider support varies.
top_p
float | None
default:"None"
Nucleus sampling parameter. Only tokens with cumulative probability up to top_p are considered.
Range: 0.0 to 1.0. Example: 0.9 means consider tokens making up the top 90% probability mass.
Aliases: topP
top_k
int | None
default:"None"
Top-k sampling parameter. Only the top K most likely tokens are considered.
Example: top_k=50 considers only the 50 most likely next tokens.
Aliases: topK
min_p
float | None
default:"None"
Minimum probability threshold for sampling. Tokens with probability below this threshold are excluded.
Range: 0.0 to 1.0.
Aliases: minP

Penalty Parameters

These parameters discourage repetitive or common tokens. Provider support varies.
presence_penalty
float | None
default:"None"
Penalty for tokens that have already appeared in the conversation. Encourages topic diversity.
Range: typically -2.0 to 2.0. Positive values discourage repetition.
Aliases: presencePenalty
frequency_penalty
float | None
default:"None"
Penalty proportional to how often a token has appeared. Stronger penalty for frequently repeated tokens.
Range: typically -2.0 to 2.0. Positive values discourage repetition.
Aliases: frequencyPenalty
repetition_penalty
float | None
default:"None"
General repetition penalty; an alternative to the presence/frequency penalties that is used by some providers.
Aliases: repetitionPenalty
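The difference between presence and frequency penalties can be shown with a toy logit adjustment. This mirrors the commonly documented OpenAI-style formula (flat penalty for any appearance vs. a penalty that scales with repetition count); it is an illustration, not Fast Agent's code:

```python
def apply_penalties(logit: float, count: int,
                    presence_penalty: float = 0.0,
                    frequency_penalty: float = 0.0) -> float:
    """Penalize a token's logit based on how often it has already appeared."""
    if count > 0:
        logit -= presence_penalty          # flat penalty for any appearance
    logit -= frequency_penalty * count     # grows with each repetition
    return logit

# A token seen three times is penalized harder by frequency_penalty
once = apply_penalties(1.0, count=1, presence_penalty=0.6, frequency_penalty=0.3)
thrice = apply_penalties(1.0, count=3, presence_penalty=0.6, frequency_penalty=0.3)
```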

Service Tier

service_tier
'fast' | 'flex' | None
default:"None"
Service tier override for Responses-family models.
  • "fast": Priority processing with faster response times
  • "flex": Standard processing, may have variable latency

Template Variables

template_vars
dict[str, Any]
default:"{}"
Dictionary of template variables for dynamic prompt templates.
Currently only supported by the TensorZero inference backend.
params = RequestParams(
    template_vars={
        "user_name": "Alice",
        "max_results": 10
    }
)

Internal Parameters

tool_execution_handler
ToolExecutionHandler | None
default:"None"
Internal per-request tool execution handler. Not sent to LLM providers.
For internal use only. Users should not need to set this parameter.

Usage Examples

Basic Generation

from fast_agent import Agent, RequestParams

agent = Agent()

params = RequestParams(
    maxTokens=1024,
    temperature=0.5
)

response = await agent.generate(
    "Explain quantum computing",
    params=params
)

Tool-Heavy Workload

params = RequestParams(
    max_iterations=20,  # Allow more tool calls
    parallel_tool_calls=True,  # Execute tools in parallel
    streaming_timeout=60.0  # Longer timeout for complex operations
)

response = await agent.generate(
    "Analyze all CSV files in /data and create a summary report",
    params=params
)

Controlled Randomness

params = RequestParams(
    temperature=0.2,  # Low temperature for focused output
    top_p=0.9,  # Nucleus sampling
    presence_penalty=0.6,  # Discourage repetition
    frequency_penalty=0.3
)

response = await agent.generate(
    "Write a technical document about API design",
    params=params
)

Debug Mode

params = RequestParams(
    tool_result_passthrough=True,  # Get raw tool outputs
    use_history=False  # Don't include conversation history
)

response = await agent.generate(
    "List files in /tmp",
    params=params
)

With MCP Metadata

params = RequestParams(
    mcp_metadata={
        "request_id": "req_abc123",
        "user_context": {"role": "admin", "permissions": ["read", "write"]}
    }
)

response = await agent.generate(
    "Update the configuration file",
    params=params
)

Parameter Validation

Fast Agent validates request parameters and provides helpful error messages:
# This will raise a validation error
params = RequestParams(
    maxTokens=-1  # Invalid: must be positive
)

# This will raise a validation error
params = RequestParams(
    temperature=2.5  # Invalid: typically 0.0-2.0
)

Combining with Agent Configuration

Request parameters override agent-level defaults:
agent = Agent(
    default_model="gpt-5-mini",
    temperature=0.7
)

# This request uses temperature=0.2 instead of agent's 0.7
response = await agent.generate(
    "Be very precise",
    params=RequestParams(temperature=0.2)
)
