The RequestParams class configures how Fast Agent interacts with language models during generation.

Overview

from fast_agent import RequestParams

params = RequestParams(
    maxTokens=4096,
    temperature=0.7,
    max_iterations=15,
    parallel_tool_calls=True,
    use_history=True
)

response = await agent.generate(
    "What's the weather in San Francisco?",
    params=params
)

Core Parameters

maxTokens
int
default:"2048"
Maximum number of tokens to generate in the response.
model
str | None
default:"None"
Model identifier to use for this request. When specified, overrides the agent's default model and modelPreferences.
Format: "provider.model_name", or with reasoning: "provider.model?reasoning=high".
Note: Can only be set during agent creation, not per-request.
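To make the model string format concrete, here is a small illustrative parser for the "provider.model_name?key=value" shape described above. This helper is purely a sketch for explanation; Fast Agent does its own parsing internally, and the provider/model names used below are examples, not an endorsement of specific identifiers:

```python
def parse_model_string(model: str) -> dict:
    """Split a "provider.model_name?key=value" string into its parts.

    Illustrative only: fast_agent performs its own parsing internally.
    """
    options = {}
    if "?" in model:
        model, query = model.split("?", 1)
        for pair in query.split("&"):
            key, _, value = pair.partition("=")
            options[key] = value
    provider, _, model_name = model.partition(".")
    return {"provider": provider, "model": model_name, "options": options}

# "provider.model?reasoning=high" splits into provider, model, and options
parsed = parse_model_string("openai.gpt-5-mini?reasoning=high")
```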
temperature
float | None
default:"None"
Sampling temperature for response randomness. Higher values (e.g., 0.8) make output more random; lower values (e.g., 0.2) make it more focused and deterministic.
Provider support varies.

Conversation Control

use_history
bool
default:"True"
Whether to maintain conversation history across turns. When True, the agent includes previous messages in the context.
Note: Does not include applied prompts.
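Conceptually, use_history=True means each completed turn is carried into the next request's context, while use_history=False sends each message in isolation. The toy class below sketches that behavior; it is not Fast Agent's internal implementation:

```python
class HistorySketch:
    """Toy illustration of history handling; not fast_agent internals."""

    def __init__(self, use_history: bool):
        self.use_history = use_history
        self.history: list[dict] = []

    def build_context(self, user_message: str) -> list[dict]:
        # With history on, previous turns are prepended to the new message
        context = list(self.history) if self.use_history else []
        context.append({"role": "user", "content": user_message})
        return context

    def record_turn(self, user_message: str, assistant_reply: str) -> None:
        if self.use_history:
            self.history += [
                {"role": "user", "content": user_message},
                {"role": "assistant", "content": assistant_reply},
            ]

chat = HistorySketch(use_history=True)
chat.record_turn("hi", "hello")
```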
max_iterations
int
default:"10"
Maximum number of tool call iterations allowed in a single conversation turn. Prevents infinite loops from runaway tool usage.
parallel_tool_calls
bool
default:"True"
Whether to allow the model to request multiple tool calls simultaneously. When True, independent operations can execute concurrently, improving efficiency.

Tool Execution

mcp_metadata
dict[str, Any] | None
default:"None"
Metadata dictionary to pass through to MCP tool calls via the _meta field. Useful for passing request context to tools.
params = RequestParams(
    mcp_metadata={
        "user_id": "12345",
        "session_id": "abc-def"
    }
)
emit_loop_progress
bool
default:"False"
Emit monotonic progress updates for the internal tool execution loop when supported by the provider.
tool_result_passthrough
bool
default:"False"
Skip post-tool LLM synthesis and return tool results directly as assistant output. Useful for debugging or when you want raw tool outputs.
streaming_timeout
float | None
default:"30.0"
Maximum time in seconds to wait for streaming completion. Set to None to disable timeout.

Structured Output

response_format
Any | None
default:"None"
Override the response format for structured output calls.
Prefer passing a Pydantic model via output_schema instead of this parameter. Only use it in exceptional circumstances where Pydantic models are not suitable.
from pydantic import BaseModel

class WeatherResponse(BaseModel):
    temperature: float
    condition: str

# Preferred approach
response = await agent.generate(
    "What's the weather?",
    output_schema=WeatherResponse
)

Advanced Parameters

Sampling Parameters

These parameters control the sampling behavior of the language model. Provider support varies.
top_p
float | None
default:"None"
Nucleus sampling parameter. Only tokens with cumulative probability up to top_p are considered.
Range: 0.0 to 1.0. Example: 0.9 means consider tokens making up the top 90% probability mass.
Aliases: topP
top_k
int | None
default:"None"
Top-k sampling parameter. Only the top K most likely tokens are considered.
Example: top_k=50 considers only the 50 most likely next tokens.
Aliases: topK
min_p
float | None
default:"None"
Minimum probability threshold for sampling. Tokens with probability below this threshold are excluded.
Range: 0.0 to 1.0.
Aliases: minP

Penalty Parameters

These parameters discourage repetitive or common tokens. Provider support varies.
presence_penalty
float | None
default:"None"
Penalty for tokens that have already appeared in the conversation. Encourages topic diversity.
Range: typically -2.0 to 2.0. Positive values discourage repetition.
Aliases: presencePenalty
frequency_penalty
float | None
default:"None"
Penalty proportional to how often a token has appeared. Stronger penalty for frequently repeated tokens.
Range: typically -2.0 to 2.0. Positive values discourage repetition.
Aliases: frequencyPenalty
repetition_penalty
float | None
default:"None"
General repetition penalty; an alternative to the presence/frequency penalties that is used by some providers.
Aliases: repetitionPenalty
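The difference between presence and frequency penalties can be shown with a toy logit adjustment. This mirrors the commonly documented OpenAI-style formula (flat penalty for any appearance vs. a penalty that scales with repetition count); it is an illustration, not Fast Agent's code:

```python
def apply_penalties(logit: float, count: int,
                    presence_penalty: float = 0.0,
                    frequency_penalty: float = 0.0) -> float:
    """Penalize a token's logit based on how often it has already appeared."""
    if count > 0:
        logit -= presence_penalty          # flat penalty for any appearance
    logit -= frequency_penalty * count     # grows with each repetition
    return logit

# A token seen three times is penalized harder by frequency_penalty
once = apply_penalties(1.0, count=1, presence_penalty=0.6, frequency_penalty=0.3)
thrice = apply_penalties(1.0, count=3, presence_penalty=0.6, frequency_penalty=0.3)
```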

Service Tier

service_tier
'fast' | 'flex' | None
default:"None"
Service tier override for Responses-family models.
  • "fast": Priority processing with faster response times
  • "flex": Standard processing, may have variable latency

Template Variables

template_vars
dict[str, Any]
default:"{}"
Dictionary of template variables for dynamic prompt templates.
Currently only supported by the TensorZero inference backend.
params = RequestParams(
    template_vars={
        "user_name": "Alice",
        "max_results": 10
    }
)

Internal Parameters

tool_execution_handler
ToolExecutionHandler | None
default:"None"
Internal per-request tool execution handler. Not sent to LLM providers.
For internal use only. Users should not need to set this parameter.

Usage Examples

Basic Generation

from fast_agent import Agent, RequestParams

agent = Agent()

params = RequestParams(
    maxTokens=1024,
    temperature=0.5
)

response = await agent.generate(
    "Explain quantum computing",
    params=params
)

Tool-Heavy Workload

params = RequestParams(
    max_iterations=20,  # Allow more tool calls
    parallel_tool_calls=True,  # Execute tools in parallel
    streaming_timeout=60.0  # Longer timeout for complex operations
)

response = await agent.generate(
    "Analyze all CSV files in /data and create a summary report",
    params=params
)

Controlled Randomness

params = RequestParams(
    temperature=0.2,  # Low temperature for focused output
    top_p=0.9,  # Nucleus sampling
    presence_penalty=0.6,  # Discourage repetition
    frequency_penalty=0.3
)

response = await agent.generate(
    "Write a technical document about API design",
    params=params
)

Debug Mode

params = RequestParams(
    tool_result_passthrough=True,  # Get raw tool outputs
    use_history=False  # Don't include conversation history
)

response = await agent.generate(
    "List files in /tmp",
    params=params
)

With MCP Metadata

params = RequestParams(
    mcp_metadata={
        "request_id": "req_abc123",
        "user_context": {"role": "admin", "permissions": ["read", "write"]}
    }
)

response = await agent.generate(
    "Update the configuration file",
    params=params
)

Parameter Validation

Fast Agent validates request parameters and provides helpful error messages:
# This will raise a validation error
params = RequestParams(
    maxTokens=-1  # Invalid: must be positive
)

# This will raise a validation error
params = RequestParams(
    temperature=2.5  # Invalid: typically 0.0-2.0
)

Combining with Agent Configuration

Request parameters override agent-level defaults:
agent = Agent(
    default_model="gpt-5-mini",
    temperature=0.7
)

# This request uses temperature=0.2 instead of agent's 0.7
response = await agent.generate(
    "Be very precise",
    params=RequestParams(temperature=0.2)
)
