## Overview

The `LlmAgent` is the foundational agent type in Fast Agent, providing core LLM interaction capabilities. It handles conversation management, message display, streaming responses, and stop-reason handling.

`LlmAgent` extends `LlmDecorator` with UI display methods, tool-call tracking, and chat interaction patterns, while delegating core LLM operations to the attached `FastAgentLLMProtocol`.
## Key Features

- **Conversation Management**: Maintains message history and handles multi-turn conversations
- **Streaming Support**: Displays responses as they're generated, with configurable streaming modes
- **Stop Reason Handling**: Gracefully handles different completion reasons (`END_TURN`, `MAX_TOKENS`, `SAFETY`, etc.)
- **Display Integration**: Rich console display with syntax highlighting and formatting
- **Usage Tracking**: Tracks token usage and context percentage
- **Message Rendering**: Supports markdown rendering and custom message display
## Architecture

The `LlmAgent` is part of a three-layer architecture:

```text
LlmAgent                 (interaction & display)
    ↓ extends
LlmDecorator             (core LLM logic)
    ↓ uses
FastAgentLLMProtocol     (LLM provider interface)
```
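To make the layering concrete, here is a deliberately tiny, hypothetical mirror of the three layers. This is a sketch only: the real classes have much richer APIs, and names like `completion` here are stand-ins, not the actual Fast Agent methods.

```python
# Illustrative sketch: toy stand-ins for the three layers, NOT the real
# Fast Agent classes. Shows the delegation pattern only.
import asyncio
from typing import List, Optional, Protocol


class FastAgentLLMProtocol(Protocol):
    """Provider interface: anything that can complete a list of messages."""

    async def completion(self, messages: List[str]) -> str: ...


class LlmDecorator:
    """Core LLM logic layer: owns the attached provider and the history."""

    def __init__(self) -> None:
        self._llm: Optional[FastAgentLLMProtocol] = None
        self.message_history: List[str] = []

    def attach_llm(self, llm: FastAgentLLMProtocol) -> None:
        self._llm = llm

    async def generate(self, message: str) -> str:
        assert self._llm is not None, "no LLM attached"
        self.message_history.append(message)
        reply = await self._llm.completion(self.message_history)
        self.message_history.append(reply)
        return reply


class LlmAgent(LlmDecorator):
    """Interaction & display layer: wraps generate() with display concerns."""

    async def send(self, message: str) -> str:
        reply = await self.generate(message)
        print(f"assistant: {reply}")  # stand-in for rich console display
        return reply
```

The point of the split is that `LlmAgent` never talks to a provider directly: display and interaction concerns stay in the top layer, conversation state in the middle, and any object satisfying the protocol can be swapped in underneath.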
## Creating a Basic Agent

### Simple Configuration

```python
import asyncio

from fast_agent.agents.agent_types import AgentConfig
from fast_agent.agents.llm_agent import LlmAgent
from fast_agent.core import Core
from fast_agent.llm.model_factory import ModelFactory


async def main():
    core = Core()
    await core.initialize()

    # Create agent configuration
    config = AgentConfig(
        name="assistant",
        instruction="You are a helpful assistant.",
        model="gpt-4o-mini",
    )

    # Create the agent
    agent = LlmAgent(config, context=core.context)

    # Attach the LLM
    await agent.attach_llm(ModelFactory.create_factory("gpt-4o-mini"))

    # Send a message
    response = await agent.send("Hello, how are you?")
    print(response)

    await core.cleanup()


asyncio.run(main())
```
### With Custom Instructions

```python
config = AgentConfig(
    name="writer",
    instruction="""You are a professional technical writer.

Guidelines:
- Write clear, concise documentation
- Use examples to illustrate concepts
- Structure content with headings
- Include code snippets when helpful
""",
    model="claude-3-5-sonnet-20241022",
    use_history=True,
)

agent = LlmAgent(config, context=core.context)
await agent.attach_llm(ModelFactory.create_factory("claude-3-5-sonnet-20241022"))
```
## Configuration Options

### AgentConfig Parameters

- `instruction` (`str`, default: `DEFAULT_AGENT_INSTRUCTION`): System prompt/instruction for the agent
- `model`: Model identifier (e.g., `"gpt-4o-mini"`, `"claude-3-5-sonnet"`)
- `use_history`: Whether to maintain conversation history
- `description`: Human-readable description of the agent's purpose
- `default_request_params`: Default parameters for LLM requests
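As a rough mental model, the parameters above can be pictured as dataclass-style fields. The sketch below is a toy mirror only: the real `AgentConfig` lives in `fast_agent.agents.agent_types`, and the defaults shown here are illustrative guesses, not the actual ones.

```python
# Toy mirror of AgentConfig's shape, NOT the real class; defaults are
# illustrative placeholders.
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

DEFAULT_AGENT_INSTRUCTION = "You are a helpful agent."  # placeholder value


@dataclass
class AgentConfig:
    name: str
    instruction: str = DEFAULT_AGENT_INSTRUCTION
    model: Optional[str] = None
    use_history: bool = True
    description: str = ""
    default_request_params: Dict[str, Any] = field(default_factory=dict)
```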
## Working with Messages

### Sending Messages

```python
# Simple text message
response = await agent.send("What is Fast Agent?")

# Generate with full control
from fast_agent.core.prompt import Prompt

messages = [
    Prompt.user("Explain quantum computing"),
]
response = await agent.generate(messages, None)
print(response.first_text())
```
### Message History

```python
# Access conversation history
history = agent.message_history
for msg in history:
    print(f"{msg.role}: {msg.content}")

# Clear history
agent.clear()

# Load custom history
from fast_agent.types import PromptMessageExtended

custom_history = [
    PromptMessageExtended(role="user", content="Hello"),
    PromptMessageExtended(role="assistant", content="Hi there!"),
]
agent.load_message_history(custom_history)
```
## Display and Streaming

### Display Configuration

The agent uses `ConsoleDisplay` for rich terminal output:

```python
# Access display settings
display = agent.display

# Check streaming preferences
enabled, mode = display.resolve_streaming_preferences()
print(f"Streaming: {enabled}, Mode: {mode}")
```
### Controlling Streaming

```python
# Disable streaming for the next turn
agent.force_non_streaming_next_turn(reason="debugging")

# Close the active streaming display
agent.close_active_streaming_display(reason="parallel operations")
```
## Stop Reasons

The agent handles various completion reasons:

| Stop Reason | Description | Agent Behavior |
| --- | --- | --- |
| `END_TURN` | Normal completion | Display response |
| `MAX_TOKENS` | Token limit reached | Show warning |
| `TOOL_USE` | Tool call requested | Execute tools (if `ToolAgent`) |
| `SAFETY` | Safety filter triggered | Show error |
| `PAUSE` | LLM requested pause | Show notification |
| `ERROR` | Error occurred | Display error details |
| `CANCELLED` | User cancelled | Show cancellation |
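This handling amounts to a dispatch on the stop reason. The sketch below is illustrative only: the enum names mirror the table, but the real stop-reason type and its handling live inside `LlmAgent`, not in user code.

```python
# Illustrative dispatch on stop reasons; enum names mirror the table above,
# but this is a sketch, not the real fast_agent type.
from enum import Enum, auto


class StopReason(Enum):
    END_TURN = auto()
    MAX_TOKENS = auto()
    TOOL_USE = auto()
    SAFETY = auto()
    PAUSE = auto()
    ERROR = auto()
    CANCELLED = auto()


def agent_behavior(stop_reason: StopReason) -> str:
    """Map a stop reason to the behavior listed in the table."""
    behaviors = {
        StopReason.END_TURN: "display response",
        StopReason.MAX_TOKENS: "show warning",
        StopReason.TOOL_USE: "execute tools (if ToolAgent)",
        StopReason.SAFETY: "show error",
        StopReason.PAUSE: "show notification",
        StopReason.ERROR: "display error details",
        StopReason.CANCELLED: "show cancellation",
    }
    return behaviors[stop_reason]
```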
## Advanced Usage

### Structured Output

```python
from pydantic import BaseModel


class WeatherReport(BaseModel):
    city: str
    temperature: int
    conditions: str


messages = [Prompt.user("What's the weather in Paris?")]
result, message = await agent.structured(
    messages,
    WeatherReport,
    None,
)

if result:
    print(f"Temperature in {result.city}: {result.temperature}°C")
```
### Custom Message Display

```python
from fast_agent.types import PromptMessageExtended
from rich.text import Text

# Create a custom message
message = PromptMessageExtended(
    role="assistant",
    content="Custom response",
)

# Display with custom formatting
await agent.show_assistant_message(
    message,
    name="CustomAgent",
    model="gpt-4",
    additional_message=Text("Extra info", style="dim"),
    render_markdown=True,
)
```
### Usage Tracking

```python
# Get the usage accumulator
usage = agent.usage_accumulator
if usage:
    print(f"Input tokens: {usage.input_tokens}")
    print(f"Output tokens: {usage.output_tokens}")
    print(f"Total cost: ${usage.total_cost:.4f}")
    print(f"Context usage: {usage.context_usage_percentage:.1f}%")
```
## Best Practices

**Instructions**

- Keep instructions clear and specific
- Include examples for complex tasks
- Use structured formatting for guidelines
- Test with various inputs to validate behavior

**History Management**

- Clear history when starting new topics
- Monitor context window usage
- Use `use_history=False` for stateless interactions
- Load custom history for specific workflows

**Error Handling**

- Always check `stop_reason` for errors
- Handle safety filters gracefully
- Monitor token limits
- Implement retry logic for transient failures
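For the last point, retry logic can be sketched as a small generic helper. This is an illustrative pattern, not part of the Fast Agent API: in practice `send` would be something like `lambda: agent.send(prompt)`, and you would widen the caught exception types to whatever your provider raises for transient failures.

```python
# Generic retry helper with exponential backoff; illustrative pattern only,
# not a Fast Agent API.
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def send_with_retry(
    send: Callable[[], Awaitable[T]],
    retries: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Retry an awaitable factory, backing off after each transient failure."""
    for attempt in range(retries):
        try:
            return await send()
        except ConnectionError:  # treat only transient errors as retryable
            if attempt == retries - 1:
                raise
            await asyncio.sleep(base_delay * 2**attempt)
    raise RuntimeError("unreachable")
```

Catching only narrow, known-transient exception types matters here: retrying on a broad `Exception` would also retry permanent failures such as authentication errors.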
## Next Steps

- **Tool Agent**: Add function calling capabilities to your agent
- **MCP Agent**: Connect to MCP servers for extended functionality
- **LLM Agent**: Learn about the full `LlmAgent` API
- **Configuration**: Explore all configuration options