
Overview

The LlmAgent is the foundational agent class in Fast Agent that provides LLM interaction, conversation management, and rich console display capabilities. It serves as the base for all other agent types.
LlmAgent is typically used as a base class. For practical applications, consider ToolAgent or McpAgent, which extend LlmAgent with additional capabilities.

Key Capabilities

Message Management

Multi-turn conversations with history tracking

Streaming Display

Real-time response rendering with markdown support

Stop Handling

Graceful handling of completion reasons

Usage Tracking

Token counting and cost estimation

Class Hierarchy

LlmAgent
    ↓ extends
LlmDecorator
    ↓ uses  
FastAgentLLMProtocol
  • LlmAgent: UI and interaction layer
  • LlmDecorator: Core LLM logic and operations
  • FastAgentLLMProtocol: Provider interface (OpenAI, Anthropic, etc.)
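As a rough mental model, the layering can be sketched in plain Python. The classes below are illustrative stand-ins, not the library's actual interfaces or signatures:

```python
from typing import Protocol


class FastAgentLLMProtocol(Protocol):
    """Provider interface: each backend (OpenAI, Anthropic, ...) satisfies this.

    Illustrative stand-in; the real protocol has a much richer surface.
    """

    def complete(self, prompt: str) -> str: ...


class LlmDecorator:
    """Core LLM logic: owns a provider and delegates completions to it."""

    def __init__(self, llm: FastAgentLLMProtocol) -> None:
        self.llm = llm

    def generate(self, prompt: str) -> str:
        return self.llm.complete(prompt)


class LlmAgent(LlmDecorator):
    """UI and interaction layer: wraps the core call with display."""

    def generate(self, prompt: str) -> str:
        response = super().generate(prompt)
        print(f"assistant: {response}")  # the real class renders rich output
        return response


class EchoProvider:
    """Toy provider that satisfies the protocol."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


agent = LlmAgent(EchoProvider())
result = agent.generate("hello")
```

This mirrors the diagram: the agent subclasses the decorator, and the decorator talks to any object satisfying the provider protocol.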

Source Code Reference

The LlmAgent class is defined in src/fast_agent/agents/llm_agent.py. Key methods:
  • generate_impl(): src/fast_agent/agents/llm_agent.py:543
  • show_assistant_message(): src/fast_agent/agents/llm_agent.py:109
  • structured_impl(): src/fast_agent/agents/llm_agent.py:699

Basic Usage

Creating an Agent

import asyncio
from fast_agent.agents.agent_types import AgentConfig
from fast_agent.agents.llm_agent import LlmAgent
from fast_agent.core import Core
from fast_agent.llm.model_factory import ModelFactory

async def main():
    # Initialize Fast Agent core
    core = Core()
    await core.initialize()
    
    # Configure the agent
    config = AgentConfig(
        name="assistant",
        instruction="You are a helpful AI assistant.",
        model="gpt-4o-mini"
    )
    
    # Create and initialize agent
    agent = LlmAgent(config, context=core.context)
    await agent.attach_llm(ModelFactory.create_factory("gpt-4o-mini"))
    
    # Send a message
    response = await agent.send("What is Fast Agent?")
    print(response)
    
    await core.cleanup()

asyncio.run(main())

Generating Responses

from fast_agent.core.prompt import Prompt
from fast_agent.types import PromptMessageExtended

# Method 1: Simple send
response = await agent.send("Hello!")

# Method 2: Generate with message list
messages = [
    Prompt.user("Explain Python decorators"),
]
response = await agent.generate(messages, None)
text = response.first_text()

# Method 3: Full message control
message = PromptMessageExtended(
    role="user",
    content="What's 2+2?"
)
response = await agent.generate([message], None)

Message Display

Display Properties

The agent uses ConsoleDisplay for rich terminal output:

# Access display component
display = agent.display

# Check if streaming is enabled
enabled, mode = display.resolve_streaming_preferences()
print(f"Streaming: {enabled}, Mode: {mode}")
# Output: Streaming: True, Mode: markdown

Custom Message Display

from rich.text import Text

# Display assistant message with custom formatting
response_message = PromptMessageExtended(
    role="assistant",
    content="Here's the answer..."
)

await agent.show_assistant_message(
    response_message,
    name="MyAgent",
    model="gpt-4o",
    additional_message=Text("\nProcessed in 1.2s", style="dim"),
    render_markdown=True
)

User Message Display

user_message = PromptMessageExtended(
    role="user",
    content="Analyze this code"
)

agent.show_user_message(user_message)

Conversation History

Accessing History

# Get full conversation history
history = agent.message_history

for msg in history:
    print(f"{msg.role}: {msg.content[:50]}...")
    if msg.tool_calls:
        print(f"  Tool calls: {len(msg.tool_calls)}")

Managing History

# Clear conversation history
agent.clear()

# Clear history and prompts
agent.clear(clear_prompts=True)

# Load custom history
from fast_agent.types import PromptMessageExtended

conversation = [
    PromptMessageExtended(role="user", content="Hello"),
    PromptMessageExtended(role="assistant", content="Hi! How can I help?"),
    PromptMessageExtended(role="user", content="Tell me about Fast Agent"),
]

agent.load_message_history(conversation)

History Configuration

config = AgentConfig(
    name="stateless_agent",
    instruction="Answer questions without context.",
    use_history=False  # Disable history tracking
)

Streaming Responses

Automatic Streaming

Streaming is enabled by default and handled automatically:

# This will stream the response token-by-token
response = await agent.send(
    "Write a long story about a space explorer"
)

Controlling Streaming

# Disable streaming for next turn only
agent.force_non_streaming_next_turn(reason="debugging output")

response = await agent.send("Generate code")
# This response won't stream

response = await agent.send("Another request")
# Streaming resumes

Streaming Internals

# Check if streaming is active
if agent._active_stream_handle:
    print("Currently streaming")

# Close active streaming display
agent.close_active_streaming_display(reason="starting parallel operation")

Stop Reason Handling

Understanding Stop Reasons

from fast_agent.types import LlmStopReason

response = await agent.generate([Prompt.user("Hi")], None)

match response.stop_reason:
    case LlmStopReason.END_TURN:
        print("Completed normally")
    case LlmStopReason.MAX_TOKENS:
        print("Hit token limit - consider increasing max_tokens")
    case LlmStopReason.TOOL_USE:
        print("Requested tool execution")
    case LlmStopReason.SAFETY:
        print("Safety filter triggered")
    case LlmStopReason.ERROR:
        print("Error occurred during generation")
    case LlmStopReason.CANCELLED:
        print("User cancelled generation")

Error Channel Details

from fast_agent.constants import FAST_AGENT_ERROR_CHANNEL
from fast_agent.mcp.helpers.content_helpers import get_text

response = await agent.generate([Prompt.user("Test")], None)

if response.stop_reason == LlmStopReason.ERROR:
    if response.channels and FAST_AGENT_ERROR_CHANNEL in response.channels:
        error_blocks = response.channels[FAST_AGENT_ERROR_CHANNEL]
        error_text = get_text(error_blocks[0])
        print(f"Error details: {error_text}")

Structured Output

Basic Structured Generation

from pydantic import BaseModel

class Recipe(BaseModel):
    name: str
    ingredients: list[str]
    steps: list[str]
    cook_time_minutes: int

messages = [Prompt.user("Give me a recipe for chocolate chip cookies")]

recipe, message = await agent.structured(
    messages,
    Recipe,
    None
)

if recipe:
    print(f"Recipe: {recipe.name}")
    print(f"Ingredients: {len(recipe.ingredients)}")
    print(f"Cook time: {recipe.cook_time_minutes} minutes")

Complex Models

from typing import Literal
from pydantic import BaseModel, Field

class Sentiment(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float = Field(ge=0, le=1)
    key_phrases: list[str]
    explanation: str

text = """I love using Fast Agent! The documentation is clear 
          and the API is intuitive."""

messages = [Prompt.user(f"Analyze sentiment: {text}")]
result, _ = await agent.structured(messages, Sentiment, None)

if result:
    print(f"Sentiment: {result.sentiment} ({result.confidence:.0%})")
    print(f"Key phrases: {', '.join(result.key_phrases)}")

Usage Tracking

Accessing Usage Data

# Get usage accumulator
usage = agent.usage_accumulator

if usage:
    print(f"Input tokens: {usage.input_tokens}")
    print(f"Output tokens: {usage.output_tokens}")
    print(f"Total tokens: {usage.total_tokens}")
    print(f"Total cost: ${usage.total_cost:.4f}")
    print(f"Context window: {usage.context_usage_percentage:.1f}%")

Model Information

# Get current model name
model_name = agent.llm.model_name if agent.llm else None
print(f"Using model: {model_name}")

# Check context percentage during tool calls
if agent.usage_accumulator:
    ctx_pct = agent.usage_accumulator.context_usage_percentage
    if ctx_pct and ctx_pct > 80:
        print("Warning: Approaching context window limit")
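The percentage itself is plain arithmetic over the accumulated token counts. A self-contained sketch (the 128k window below is an assumed example value, not something read from the library):

```python
def context_usage_percentage(input_tokens: int, output_tokens: int,
                             context_window: int) -> float:
    """Percentage of the model's context window consumed so far."""
    return (input_tokens + output_tokens) / context_window * 100.0


# Example: 100k tokens used out of an assumed 128k-token window
pct = context_usage_percentage(80_000, 20_000, 128_000)
```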

Advanced Features

Workflow Telemetry

from fast_agent.workflow_telemetry import WorkflowTelemetryProvider

# Create custom telemetry provider
class MyTelemetry(WorkflowTelemetryProvider):
    async def emit_delegation_step(self, step_data):
        print(f"Delegation: {step_data}")

# Attach to agent
agent.workflow_telemetry = MyTelemetry()

Message Hooks

Extend LlmAgent to add custom hooks:

class CustomAgent(LlmAgent):
    async def generate_impl(
        self,
        messages,
        request_params,
        tools
    ):
        # Pre-processing hook
        print(f"Generating response for {len(messages)} messages")
        
        # Call parent implementation
        response = await super().generate_impl(
            messages,
            request_params,
            tools
        )
        
        # Post-processing hook
        print(f"Response generated: {response.stop_reason}")
        
        return response

Custom Display

from fast_agent.ui.console_display import ConsoleDisplay

# Create custom display
class CustomDisplay(ConsoleDisplay):
    async def show_assistant_message(self, message, **kwargs):
        # Custom rendering logic
        print(f"[CUSTOM] {message.content}")
        await super().show_assistant_message(message, **kwargs)

# Attach to agent
agent.display = CustomDisplay(config=core.context.config)

Configuration Reference

AgentConfig Parameters

# Imports for the full example (AgentType is assumed to live alongside AgentConfig)
from fast_agent.agents.agent_types import AgentConfig, AgentType
from fast_agent.types import RequestParams

config = AgentConfig(
    # Required
    name="my_agent",
    
    # System prompt
    instruction="You are a helpful assistant.",
    
    # Model selection
    model="gpt-4o-mini",
    
    # History management
    use_history=True,
    
    # Description
    description="A general-purpose AI assistant",
    
    # Request parameters
    default_request_params=RequestParams(
        max_tokens=4096,
        temperature=0.7,
        use_history=True
    ),
    
    # Agent type
    agent_type=AgentType.BASIC,
    
    # API configuration
    api_key="sk-...",  # Optional override
)

RequestParams

from fast_agent.types import RequestParams

params = RequestParams(
    max_tokens=2048,
    temperature=0.5,
    top_p=0.9,
    use_history=True,
    systemPrompt="Custom system message"
)

response = await agent.generate(
    [Prompt.user("Hello")],
    params
)

Best Practices

  • Clear history between unrelated conversations
  • Monitor context window usage with usage_accumulator
  • Use use_history=False for stateless queries
  • Consider history size impact on latency and cost
  • Leave streaming enabled for better UX
  • Disable streaming for debugging or testing
  • Close streams before starting parallel operations
  • Handle streaming errors gracefully
  • Always check stop_reason after generation
  • Handle MAX_TOKENS by increasing limits or summarizing
  • Implement retry logic for ERROR stop reasons
  • Log errors with context for debugging
  • Use structured output for reliable parsing
  • Batch related requests when possible
  • Monitor token usage to optimize costs
  • Cache responses for identical queries
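
The retry bullet above can be sketched as follows. FlakyAgent and send_with_retry are stand-ins for illustration, not Fast Agent APIs; in real code you would check fast_agent.types.LlmStopReason on the actual response:

```python
import asyncio
from dataclasses import dataclass
from enum import Enum, auto


class StopReason(Enum):
    """Stand-in for fast_agent.types.LlmStopReason."""
    END_TURN = auto()
    ERROR = auto()


@dataclass
class Response:
    text: str
    stop_reason: StopReason


class FlakyAgent:
    """Stand-in agent whose first two calls fail, then succeeds."""

    def __init__(self) -> None:
        self.calls = 0

    async def send(self, message: str) -> Response:
        self.calls += 1
        if self.calls < 3:
            return Response("", StopReason.ERROR)
        return Response(f"answer to: {message}", StopReason.END_TURN)


async def send_with_retry(agent, message, max_attempts=3, base_delay=0.01):
    """Retry ERROR stop reasons with exponential backoff; raise on exhaustion."""
    for attempt in range(max_attempts):
        response = await agent.send(message)
        if response.stop_reason is not StopReason.ERROR:
            return response
        await asyncio.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, ...
    raise RuntimeError(f"LLM call failed after {max_attempts} attempts")


agent = FlakyAgent()
response = asyncio.run(send_with_retry(agent, "Hello"))
```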

Next Steps

Tool Agent

Add function calling capabilities

MCP Agent

Connect to MCP servers for tools and resources

LLM Providers

Configure different LLM providers

Message Types

Learn about message structures
