Agent State Machine

The Loom agent uses an explicit, event-driven state machine to manage conversation flow and tool execution. This design provides predictable behavior, clear ownership of context, graceful error recovery, and clean separation between state logic and I/O operations.

Design Principles

Predictable Behavior

All state transitions are explicit and testable with exhaustive pattern matching.

Clear Ownership

Each state carries its required context (conversation, retries, etc.).

Graceful Recovery

Built-in retry mechanisms with bounded attempts and backoff.

Clean Separation

State machine logic is synchronous and pure; caller manages async I/O.

State Machine Overview

The state machine receives AgentEvents and returns AgentActions that the caller must execute. This inversion of control allows the caller to manage async operations (LLM calls, tool execution) while the state machine remains synchronous and pure.
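The events-in, actions-out shape described above can be sketched with toy types. This is a minimal illustration, not the Loom implementation: State, Event, Action, and step are hypothetical stand-ins for AgentState, AgentEvent, AgentAction, and handle_event.

```rust
// Toy stand-ins for the real types; only the shape of the API is illustrated.
#[derive(Debug, Clone, PartialEq)]
enum State {
    Waiting,
    Calling,
}

#[derive(Debug, Clone, PartialEq)]
enum Event {
    UserInput(String),
    Completed,
}

#[derive(Debug, Clone, PartialEq)]
enum Action {
    SendLlmRequest(String),
    WaitForInput,
}

/// Pure transition function: no I/O, no async, no clock.
/// The caller performs whatever the returned action asks for.
fn step(state: State, event: Event) -> (State, Action) {
    match (state, event) {
        (State::Waiting, Event::UserInput(text)) => {
            (State::Calling, Action::SendLlmRequest(text))
        }
        (State::Calling, Event::Completed) => (State::Waiting, Action::WaitForInput),
        // Anything else leaves the state unchanged and asks the caller to wait.
        (state, _) => (state, Action::WaitForInput),
    }
}
```

Because step only maps values to values, it can be called from any runtime, or from a plain unit test, without setup.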

States

AgentState Enum

pub enum AgentState {
    WaitingForUserInput {
        conversation: ConversationContext,
    },
    CallingLlm {
        conversation: ConversationContext,
        retries: u32,
    },
    ProcessingLlmResponse {
        conversation: ConversationContext,
        response: LlmResponse,
    },
    ExecutingTools {
        conversation: ConversationContext,
        executions: Vec<ToolExecutionStatus>,
    },
    PostToolsHook {
        conversation: ConversationContext,
        pending_llm_request: LlmRequest,
        completed_tools: Vec<CompletedToolInfo>,
    },
    Error {
        conversation: ConversationContext,
        error: AgentError,
        retries: u32,
        origin: ErrorOrigin,
    },
    ShuttingDown,
}

State Details

WaitingForUserInput

Initial and terminal state for user turns

The agent is idle and awaits user input. The conversation context preserves all prior messages.

Transitions:

UserInput → CallingLlm
ShutdownRequested → ShuttingDown

CallingLlm

Active LLM request in flight

The retries counter tracks how many retry attempts have been made for the current request.

Transitions:

TextDelta → CallingLlm (streaming text)
ToolCallDelta → CallingLlm (streaming tool call)
Completed → ProcessingLlmResponse
Error (retries < max) → Error
Error (retries >= max) → WaitingForUserInput
ShutdownRequested → ShuttingDown
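The two Error rows above amount to a bounded-retry check. A minimal sketch follows, assuming a hypothetical max_retries setting and assuming the counter is incremented when the agent enters the Error state; the source does not say where the increment happens.

```rust
#[derive(Debug, PartialEq)]
enum Next {
    /// Park in the Error state and wait for the retry timer.
    Error { retries: u32 },
    /// Give up and hand control back to the user.
    WaitingForUserInput,
}

/// Decide where an LLM error leads.
/// `max_retries` is a hypothetical configuration value.
fn on_llm_error(retries: u32, max_retries: u32) -> Next {
    if retries < max_retries {
        // Assumption: the attempt is counted on the way into Error.
        Next::Error { retries: retries + 1 }
    } else {
        Next::WaitingForUserInput
    }
}
```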

ProcessingLlmResponse

Transient state for examining an LLM response

Immediately transitions to either ExecutingTools (if tool calls are present) or WaitingForUserInput (if the response is text-only).

Transitions:

Has tool calls → ExecutingTools
No tool calls → WaitingForUserInput
ShutdownRequested → ShuttingDown

ExecutingTools

Tracks multiple concurrent tool executions

Each execution progresses through Pending → Running → Completed. The state tracks all executions via Vec<ToolExecutionStatus>.

Transitions:

ToolCompleted (some pending) → ExecutingTools
ToolCompleted (all done, mutating tools) → PostToolsHook
ToolCompleted (all done, no mutation) → CallingLlm
ShutdownRequested → ShuttingDown
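The three ToolCompleted rows can be read as one decision over the execution list. The sketch below assumes a simplified ToolStatus type with a mutating flag on completed entries; the real ToolExecutionStatus is not shown in this document and may track this differently.

```rust
/// Simplified stand-in for ToolExecutionStatus (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
enum ToolStatus {
    Pending,
    Running,
    Completed { mutating: bool },
}

#[derive(Debug, PartialEq)]
enum Next {
    ExecutingTools,
    PostToolsHook,
    CallingLlm,
}

/// Pick the next state after a ToolCompleted event has been recorded.
fn after_tool_completed(executions: &[ToolStatus]) -> Next {
    let all_done = executions
        .iter()
        .all(|s| matches!(s, ToolStatus::Completed { .. }));
    if !all_done {
        // Some tools are still pending or running: stay put.
        return Next::ExecutingTools;
    }
    let any_mutating = executions
        .iter()
        .any(|s| matches!(s, ToolStatus::Completed { mutating: true }));
    if any_mutating {
        Next::PostToolsHook
    } else {
        Next::CallingLlm
    }
}
```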

PostToolsHook

Runs post-tool hooks after tool execution

This state enables features like auto-commit that need to run after file-modifying tools (e.g., edit_file, bash).

Fields:

pending_llm_request - The next LLM request to send after hooks complete
completed_tools - Information about which tools completed (for hook decision-making)

Transitions:

PostToolsHookCompleted → CallingLlm
ShutdownRequested → ShuttingDown

Error

Holds failed state with retry information

The origin field (Llm, Tool, or Io) determines the retry strategy.

Transitions:

RetryTimeoutFired → CallingLlm
ShutdownRequested → ShuttingDown
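The caller decides when to send RetryTimeoutFired. One way to compute that delay is capped exponential backoff, sketched below. The doubling schedule, the retry_delay name, and the base and cap parameters are assumptions for illustration; the document only states that retries are bounded and use backoff.

```rust
use std::time::Duration;

/// Delay before the caller feeds RetryTimeoutFired back into the agent.
/// Doubles with each retry and never exceeds `cap` (assumed policy).
fn retry_delay(retries: u32, base: Duration, cap: Duration) -> Duration {
    // 2^retries, saturating instead of overflowing for large counts.
    let factor = 2u32.checked_pow(retries).unwrap_or(u32::MAX);
    // If the multiplication itself overflows, fall back to the cap.
    base.checked_mul(factor).unwrap_or(cap).min(cap)
}
```

Keeping this calculation on the caller side fits the design above: the state machine stays free of clocks and timers.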

ShuttingDown

Terminal state

No transitions out. The agent should be dropped after reaching this state.

Events

AgentEvent Enum

pub enum AgentEvent {
    UserInput(Message),
    LlmEvent(LlmEvent),
    ToolProgress(ToolProgressEvent),
    ToolCompleted {
        call_id: String,
        outcome: ToolExecutionOutcome,
    },
    PostToolsHookCompleted {
        action_taken: bool,
    },
    RetryTimeoutFired,
    ShutdownRequested,
}

LlmEvent Sub-variants

pub enum LlmEvent {
    TextDelta {
        content: String,
    },
    ToolCallDelta {
        call_id: String,
        tool_name: String,
        arguments_fragment: String,
    },
    Completed(LlmResponse),
    Error(LlmError),
}

ToolExecutionOutcome

pub enum ToolExecutionOutcome {
    Success {
        call_id: String,
        output: serde_json::Value,
    },
    Error {
        call_id: String,
        error: ToolError,
    },
}

Actions

AgentAction Enum

Actions are returned to the caller indicating what I/O operation to perform:

pub enum AgentAction {
    SendLlmRequest(LlmRequest),
    ExecuteTools(Vec<ToolCall>),
    RunPostToolsHook {
        completed_tools: Vec<CompletedToolInfo>,
    },
    WaitForInput,
    DisplayMessage(String),
    DisplayError(String),
    Shutdown,
}

The caller is responsible for executing actions and feeding events back into the state machine via agent.handle_event(event).
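The execute-then-feed-back loop can be sketched without any async runtime. Everything here is a toy under stated assumptions: ToyAgent, its two-variant behavior, and the fake "echo" LLM reply are hypothetical, and a real caller would await an LLM client where the comment indicates.

```rust
use std::collections::VecDeque;

enum Event {
    UserInput(String),
    LlmCompleted(String),
    ShutdownRequested,
}

enum Action {
    SendLlmRequest(String),
    DisplayMessage(String),
    WaitForInput,
    Shutdown,
}

/// Toy agent: remembers only whether an LLM request is outstanding.
struct ToyAgent {
    awaiting_llm: bool,
}

impl ToyAgent {
    fn handle_event(&mut self, event: Event) -> Action {
        match event {
            Event::UserInput(text) => {
                self.awaiting_llm = true;
                Action::SendLlmRequest(text)
            }
            Event::LlmCompleted(text) if self.awaiting_llm => {
                self.awaiting_llm = false;
                Action::DisplayMessage(text)
            }
            Event::LlmCompleted(_) => Action::WaitForInput,
            Event::ShutdownRequested => Action::Shutdown,
        }
    }
}

/// Driver loop: performs each action and feeds resulting events back in.
fn run(agent: &mut ToyAgent, inputs: Vec<Event>) -> Vec<String> {
    let mut queue: VecDeque<Event> = inputs.into();
    let mut transcript = Vec::new();
    while let Some(event) = queue.pop_front() {
        match agent.handle_event(event) {
            Action::SendLlmRequest(prompt) => {
                // A real caller would await an LLM client here.
                queue.push_front(Event::LlmCompleted(format!("echo: {}", prompt)));
            }
            Action::DisplayMessage(text) => transcript.push(text),
            Action::WaitForInput => {}
            Action::Shutdown => break,
        }
    }
    transcript
}
```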

State Transitions

Transition Table

Current State	Event	New State	Action
`WaitingForUserInput`	`UserInput(msg)`	`CallingLlm`	`SendLlmRequest`
`CallingLlm`	`LlmEvent::TextDelta`	`CallingLlm`	`DisplayMessage`
`CallingLlm`	`LlmEvent::ToolCallDelta`	`CallingLlm`	`WaitForInput`
`CallingLlm`	`LlmEvent::Completed`	`ProcessingLlmResponse`	(internal)
`CallingLlm`	`LlmEvent::Error` (retries < max)	`Error`	`WaitForInput`
`CallingLlm`	`LlmEvent::Error` (retries >= max)	`WaitingForUserInput`	`DisplayError`
`ProcessingLlmResponse`	(has tool calls)	`ExecutingTools`	`ExecuteTools`
`ProcessingLlmResponse`	(no tool calls)	`WaitingForUserInput`	`WaitForInput`
`ExecutingTools`	`ToolCompleted` (some pending)	`ExecutingTools`	`WaitForInput`
`ExecutingTools`	`ToolCompleted` (all done, mutating)	`PostToolsHook`	`RunPostToolsHook`
`ExecutingTools`	`ToolCompleted` (all done, no mutation)	`CallingLlm`	`SendLlmRequest`
`PostToolsHook`	`PostToolsHookCompleted`	`CallingLlm`	`SendLlmRequest`
`Error` (origin=Llm)	`RetryTimeoutFired`	`CallingLlm`	`SendLlmRequest`
any state	`ShutdownRequested`	`ShuttingDown`	`Shutdown`

Implementation

Core Method

impl Agent {
    pub fn handle_event(
        &mut self,
        event: AgentEvent
    ) -> AgentResult<AgentAction> {
        // Pattern match on (current_state, event)
        // Update self.state
        // Return action for caller to execute
    }
}

The handle_event method is synchronous and returns immediately. No async operations are performed inside the state machine.

Example Usage

let mut agent = Agent::new(config);

// User sends a message
let action = agent.handle_event(AgentEvent::UserInput(message))?;
match action {
    AgentAction::SendLlmRequest(request) => {
        // Execute async LLM call
        let mut stream = llm_client.complete_streaming(request).await?;
        
        // Feed stream events back to state machine
        while let Some(event) = stream.next().await {
            let action = agent.handle_event(AgentEvent::LlmEvent(event))?;
            // Handle action...
        }
    }
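    AgentAction::ExecuteTools(tool_calls) => {
        // Tools run one at a time here for brevity; a real caller may run them concurrently
        for tool_call in tool_calls {
            // Copy the id first, because tool_call is moved into execute_tool
            let call_id = tool_call.id.clone();
            let outcome = execute_tool(tool_call).await?;
            let action = agent.handle_event(AgentEvent::ToolCompleted {
                call_id,
                outcome,
            })?;
            // Handle action...
        }
    }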
    _ => {}
}

Design Decisions

Why Explicit State Machine vs Implicit

Testability

Every state and transition can be unit tested in isolation. Property-based tests verify invariants like “shutdown always succeeds from any state”.

Debuggability

State transitions are logged with tracing::info!, making it easy to trace agent behavior in production.

No Hidden State

All context is carried explicitly in state variants. There are no ambient flags or mutable fields that could get out of sync.

Exhaustive Matching

Rust’s match ensures all state/event combinations are handled. New events or states trigger compiler errors until addressed.

Why Events Are Processed Synchronously

The handle_event method is synchronous and returns immediately:

pub fn handle_event(&mut self, event: AgentEvent) -> AgentResult<AgentAction>

Rationale:

Separation of Concerns - The state machine decides what to do; the caller decides how to do it (async, parallel, etc.)
Backpressure - The caller controls the pace of event delivery. No internal queues or background tasks
Determinism - Given the same sequence of events, the state machine produces the same sequence of actions (essential for testing and replay)
Flexibility - The caller can implement different execution strategies (single-threaded, tokio, async-std) without changing the state machine
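The determinism point can be checked by replaying a recorded event log through a fresh agent and comparing the actions. The sketch below uses a hypothetical ToyAgent with three states; it illustrates the replay idea only and is not the Loom agent.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Event {
    UserInput(String),
    Completed,
    ShutdownRequested,
}

#[derive(Debug, Clone, PartialEq)]
enum Action {
    SendLlmRequest(String),
    WaitForInput,
    Shutdown,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Waiting,
    Calling,
    ShuttingDown,
}

struct ToyAgent {
    state: State,
}

impl ToyAgent {
    fn new() -> Self {
        Self { state: State::Waiting }
    }

    fn handle_event(&mut self, event: Event) -> Action {
        let (next, action) = match (self.state, event) {
            (_, Event::ShutdownRequested) => (State::ShuttingDown, Action::Shutdown),
            (State::Waiting, Event::UserInput(t)) => (State::Calling, Action::SendLlmRequest(t)),
            (State::Calling, Event::Completed) => (State::Waiting, Action::WaitForInput),
            // Unexpected combinations keep the current state.
            (s, _) => (s, Action::WaitForInput),
        };
        self.state = next;
        action
    }
}

/// Replay a recorded event log through a fresh agent.
fn replay(events: &[Event]) -> Vec<Action> {
    let mut agent = ToyAgent::new();
    events.iter().cloned().map(|e| agent.handle_event(e)).collect()
}
```

Two replays of the same log yielding equal action lists is the property a replay-based test would assert.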

How Conversation Context Is Threaded

Each state variant carries its own ConversationContext:

pub enum AgentState {
    WaitingForUserInput {
        conversation: ConversationContext,
    },
    CallingLlm {
        conversation: ConversationContext,
        retries: u32,
    },
    // ...
}

During transitions, the context is cloned and updated:

UserInput → message appended to conversation
LlmEvent::Completed → assistant message appended
ToolCompleted (all done) → tool result messages appended

This ensures the conversation history is always consistent with the current state.
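A clone-and-update transition might look like the sketch below. ConversationContext is reduced to a hypothetical list of strings and AgentState to two variants; on_user_input is an illustrative helper, not a function from the source.

```rust
/// Simplified stand-in: the real ConversationContext is not shown here.
#[derive(Debug, Clone, Default, PartialEq)]
struct ConversationContext {
    messages: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
enum AgentState {
    WaitingForUserInput { conversation: ConversationContext },
    CallingLlm { conversation: ConversationContext, retries: u32 },
}

/// UserInput transition: clone the context, append the message,
/// and carry the updated copy into the next state.
fn on_user_input(state: &AgentState, message: &str) -> Option<AgentState> {
    match state {
        AgentState::WaitingForUserInput { conversation } => {
            let mut conversation = conversation.clone();
            conversation.messages.push(message.to_string());
            Some(AgentState::CallingLlm { conversation, retries: 0 })
        }
        // Not a valid source state for this transition.
        _ => None,
    }
}
```

The previous state is left untouched, so the history held by each state value always matches the point in the flow it represents.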

Testing Guidelines

Unit Tests

Verify specific state transitions in isolation:

#[test]
fn transitions_to_calling_llm_on_user_input() {
    let mut agent = Agent::new(config);
    let action = agent.handle_event(
        AgentEvent::UserInput(Message::user("hello"))
    ).unwrap();
    
    assert!(matches!(action, AgentAction::SendLlmRequest(_)));
    assert!(matches!(agent.state(), AgentState::CallingLlm { .. }));
}

Property Tests

Verify invariants hold across all configurations:

proptest! {
    #[test]
    fn agent_always_starts_in_waiting(config in any::<AgentConfig>()) {
        let agent = Agent::new(config);
        assert!(matches!(agent.state(), AgentState::WaitingForUserInput { .. }));
    }
    
    #[test]
    fn shutdown_succeeds_from_any_state(state in any::<AgentState>()) {
        let mut agent = Agent::with_state(state);
        let action = agent.handle_event(AgentEvent::ShutdownRequested).unwrap();
        assert!(matches!(action, AgentAction::Shutdown));
        assert!(matches!(agent.state(), AgentState::ShuttingDown));
    }
}

Integration Tests

Verify end-to-end flows through multiple transitions:

#[tokio::test]
async fn completes_full_conversation_cycle() {
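    // `config` and `response` are assumed to be test fixtures already in scope
    let mut agent = Agent::new(config);
    let llm_client = MockLlmClient::new();
    
    // User input: the agent should ask the caller to send an LLM request
    let action = agent.handle_event(
        AgentEvent::UserInput(Message::user("hello"))
    ).unwrap();
    
    // LLM completion with a text-only response (no tool calls)
    let action = agent.handle_event(
        AgentEvent::LlmEvent(LlmEvent::Completed(response))
    ).unwrap();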
    
    // Back to waiting
    assert!(matches!(agent.state(), AgentState::WaitingForUserInput { .. }));
}

Extension Guide

Adding a New State

Add variant to AgentState

pub enum AgentState {
    // existing variants...
    NewState {
        conversation: ConversationContext,
        custom_field: CustomType,
    },
}

Update name() method

Self::NewState { .. } => "NewState",

Update conversation() accessors

Handle the new variant in agent.rs.

Add transition handlers

(AgentState::NewState { conversation, .. }, AgentEvent::SomeEvent) => {
    // transition logic
}

Add tests

Verify transitions to/from the new state.
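The name() and conversation() steps above could take the following shape. This is a sketch under assumptions: the enum is cut down to three variants, custom_field is given a placeholder type, and returning Option from conversation() is a guess at how a variant without context, such as ShuttingDown, is handled.

```rust
/// Simplified stand-in: the real ConversationContext is not shown here.
#[derive(Debug, Default, PartialEq)]
struct ConversationContext {
    messages: Vec<String>,
}

enum AgentState {
    WaitingForUserInput { conversation: ConversationContext },
    NewState { conversation: ConversationContext, custom_field: u32 },
    ShuttingDown,
}

impl AgentState {
    /// Human-readable state name, e.g. for logging.
    fn name(&self) -> &'static str {
        match self {
            Self::WaitingForUserInput { .. } => "WaitingForUserInput",
            Self::NewState { .. } => "NewState",
            Self::ShuttingDown => "ShuttingDown",
        }
    }

    /// Borrow the conversation if this state carries one.
    fn conversation(&self) -> Option<&ConversationContext> {
        match self {
            Self::WaitingForUserInput { conversation }
            | Self::NewState { conversation, .. } => Some(conversation),
            Self::ShuttingDown => None,
        }
    }
}
```

Because neither match has a wildcard arm, forgetting the new variant in either accessor is a compile error.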

Adding a New Event

Add variant to AgentEvent

pub enum AgentEvent {
    // existing variants...
    NewEvent { payload: PayloadType },
}

Handle the event

In each relevant state within handle_event().

Update catch-all pattern

Invalid transitions log a warning and return WaitForInput.

Add property tests

Verify the event is handled correctly from all reachable states.
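The interaction between an explicit arm for a new event and the catch-all can be sketched as follows. The types are toy stand-ins, and eprintln! is used in place of whatever logging call the real code uses.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Waiting,
    Calling,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    UserInput,
    Completed,
    NewEvent,
}

#[derive(Debug, PartialEq)]
enum Action {
    SendLlmRequest,
    WaitForInput,
}

fn handle(state: &mut State, event: Event) -> Action {
    match (*state, event) {
        (State::Waiting, Event::UserInput) => {
            *state = State::Calling;
            Action::SendLlmRequest
        }
        (State::Calling, Event::Completed) => {
            *state = State::Waiting;
            Action::WaitForInput
        }
        // Explicit arm added for the new event in the state where it is valid.
        (State::Calling, Event::NewEvent) => Action::WaitForInput,
        // Catch-all: invalid combinations are logged and otherwise ignored.
        (current, unexpected) => {
            eprintln!("warning: ignoring {:?} in state {:?}", unexpected, current);
            Action::WaitForInput
        }
    }
}
```

A wildcard arm like this one also absorbs any new variant silently, so the compiler will not point out states where the new event still needs an explicit arm; the tests from the last step are what catch those gaps.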

Source Files

state.rs

State and event type definitions

agent.rs

State machine implementation

State Machine Spec

Detailed state machine specification

Architecture Overview

High-level system architecture
