
Threads, Runs, and Messages

Foundry Agent Service uses persistent threads, runs, and messages to manage conversation state and agent execution. Understanding how these components fit together is essential for building effective agents.

Core Components

Agent

A configurable orchestration component that:
  • Uses AI models with instructions and tools
  • Processes messages in threads
  • Maintains conversation context
  • Enforces safety and governance controls

Thread

A conversation session between an agent and a user:
  • Stores messages (up to 100,000 per thread)
  • Automatically handles context truncation
  • Persists until explicitly deleted
  • Maintains conversation history

Message

An individual communication within a thread:
  • Created by agents or users
  • Can include text, images, and files
  • Stored as an ordered list within the thread
  • Supports attachments

Run

An invocation of an agent on a thread:
  • Processes all messages in the thread
  • May append new messages (agent responses)
  • Calls models and tools as needed
  • Tracks execution status

Agent Workflow

1. Create Agent: Define the agent with a model, instructions, and tools.
2. Create Thread: Create a conversation session (reuse it for ongoing conversations).
3. Send Messages: Add user messages to the thread.
4. Run Agent: Execute the agent to process the messages.
5. Monitor Status: Poll the run status until completion.
6. Get Response: Retrieve the agent's messages from the thread.

Run Status Values

Status            Description
queued            Run is waiting to be processed
in_progress       Agent is actively processing
requires_action   Agent needs function call results
completed         Run finished successfully
failed            Run encountered an error
cancelled         Run was cancelled by user
expired           Run exceeded time limits (10 min)
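
The statuses above fall into three groups: still running, waiting on the caller, and finished. The sketch below shows one way to turn a status into a next step. The grouping constants and the `next_step` helper are hypothetical names used for illustration; they are not part of the SDK.

```python
# Statuses come from the table above. The grouping and next_step()
# are illustrative assumptions, not SDK features.
ACTIVE = {"queued", "in_progress"}
NEEDS_INPUT = {"requires_action"}
TERMINAL = {"completed", "failed", "cancelled", "expired"}


def next_step(status: str) -> str:
    """Map a run status to what the calling code should do next."""
    if status in ACTIVE:
        return "keep_polling"
    if status in NEEDS_INPUT:
        return "submit_tool_outputs"
    if status == "completed":
        return "read_messages"
    if status in TERMINAL:
        return "handle_error"
    raise ValueError(f"Unknown run status: {status!r}")
```

Raising on an unknown status, rather than silently polling forever, makes it obvious when the service reports a value that this list does not cover.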

Code Examples

Basic Agent Execution

from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
import time

project = AIProjectClient(
    endpoint="https://<resource>.services.ai.azure.com/api/projects/<project>",
    credential=DefaultAzureCredential()
)

# Create agent
agent = project.agents.create_agent(
    model="gpt-4o",
    name="my-agent",
    instructions="You are a helpful assistant"
)

# Create thread
thread = project.agents.threads.create()

# Add message
message = project.agents.messages.create(
    thread_id=thread.id,
    role="user",
    content="Hello! Can you help me?"
)

# Create and monitor run
run = project.agents.runs.create(thread_id=thread.id, agent_id=agent.id)

while run.status in ["queued", "in_progress"]:
    time.sleep(1)
    run = project.agents.runs.get(thread_id=thread.id, run_id=run.id)

print(f"Run status: {run.status}")

# Get messages (each message's content is a list of parts; print the text parts)
if run.status == "completed":
    messages = project.agents.messages.list(thread_id=thread.id)
    for msg in messages:
        for text in msg.text_messages:
            print(f"{msg.role}: {text.text.value}")

# Cleanup
project.agents.delete_agent(agent.id)
project.agents.threads.delete(thread.id)

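Using create_and_process

# Simpler alternative: creates the run and polls until it stops
run = project.agents.runs.create_and_process(
    thread_id=thread.id,
    agent_id=agent.id
)

if run.status == "completed":
    # messages.list returns a pageable iterator, so iterate instead of printing it
    messages = project.agents.messages.list(thread_id=thread.id)
    for msg in messages:
        for text in msg.text_messages:
            print(f"{msg.role}: {text.text.value}")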

Thread Management

When to Create New Threads

Create a new thread when:
  • Starting a fresh topic or conversation
  • The user explicitly asks to “start over”
  • Serving a different user (each user should have their own thread)
  • The thread becomes too large (which impacts performance)

Reuse an existing thread when:
  • Continuing an ongoing conversation
  • Maintaining conversation context
  • Building on previous interactions
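
One way to apply these rules is to keep a small mapping from each user to their active thread. The sketch below is a minimal in-memory version. `ThreadRegistry` and its `create_thread` callback are hypothetical names, and a real application would likely persist the mapping instead of holding it in memory.

```python
from typing import Callable, Dict


class ThreadRegistry:
    """Tracks one active thread ID per user (hypothetical helper)."""

    def __init__(self, create_thread: Callable[[], str]) -> None:
        # create_thread is any callable that returns a new thread ID,
        # for example: lambda: project.agents.threads.create().id
        self._create_thread = create_thread
        self._active: Dict[str, str] = {}

    def get_thread(self, user_id: str, start_over: bool = False) -> str:
        """Reuse the user's thread, or create one if missing or reset."""
        if start_over or user_id not in self._active:
            self._active[user_id] = self._create_thread()
        return self._active[user_id]
```

Calling `get_thread("alice")` repeatedly returns the same ID, which preserves context, while `get_thread("alice", start_over=True)` switches that user to a fresh thread without affecting anyone else.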

Thread Lifecycle

Threads persist until explicitly deleted:

# Delete thread when no longer needed
project.agents.threads.delete(thread_id=thread.id)

Storage considerations:
  • Threads with many messages consume storage
  • Plan retention strategy based on:
    • Storage costs
    • Compliance requirements
    • Business needs
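
A retention strategy can be as simple as deleting threads that have been idle longer than a chosen window. The sketch below assumes the application records the last activity time of each thread itself. The `threads_to_delete` function and the `last_activity` mapping are hypothetical, and the retention window is a value to choose based on the factors above.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def threads_to_delete(
    last_activity: Dict[str, datetime],
    retention_days: int,
    now: datetime,
) -> List[str]:
    """Return IDs of threads idle for longer than the retention window."""
    cutoff = now - timedelta(days=retention_days)
    return sorted(tid for tid, seen in last_activity.items() if seen < cutoff)


# Example with fixed timestamps so the result is reproducible
now = datetime(2025, 6, 30, tzinfo=timezone.utc)
activity = {
    "thread_a": datetime(2025, 6, 29, tzinfo=timezone.utc),  # 1 day idle
    "thread_b": datetime(2025, 3, 1, tzinfo=timezone.utc),   # about 4 months idle
}
stale = threads_to_delete(activity, retention_days=30, now=now)
```

Each ID in `stale` could then be passed to `project.agents.threads.delete(...)` from a scheduled job.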

Thread Limits

  • Maximum 100,000 messages per thread
  • Automatic context truncation when needed
  • Performance may degrade with thousands of messages
  • Consider creating new threads for long conversations
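
These limits can be checked in code before deciding whether to continue in the same thread. The sketch below counts any iterable of messages and reports a coarse state. The 100,000 figure comes from the limit above; the soft limit of 1,000 and the `thread_size_state` helper are assumptions for illustration, so tune the threshold to your own workload.

```python
from typing import Iterable

MAX_MESSAGES_PER_THREAD = 100_000  # hard limit stated above
SOFT_LIMIT = 1_000  # hypothetical threshold, not a service value


def thread_size_state(messages: Iterable[object], soft_limit: int = SOFT_LIMIT) -> str:
    """Classify a thread as 'ok', 'rotate' (consider a new thread), or 'full'."""
    count = sum(1 for _ in messages)
    if count >= MAX_MESSAGES_PER_THREAD:
        return "full"
    if count >= soft_limit:
        return "rotate"
    return "ok"
```

The result of `project.agents.messages.list(thread_id=thread.id)` can be passed in directly, but listing every message only to count it is itself expensive on large threads, so keeping a running counter in the application is usually cheaper.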

Best Practices

Delete threads and agents when no longer needed:

# Delete agent
project.agents.delete_agent(agent.id)

# Delete thread
project.agents.threads.delete(thread.id)

Always check run status and implement retry logic:
import time

max_retries = 3
for attempt in range(max_retries):
    run = project.agents.runs.create(thread_id=thread.id, agent_id=agent.id)

    # Poll until the run leaves the active states
    while run.status in ["queued", "in_progress"]:
        time.sleep(1)
        run = project.agents.runs.get(thread_id=thread.id, run_id=run.id)

    if run.status == "completed":
        break
    if run.status != "failed":
        # requires_action, cancelled, or expired: not handled by this retry loop
        print(f"Run ended with status: {run.status}")
        break
    if attempt < max_retries - 1:
        print(f"Run failed ({run.last_error}), retrying... (attempt {attempt + 1})")
        time.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, ...
    else:
        print(f"Run failed after {max_retries} attempts")
        # Handle failure

Start with short polling intervals and increase them for longer operations:
import time

delay = 0.5  # Start with 500ms
max_delay = 5  # Cap at 5 seconds

while run.status in ["queued", "in_progress"]:
    time.sleep(delay)
    run = project.agents.runs.get(thread_id=thread.id, run_id=run.id)
    
    # Increase delay for next poll
    delay = min(delay * 1.5, max_delay)
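
To see what this schedule looks like in practice, the sketch below generates the same sequence of delays and also stops once a total time budget is used up, so a stuck run cannot keep the loop alive indefinitely. The `poll_delays` generator and its `timeout` parameter are hypothetical additions; the 600-second default mirrors the 10-minute run expiry noted earlier.

```python
from typing import Iterator


def poll_delays(
    start: float = 0.5,
    factor: float = 1.5,
    max_delay: float = 5.0,
    timeout: float = 600.0,
) -> Iterator[float]:
    """Yield sleep intervals until their sum would exceed `timeout` seconds."""
    delay, elapsed = start, 0.0
    while elapsed + delay <= timeout:
        yield delay
        elapsed += delay
        # Grow the delay geometrically, capped at max_delay
        delay = min(delay * factor, max_delay)


print(list(poll_delays(timeout=10)))
# → [0.5, 0.75, 1.125, 1.6875, 2.53125]
```

In a polling loop, each yielded value would be passed to `time.sleep` before fetching the run again; if the generator is exhausted while the run is still active, the caller can treat that as a timeout.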
Keep conversations concise for optimal performance:
  • Avoid extremely long messages
  • Summarize when threads get large
  • Create new threads for new topics
  • Monitor thread message count

Next Steps

  • Agent Overview: Learn about Foundry Agent Service
  • Environment Setup: Deploy agent infrastructure
  • Agent Tools: Extend agent capabilities
  • Quickstart: Create your first agent
