Core Concepts
This guide explains the fundamental concepts behind AutoGen: what agents are, how they work together in teams, how they use tools and models, and how the layered architecture provides flexibility for different use cases.

What is an Agent?
An agent is a software entity that:
- Communicates via messages
- Maintains its own state
- Performs actions in response to messages
- Can modify its state and produce external effects

Following the actor model, each agent also:
- Processes messages independently
- Maintains isolated state
- Communicates only through message passing
- Can create new agents or send messages to others
Think of agents as autonomous workers with specialized roles. They don’t share memory directly but collaborate by exchanging messages.
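This message-passing model can be sketched in plain Python (a toy illustration, not AutoGen's actual runtime; all names here are made up for the example):

```python
from collections import deque

class Agent:
    """Toy agent: isolated state, communicates only via messages."""
    def __init__(self, name):
        self.name = name        # unique identifier
        self.inbox = deque()    # private message queue
        self.state = []         # internal state, never shared directly

    def send(self, other, message):
        """The only way agents interact: append to another agent's inbox."""
        other.inbox.append((self.name, message))

    def process(self):
        """Drain the inbox, updating isolated state for each message."""
        while self.inbox:
            sender, message = self.inbox.popleft()
            self.state.append(message)
            # ...an agent could also act here: call a tool, reply, spawn agents...

writer = Agent("writer")
reviewer = Agent("reviewer")
writer.send(reviewer, "draft v1")
reviewer.process()
print(reviewer.state)  # ['draft v1']
```

Note that `reviewer` never reads `writer.state`; collaboration happens entirely through the inbox.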
Types of Agents
AutoGen provides several preset agent types in the AgentChat API:

AssistantAgent
An LLM-powered agent. Uses a language model to process messages and can call tools. Supports reflection on tool use and streaming.

CodeExecutorAgent
Safe code execution. Executes Python code in isolated environments (Docker or local). Returns results or errors.

UserProxyAgent
Human-in-the-loop. Represents human users in multi-agent workflows. Can request input or operate autonomously.

Custom Agents
Build your own. Implement custom behavior by extending base classes or using the Core API.
Agent Characteristics
All agents share these properties:
- Name: Unique identifier within a team
- State: Internal data maintained across messages
- Message Handling: Logic for processing incoming messages
- Actions: Operations performed in response to messages

Typical actions include:
- Sending messages to other agents
- Calling tools to interact with external systems
- Generating responses using LLMs
- Executing code or making API calls
- Maintaining conversation history
Multi-Agent Teams
A team is a group of agents working together toward a common goal. Teams implement multi-agent design patterns through coordinated message passing.

Why Teams?
Separation of Concerns
Each agent handles a specific responsibility:
- One agent writes code
- Another reviews it
- A third executes it
- A fourth summarizes results
Diverse Expertise
Different agents can use:
- Different models (GPT-4 for reasoning, GPT-4o-mini for simple tasks)
- Different tools (one has web access, another has database access)
- Different instructions specialized for their role
Reflection and Critique
Agents can review each other’s work:
- A writer agent creates content
- A critic agent provides feedback
- They iterate until quality is acceptable
Parallel Processing
Multiple agents can work simultaneously:
- Research agent gathers information
- Analysis agent processes data
- Visualization agent creates charts
Team Types
AutoGen provides several preset team patterns:
- RoundRobinGroupChat: Agents take turns in a fixed order. Simple and predictable. Use when you want deterministic turn-taking, like writer → reviewer → editor cycles.
- SelectorGroupChat: A model selects the next speaker based on the conversation so far.
- Swarm: Agents hand off control to one another via handoff messages.
- GraphFlow: Execution follows a directed graph of agents that you define.
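Round-robin turn-taking can be sketched in a few lines (illustrative only; the real RoundRobinGroupChat invokes each agent's model rather than a stub reply):

```python
from itertools import cycle

def round_robin(agents, task, max_turns=6):
    """Fixed-order turn-taking: each agent responds to the previous message."""
    message = task
    turns = []
    for agent in cycle(agents):          # writer -> reviewer -> editor -> writer -> ...
        if len(turns) >= max_turns:
            break
        message = f"{agent}: handled '{message}'"  # stand-in for a real agent reply
        turns.append(agent)
    return turns

order = round_robin(["writer", "reviewer", "editor"], "draft the intro")
print(order)
# ['writer', 'reviewer', 'editor', 'writer', 'reviewer', 'editor']
```

The appeal of this pattern is exactly what the sketch shows: the speaking order is fully determined up front, so runs are easy to predict and debug.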
Team Patterns
Common multi-agent design patterns:
- Reflection: Primary agent generates, critic reviews, iterate until approved
- Hierarchical: Orchestrator delegates to specialist agents
- Sequential: Agents form a pipeline (research → write → edit → publish)
- Parallel: Multiple agents work independently, results aggregated
- Debate: Agents with different perspectives discuss to reach consensus
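The reflection pattern, for example, can be sketched with stub functions standing in for LLM-backed agents (illustrative only; `reflection_loop` is a hypothetical name, not an AutoGen API):

```python
def reflection_loop(generate, critique, task, max_rounds=5):
    """Writer/critic loop: revise the draft until the critic approves."""
    draft = generate(task, feedback=None)
    for _ in range(max_rounds):
        feedback = critique(draft)
        if feedback == "APPROVE":       # critic is satisfied: stop iterating
            return draft
        draft = generate(task, feedback=feedback)
    return draft                        # give up after max_rounds revisions

# Stub "agents" standing in for model calls:
def generate(task, feedback):
    return f"{task} (revised)" if feedback else task

def critique(draft):
    return "APPROVE" if "revised" in draft else "needs revision"

result = reflection_loop(generate, critique, "intro paragraph")
print(result)  # intro paragraph (revised)
```

The `max_rounds` cap matters in practice: without it, a writer and critic that never converge would loop forever.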
Start with a single agent and only move to teams when the task genuinely requires collaboration. Teams need more careful prompting and debugging.
Tools
Tools allow agents to interact with the external world beyond text generation. An agent with tools can:
- Call APIs
- Query databases
- Execute code
- Search the web
- Read/write files
- Control browsers
Function Tools
The simplest way to add tools is by passing Python functions. AutoGen automatically generates JSON schemas from function signatures and docstrings. Use type hints and descriptive docstrings for best results.
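As an illustration of why type hints and docstrings matter, here is a rough sketch of how a JSON-schema-style tool description can be derived from a function signature (simplified; AutoGen's actual generator handles many more cases):

```python
import inspect
from typing import get_type_hints

def get_weather(city: str, unit: str = "fahrenheit") -> str:
    """Get the current weather for a city."""
    return f"Weather in {city}: 72°F, sunny"

def tool_schema(func):
    """Build a JSON-schema-style tool description from a function signature."""
    hints = get_type_hints(func)
    sig = inspect.signature(func)
    type_map = {str: "string", int: "integer", float: "number", bool: "boolean"}
    properties = {
        name: {"type": type_map.get(hints.get(name, str), "string")}
        for name in sig.parameters
    }
    # Parameters without defaults are required
    required = [n for n, p in sig.parameters.items() if p.default is p.empty]
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func),   # docstring becomes the description
        "parameters": {"type": "object", "properties": properties, "required": required},
    }

schema = tool_schema(get_weather)
print(schema["name"], schema["parameters"]["required"])  # get_weather ['city']
```

If `get_weather` had no type hints or docstring, the model would get an empty description and untyped parameters, which is why the guidance above matters.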
How Tools Work
1. Agent receives a task: “What’s the weather in Seattle?”
2. LLM decides to use a tool: Returns a function call request
3. Framework executes the tool: Calls get_weather("Seattle")
4. Result returned to LLM: “Weather in Seattle: 72°F, sunny”
5. LLM generates response: “The weather in Seattle is currently 72°F and sunny.”
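The steps above can be sketched as a loop, with a stub standing in for the model client (`fake_llm` and `run` are hypothetical names for illustration, not AutoGen APIs):

```python
import json

def get_weather(city: str) -> str:
    return f"Weather in {city}: 72°F, sunny"

TOOLS = {"get_weather": get_weather}

def fake_llm(messages):
    """Stand-in for a model client: first requests a tool, then answers."""
    if messages[-1]["role"] == "user":
        return {"tool_call": {"name": "get_weather",
                              "arguments": json.dumps({"city": "Seattle"})}}
    return {"content": "The weather in Seattle is currently 72°F and sunny."}

def run(task):
    messages = [{"role": "user", "content": task}]     # 1. agent receives task
    reply = fake_llm(messages)
    while "tool_call" in reply:                        # 2. LLM requests a tool
        call = reply["tool_call"]
        result = TOOLS[call["name"]](**json.loads(call["arguments"]))  # 3. execute
        messages.append({"role": "tool", "content": result})           # 4. return result
        reply = fake_llm(messages)                     # 5. LLM generates final response
    return reply["content"]

answer = run("What's the weather in Seattle?")
print(answer)  # The weather in Seattle is currently 72°F and sunny.
```

The `while` loop matters: a real model may chain several tool calls before producing a final text response.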
Advanced Tool Types
- MCP Servers
- Code Execution
- Agent Tools
- Custom Tools
Model Context Protocol servers provide collections of tools. Popular MCP servers:
- @playwright/mcp: Browser automation
- @modelcontextprotocol/server-filesystem: File operations
- @modelcontextprotocol/server-postgres: Database queries
Models
Models are the LLMs that power agent reasoning and text generation. AutoGen uses a model client abstraction to support multiple providers.

Model Clients
All model clients implement the ChatCompletionClient interface:
- OpenAI
- Azure OpenAI
- Anthropic Claude
- Google Gemini
- Local Models
Common OpenAI model choices:
- gpt-4o: Latest multimodal model
- gpt-4o-mini: Faster, cheaper variant
- gpt-4-turbo: Previous generation
- o1-preview: Advanced reasoning
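The value of the client abstraction is that agent code depends only on an interface, so providers are swappable. A simplified stand-in (not AutoGen's actual ChatCompletionClient signature):

```python
from typing import Protocol

class ChatClient(Protocol):
    """Minimal model-client interface: any provider that can complete a chat."""
    def create(self, messages: list) -> str: ...

class EchoClient:
    """A fake provider; a real client would call OpenAI, Anthropic, Ollama, etc."""
    def create(self, messages):
        return f"echo: {messages[-1]['content']}"

def ask(client: ChatClient, prompt: str) -> str:
    # Agent code sees only the interface, never the concrete provider.
    return client.create([{"role": "user", "content": prompt}])

result = ask(EchoClient(), "hello")
print(result)  # echo: hello
```

Swapping providers then means constructing a different client; `ask` (and, by analogy, an agent) is unchanged.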
Model Capabilities
Different models have different capabilities:

| Feature | OpenAI GPT-4o | Claude 3.5 Sonnet | Gemini 2.0 Flash | Local (Llama 3.2) |
|---|---|---|---|---|
| Function Calling | ✅ | ✅ | ✅ | ⚠️ Limited |
| Streaming | ✅ | ✅ | ✅ | ✅ |
| Vision (Images) | ✅ | ✅ | ✅ | ❌ |
| JSON Mode | ✅ | ❌ | ✅ | ⚠️ Varies |
| Max Context | 128K | 200K | 1M | Varies |
Choose models based on your needs:
- Complex reasoning: GPT-4o, Claude 3.5 Sonnet, o1
- Speed/cost: GPT-4o-mini, Claude Haiku, Gemini Flash
- Privacy: Local models (Ollama)
- Long context: Claude (200K), Gemini (1M)
Layered Architecture
AutoGen’s three-layer design gives you flexibility to choose the right abstraction level.

Layer 1: Core API (Foundation)
An event-driven agent runtime. The foundation layer provides:
- Message-passing infrastructure
- Agent lifecycle management
- Topic-based pub/sub
- Standalone and distributed runtimes
- Cross-language support (Python ↔ .NET)

Use when:
- You need event-driven architecture
- You want distributed execution across machines
- You need cross-language agents (Python + .NET)
- You want full control over message routing
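Topic-based pub/sub, one of the Core API's building blocks, can be sketched as (a toy model, not the actual runtime):

```python
from collections import defaultdict

class Runtime:
    """Toy topic-based pub/sub, loosely modeled on an event-driven runtime."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register an agent's handler for a topic."""
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        """Deliver a message to every subscriber of the topic."""
        for handler in self.subscribers[topic]:
            handler(message)

runtime = Runtime()
log = []
# Two independent "agents" subscribe to the same topic:
runtime.subscribe("code_review", lambda msg: log.append(f"reviewer got: {msg}"))
runtime.subscribe("code_review", lambda msg: log.append(f"logger got: {msg}"))
runtime.publish("code_review", "draft ready")
print(log)  # ['reviewer got: draft ready', 'logger got: draft ready']
```

The key property is decoupling: the publisher does not know who subscribed, which is what makes routing flexible and distribution possible.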
Layer 2: AgentChat API (High-Level)
Intuitive defaults for rapid development. Built on the Core API with:
- Preset agent types (AssistantAgent, CodeExecutorAgent)
- Team patterns (RoundRobin, Selector, Swarm)
- Termination conditions
- Streaming helpers
- Human-in-the-loop support

Use when:
- You’re prototyping quickly
- You want sensible defaults
- You’re new to AutoGen
- Common patterns fit your use case
Layer 3: Extensions API (Ecosystem)
Pluggable components. Extensions provide:
- Model clients (OpenAI, Anthropic, Azure, etc.)
- Code executors (Docker, Jupyter)
- Memory systems (ChromaDB, Redis)
- Tool integrations (MCP, web browsing)
- Custom implementations welcome

Use when:
- You need specific model providers
- You want tool integrations
- You’re building custom components
- You need production features (memory, caching)
Choosing the Right Layer
Start with AgentChat
Recommended for beginners
- Quickest path to working agents
- Best documentation and examples
- Covers 80% of use cases
- Easy to drop to Core API later
Drop to Core When Needed
For advanced scenarios
- Custom message protocols
- Event-driven patterns
- Distributed execution
- Cross-language requirements
Layer Philosophy: Start high (AgentChat), go deep (Core) when needed, extend freely (Extensions). You’re never locked in.
Message Flow
Understanding how messages flow is key to understanding AutoGen.

Message Types
- TextMessage: Plain text from an agent
- FunctionCall: Request to execute a tool
- FunctionExecutionResult: Tool execution result
- ToolCallSummaryMessage: Summary of tool calls
- HandoffMessage: Transfer control between agents
- TaskMessage: Initial task from user
- StopMessage: Signal termination
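These message types can be modeled as simple dataclasses (simplified stand-ins, not AutoGen's exact classes):

```python
from dataclasses import dataclass

@dataclass
class TextMessage:
    source: str      # which agent produced it
    content: str     # plain text payload

@dataclass
class FunctionCall:
    name: str        # tool to execute
    arguments: str   # JSON-encoded arguments

@dataclass
class StopMessage:
    source: str
    content: str     # reason for stopping

def is_terminal(message) -> bool:
    """A team watches the stream and ends the run on a StopMessage."""
    return isinstance(message, StopMessage)

stream = [
    TextMessage("user", "What's the weather in Seattle?"),
    FunctionCall("get_weather", '{"city": "Seattle"}'),
    StopMessage("assistant", "TERMINATE"),
]
flags = [is_terminal(m) for m in stream]
print(flags)  # [False, False, True]
```

Typing messages this way lets handlers dispatch on the message class rather than inspecting raw strings.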
Runtime Environments
AutoGen supports two runtime modes: standalone and distributed.

Standalone Runtime
Single-process execution. All agents run in the same process. Simple and fast. Use for:
- Development and testing
- Single-machine deployments
- Simpler debugging
- Lower latency

Distributed Runtime
Multi-process execution. Agents run in separate processes, potentially on separate machines. Use for:
- Distributed execution across machines
- Cross-language agent systems (Python ↔ .NET)
Agents work the same way in both runtimes. You can develop with standalone and deploy with distributed without changing agent code.
Next Steps
Now that you understand the concepts:

Build Multi-Agent Teams
Learn team patterns and collaboration
Work with Tools
Add function tools, MCP servers, and code execution
Configure Models
Set up different LLM providers
Explore Core API
Event-driven patterns and distributed runtime
Key Takeaways
- Agents are autonomous entities that communicate via messages
- Teams coordinate multiple agents using patterns like RoundRobin or Swarm
- Tools let agents interact with external systems via function calling
- Models provide LLM capabilities through a unified client interface
- Layered architecture gives you flexibility: start simple, go deep when needed
- Message passing is the foundation of all agent communication
- Runtimes support both single-process and distributed execution
The best way to learn is by building. Try the Examples to see these concepts in action.