Magentic-One is a generalist multi-agent system for solving open-ended web and file-based tasks across a variety of domains. It represents a significant step forward for multi-agent systems, achieving competitive performance on numerous agentic benchmarks.
Magentic-One is now fully integrated into autogen-agentchat, providing a modular and easy-to-use interface. The original implementation based on autogen-core is deprecated but remains available in the AutoGen repository.
Using Magentic-One involves interacting with a digital world designed for humans, which carries inherent risks. See the Safety Precautions section for important security guidelines.

Overview

Magentic-One uses a multi-agent architecture where a lead Orchestrator agent manages high-level planning, directs other agents, and tracks task progress. The system autonomously adapts to dynamic web and file-system environments to solve complex tasks.

Multi-Agent Architecture

Orchestrator coordinates specialized agents for different capabilities

Web & File Tasks

Handles open-ended tasks involving web browsing and file manipulation

Autonomous Adaptation

Dynamically adjusts plans based on task progress and obstacles

Competitive Performance

Achieves strong results on benchmarks like GAIA and HumanEval

Installation

1. Install the required packages:

pip install "autogen-agentchat" "autogen-ext[magentic-one,openai]"

2. If you plan to use the web browsing agent (MultimodalWebSurfer), install Playwright:

playwright install --with-deps chromium

Quick Start

Get started with Magentic-One in just a few lines of code:
import asyncio
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.teams.magentic_one import MagenticOne
from autogen_agentchat.ui import Console

async def main():
    client = OpenAIChatCompletionClient(model="gpt-4o")
    m1 = MagenticOne(client=client)
    task = "What is the UV index in Melbourne today?"
    result = await Console(m1.run_stream(task=task))
    print(result)

if __name__ == "__main__":
    asyncio.run(main())

Architecture

Magentic-One consists of five specialized agents working together:

Orchestrator

The lead agent responsible for:
  • Task decomposition and planning
  • Directing other agents in executing subtasks
  • Tracking overall progress
  • Taking corrective actions when needed
The Orchestrator maintains two ledgers:
  • Task Ledger: High-level plan, facts, and educated guesses
  • Progress Ledger: Self-reflection on task progress at each step
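Conceptually, the two ledgers can be pictured as simple records (a hypothetical sketch for intuition only; the real ledgers live inside the Orchestrator's prompts and are not part of the public API):

```python
from dataclasses import dataclass, field

@dataclass
class TaskLedger:
    # Outer-loop state: the high-level plan plus what the
    # Orchestrator knows (facts) or suspects (educated guesses).
    plan: list[str] = field(default_factory=list)
    facts: list[str] = field(default_factory=list)
    guesses: list[str] = field(default_factory=list)

@dataclass
class ProgressLedger:
    # Inner-loop state: per-step self-reflection used to decide
    # whether the task is done and which agent should act next.
    task_complete: bool = False
    progress_being_made: bool = True
    next_agent: str = ""
    next_instruction: str = ""
```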

WebSurfer

An LLM-based agent proficient in commanding a Chromium-based web browser:
  • Navigation: Visit URLs, perform web searches
  • Web Actions: Click elements, type text, fill forms
  • Reading: Summarize content, answer questions about pages
Uses accessibility tree and set-of-marks prompting for precise interactions.
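The set-of-marks idea can be illustrated in a few lines: interactive elements from the accessibility tree are tagged with numeric marks, and the model is asked to act by mark ID rather than by raw pixel coordinates (a deliberately simplified, hypothetical sketch, not the library's actual implementation):

```python
# Hypothetical elements extracted from a page's accessibility tree.
elements = [
    {"role": "textbox", "name": "Search"},
    {"role": "button", "name": "Submit"},
    {"role": "link", "name": "Docs"},
]

# Assign each interactive element a numeric mark.
marks = {i: el for i, el in enumerate(elements, start=1)}

# The prompt lists the marks; the model replies with an action like "click 2".
prompt = "\n".join(f"[{i}] {el['role']}: {el['name']}" for i, el in marks.items())

def apply_action(action: str) -> dict:
    """Resolve a model action such as 'click 2' back to a concrete element."""
    verb, mark = action.split()
    return marks[int(mark)]

print(apply_action("click 2")["name"])  # → Submit
```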

FileSurfer

An LLM-based agent for file system operations:
  • Read local files of most types (via markdown preview)
  • List directory contents
  • Navigate folder structures
  • Extract information from documents

Coder

Specialized through its system prompt for:
  • Writing code to solve problems
  • Analyzing information from other agents
  • Creating new artifacts and tools
  • Implementing complex algorithms

ComputerTerminal

Provides access to a console shell:
  • Execute code written by the Coder
  • Install programming libraries
  • Run system commands
  • Interact with the file system
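Conceptually, the ComputerTerminal runs each code block the Coder produces in a separate process and returns its output, roughly like this stdlib sketch (illustrative only; the actual executor classes such as LocalCommandLineCodeExecutor and DockerCommandLineCodeExecutor add working directories, sandboxing, and richer result handling):

```python
import os
import subprocess
import sys
import tempfile

def execute_python(code: str) -> str:
    """Run a Python code block in a subprocess and capture its output."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=30,
        )
        return proc.stdout + proc.stderr
    finally:
        os.remove(path)

print(execute_python("print(2 + 3)"))  # → 5
```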

Usage Examples

Basic Usage with MagenticOne Helper

The simplest way to use Magentic-One with all agents:
import asyncio
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.teams.magentic_one import MagenticOne
from autogen_agentchat.ui import Console

async def example_usage():
    client = OpenAIChatCompletionClient(model="gpt-4o")
    m1 = MagenticOne(client=client)
    task = "Write a Python script to fetch data from an API."
    result = await Console(m1.run_stream(task=task))
    print(result)

if __name__ == "__main__":
    asyncio.run(example_usage())

Human-in-the-Loop Mode

Add human oversight for safety-critical tasks:
import asyncio
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.teams.magentic_one import MagenticOne
from autogen_ext.code_executors.docker import DockerCommandLineCodeExecutor
from autogen_agentchat.ui import Console
from autogen_agentchat.agents import ApprovalRequest, ApprovalResponse

def user_input_func(prompt: str) -> str:
    """Custom input function for user interaction."""
    return input(prompt)

def approval_func(request: ApprovalRequest) -> ApprovalResponse:
    """Request user approval before executing code."""
    print(f"Code to execute:\n{request.code}")
    user_input = input("Do you approve this code execution? (y/n): ").strip().lower()
    if user_input == 'y':
        return ApprovalResponse(approved=True, reason="User approved")
    else:
        return ApprovalResponse(approved=False, reason="User denied")

async def example_usage_hil():
    client = OpenAIChatCompletionClient(model="gpt-4o")
    
    # Use Docker executor for better security
    async with DockerCommandLineCodeExecutor() as code_executor:
        m1 = MagenticOne(
            client=client,
            hil_mode=True,
            input_func=user_input_func,
            code_executor=code_executor,
            approval_func=approval_func
        )
        task = "Write a Python script to fetch data from an API."
        result = await Console(m1.run_stream(task=task))
        print(result)

if __name__ == "__main__":
    asyncio.run(example_usage_hil())

Code Approval Without Full HIL Mode

Approve only code execution while keeping the system autonomous:
import asyncio
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.teams.magentic_one import MagenticOne
from autogen_ext.code_executors.docker import DockerCommandLineCodeExecutor
from autogen_agentchat.ui import Console
from autogen_agentchat.agents import ApprovalRequest, ApprovalResponse

def approval_func(request: ApprovalRequest) -> ApprovalResponse:
    """Request user approval before executing code."""
    print(f"Code to execute:\n{request.code}")
    user_input = input("Approve? (y/n): ").strip().lower()
    if user_input == 'y':
        return ApprovalResponse(approved=True, reason="User approved")
    return ApprovalResponse(approved=False, reason="User denied")

async def example_usage_with_approval():
    client = OpenAIChatCompletionClient(model="gpt-4o")
    
    async with DockerCommandLineCodeExecutor() as code_executor:
        m1 = MagenticOne(
            client=client,
            hil_mode=False,  # No human intervention in conversation
            code_executor=code_executor,
            approval_func=approval_func  # But approve code execution
        )
        task = "Write a Python script to fetch data from an API."
        result = await Console(m1.run_stream(task=task))
        print(result)

if __name__ == "__main__":
    asyncio.run(example_usage_with_approval())

Using MagenticOneGroupChat

For more control, use MagenticOneGroupChat directly:
import asyncio
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import MagenticOneGroupChat
from autogen_agentchat.ui import Console

async def main() -> None:
    model_client = OpenAIChatCompletionClient(model="gpt-4o")

    assistant = AssistantAgent(
        "Assistant",
        model_client=model_client,
    )
    team = MagenticOneGroupChat([assistant], model_client=model_client)
    await Console(team.run_stream(task="Provide a proof for Fermat's Last Theorem"))
    await model_client.close()

asyncio.run(main())

Using Individual Magentic-One Agents

Combine specific agents in a custom team:
import asyncio
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_agentchat.teams import MagenticOneGroupChat
from autogen_agentchat.ui import Console
from autogen_ext.agents.web_surfer import MultimodalWebSurfer
from autogen_ext.agents.file_surfer import FileSurfer
from autogen_ext.agents.magentic_one import MagenticOneCoderAgent
from autogen_agentchat.agents import CodeExecutorAgent
from autogen_ext.code_executors.local import LocalCommandLineCodeExecutor

async def main() -> None:
    model_client = OpenAIChatCompletionClient(model="gpt-4o")

    surfer = MultimodalWebSurfer("WebSurfer", model_client=model_client)
    file_surfer = FileSurfer("FileSurfer", model_client=model_client)
    coder = MagenticOneCoderAgent("Coder", model_client=model_client)
    terminal = CodeExecutorAgent(
        "ComputerTerminal",
        code_executor=LocalCommandLineCodeExecutor()
    )

    team = MagenticOneGroupChat(
        [surfer, file_surfer, coder, terminal],
        model_client=model_client
    )
    
    await Console(team.run_stream(task="What is the UV index in Melbourne today?"))

asyncio.run(main())

Safety Precautions

Magentic-One interacts with real web pages, executes code, and accesses files. Always follow these safety guidelines:
  • Run all tasks in Docker containers to isolate the agents and prevent direct system attacks:
from autogen_ext.code_executors.docker import DockerCommandLineCodeExecutor

async with DockerCommandLineCodeExecutor() as code_executor:
    m1 = MagenticOne(client=client, code_executor=code_executor)
  • Use a virtual environment to prevent agents from accessing sensitive data or system files.
  • Closely monitor logs during and after execution to detect and mitigate risky behavior.
  • Run examples with a human in the loop to supervise the agents and prevent unintended consequences:
m1 = MagenticOne(
    client=client,
    hil_mode=True,
    approval_func=approval_func
)
  • Restrict the agents' access to the internet and other resources to prevent unauthorized actions.
  • Never give the agents access to sensitive data or resources, and never share sensitive information with them.
Be aware that agents may occasionally attempt risky actions, such as:
  • Recruiting humans for help
  • Accepting cookie agreements without human involvement
  • Following instructions from compromised web pages (prompt injection)
Always ensure agents are monitored and operate within a controlled environment.

Model Recommendations

Magentic-One is model-agnostic and can work with various LLMs:

GPT-4o (Recommended)

Default multimodal LLM for all agents. Strong reasoning and vision capabilities.

GPT-4o for Orchestrator

Use a strong reasoning model for the Orchestrator agent.

OpenAI o1-preview

For advanced reasoning in the Orchestrator's outer loop and the Coder agent.

Heterogeneous Models

Mix different models for different agents to balance cost and capabilities.
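One way to mix models is to give each agent its own model client (a configuration sketch; the model split shown here is an assumption for illustration, not a recommendation from the report):

```python
from autogen_agentchat.teams import MagenticOneGroupChat
from autogen_ext.agents.magentic_one import MagenticOneCoderAgent
from autogen_ext.agents.web_surfer import MultimodalWebSurfer
from autogen_ext.models.openai import OpenAIChatCompletionClient

# Hypothetical split: a stronger model for orchestration and coding,
# a cheaper multimodal model for web browsing.
strong_client = OpenAIChatCompletionClient(model="gpt-4o")
cheap_client = OpenAIChatCompletionClient(model="gpt-4o-mini")

surfer = MultimodalWebSurfer("WebSurfer", model_client=cheap_client)
coder = MagenticOneCoderAgent("Coder", model_client=strong_client)

# The team's model_client drives the Orchestrator's planning loops.
team = MagenticOneGroupChat([surfer, coder], model_client=strong_client)
```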

Azure OpenAI Example

from autogen_ext.models.openai import AzureOpenAIChatCompletionClient

client = AzureOpenAIChatCompletionClient(
    azure_deployment="your-deployment-name",
    azure_endpoint="https://your-endpoint.openai.azure.com/",
    api_version="2024-02-15-preview",
    model="gpt-4o",
    api_key="your-api-key"
)

m1 = MagenticOne(client=client)

Performance

Magentic-One achieves competitive results on multiple benchmarks:
  • GAIA: Strong performance on general AI assistant tasks
  • HumanEval: Effective code generation capabilities
  • AssistantBench: Competitive across diverse assistant scenarios
See the technical report for detailed benchmark results.

Orchestrator Workflow

The Orchestrator uses a two-loop architecture:

Outer Loop (Task Ledger)

  1. Create initial plan for the task
  2. Gather facts and educated guesses
  3. Update plan if progress stalls

Inner Loop (Progress Ledger)

  1. Self-reflect on current progress
  2. Check if task is completed
  3. Assign subtask to appropriate agent
  4. Update progress after agent completes subtask
  5. Repeat until task is complete or replanning is needed
This architecture allows Magentic-One to:
  • Dynamically adapt to obstacles
  • Recover from failures
  • Optimize agent selection based on subtask requirements
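The two loops above can be sketched in plain Python (an illustrative control-flow sketch with hypothetical reflect/replan stubs; the real Orchestrator implements these steps via LLM prompts):

```python
# Illustrative sketch of the Orchestrator's two-loop control flow.
# reflect() and replan() are hypothetical stand-ins for LLM calls.

def reflect(task, ledger, step):
    """Inner-loop self-reflection: is the task done? Who acts next?"""
    done = len(ledger["facts"]) >= 2  # toy completion criterion
    return {
        "task_complete": done,
        "progress_being_made": True,
        "next_agent": "WebSurfer" if step % 2 == 0 else "Coder",
        "next_instruction": f"work on: {task}",
        "final_answer": ledger["facts"][-1] if done else None,
    }

def replan(task, ledger):
    """Outer-loop replanning invoked when progress stalls."""
    return ["revise approach", "retry"]

def run_orchestrator(task, agents, max_stalls=3, max_rounds=10):
    # Task Ledger: plan, facts, and educated guesses (outer loop).
    ledger = {"plan": ["gather facts", "act", "verify"], "facts": [], "guesses": []}
    stalls = 0
    for step in range(max_rounds):          # inner loop (Progress Ledger)
        progress = reflect(task, ledger, step)
        if progress["task_complete"]:
            return progress["final_answer"]
        if not progress["progress_being_made"]:
            stalls += 1
            if stalls >= max_stalls:        # fall back to the outer loop
                ledger["plan"] = replan(task, ledger)
                stalls = 0
                continue
        agent = agents[progress["next_agent"]]
        ledger["facts"].append(agent(progress["next_instruction"]))
    return None

agents = {"WebSurfer": lambda i: f"web result for {i}",
          "Coder": lambda i: f"code result for {i}"}
print(run_orchestrator("demo task", agents))
```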

API Reference

MagenticOne

  • client (ChatCompletionClient, required): The client used for model interactions (e.g., OpenAIChatCompletionClient).
  • hil_mode (bool, default False): If True, adds a UserProxyAgent to enable human-in-the-loop interactions.
  • input_func (InputFuncType, default None): Function to use for user input in human-in-the-loop mode.
  • code_executor (CodeExecutor, default None): Code executor to use. If None, uses Docker if available, otherwise a local executor.
  • approval_func (ApprovalFuncType, default None): Function to approve code execution before running. If None, code executes without approval.

Resources

Blog Post

Read the official Magentic-One announcement

Technical Report

Full academic paper with detailed methodology

GitHub Repository

View source code and contribute

API Reference

Complete API documentation

Citation

If you use Magentic-One in your research, please cite:
@misc{fourney2024magenticonegeneralistmultiagentsolving,
    title={Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks},
    author={Adam Fourney and Gagan Bansal and Hussein Mozannar and Cheng Tan and Eduardo Salinas and Erkang Zhu and Friederike Niedtner and Grace Proebsting and Griffin Bassman and Jack Gerrits and Jacob Alber and Peter Chang and Ricky Loynd and Robert West and Victor Dibia and Ahmed Awadallah and Ece Kamar and Rafah Hosn and Saleema Amershi},
    year={2024},
    eprint={2411.04468},
    archivePrefix={arXiv},
    primaryClass={cs.AI},
    url={https://arxiv.org/abs/2411.04468}
}
