HyperAgents is a research framework from Meta that implements a self-referential self-improvement loop: a meta-agent iteratively modifies the codebase — including its own agent implementations — to improve performance across diverse benchmark domains. Each generation, the system evaluates the evolved agents, selects the best-performing lineage, and uses it as the starting point for the next round of improvement.

Quickstart

Install dependencies, configure API keys, and run your first evolution loop in minutes

Core Concepts

Understand the meta-agent, task-agent, and evolutionary archive design

Domains

Explore the supported benchmark domains: paper review, games, robotics, math, and coding

API Reference

Full reference for AgentSystem, MetaAgent, TaskAgent, and LLM utilities

How it works

1. Set up your environment

Configure API keys for OpenAI, Anthropic, or Gemini models. Install the Python dependencies and build the Docker container used for sandboxed evaluation.

2. Initialize agents

Run setup_initial.sh to bootstrap initial agent evaluations on your chosen domain. This creates the baseline scores that the evolution loop will try to beat.

3. Run the evolution loop

Launch generate_loop.py with your target domain. The meta-agent proposes code changes, the modified task agent is evaluated in Docker, and the archive is updated with the new generation.

4. Analyze results

Inspect outputs in the outputs/ directory. Use the built-in plotting utilities to visualize score progression and the evolutionary archive tree across generations.
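The loop in step 3 can be sketched in a few lines of Python. All function and field names below are illustrative stand-ins, not the actual HyperAgents API; the real system runs evaluation inside Docker and supports multiple parent-selection strategies.

```python
def run_generation(archive, evaluate, propose_patch, apply_patch):
    """One generation of the (simplified) evolution loop.

    `archive` is a list of {"code": ..., "score": ...} entries;
    `evaluate`, `propose_patch`, and `apply_patch` are hypothetical
    callables standing in for sandboxed evaluation and the meta-agent.
    """
    # 1. Select the best-performing lineage so far as the parent.
    parent = max(archive, key=lambda entry: entry["score"])
    # 2. The meta-agent proposes a change to the task agent's code.
    patch = propose_patch(parent["code"])
    child_code = apply_patch(parent["code"], patch)
    # 3. Evaluate the modified agent (in the real system: inside Docker).
    score = evaluate(child_code)
    # 4. Record the new generation in the archive.
    archive.append({"code": child_code, "score": score})
    return score
```

Repeating `run_generation` over many iterations, with the archive carrying forward every evaluated variant, is what produces the score progression you analyze in step 4.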

Key capabilities

Self-referential improvement

The meta-agent can modify any part of the codebase — including its own agent logic — enabling truly open-ended self-improvement

Multi-domain benchmarks

Evaluate across paper review, web search, BALROG games, Genesis robotics, IMO math, and SWE-bench coding

Safe sandboxed execution

All model-generated code runs inside isolated Docker containers to prevent accidental system damage

Archive-based selection

Maintains a quality-diversity archive of all evolved agents; configurable parent selection strategies determine which lineage to evolve next
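A quality-diversity archive with pluggable parent selection might look like the following sketch. The class and method names are hypothetical, chosen only to illustrate the idea of keeping the best agent per niche and choosing which lineage to evolve next.

```python
import random

class QDArchive:
    """Illustrative quality-diversity archive: retains the best-scoring
    agent per niche (e.g. per benchmark domain). Not the real HyperAgents API."""

    def __init__(self):
        self.best = {}  # niche -> (score, agent)

    def add(self, niche, score, agent):
        # Keep only the highest-scoring agent seen in each niche.
        if niche not in self.best or score > self.best[niche][0]:
            self.best[niche] = (score, agent)

    def select_parent(self, strategy="greedy"):
        # Configurable selection: exploit the global best, or sample a
        # niche uniformly to preserve diversity.
        entries = list(self.best.values())
        if strategy == "greedy":
            return max(entries)[1]
        return random.choice(entries)[1]
```

Greedy selection drives scores up quickly, while uniform sampling keeps weaker niches alive so the search does not collapse onto a single lineage.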

Multi-LLM support

Plug in GPT-4o, Claude, Gemini, or any LiteLLM-compatible model as the backbone for both meta and task agents

Ensemble scoring

Combine predictions from multiple evolved agents for stronger performance beyond any single agent generation
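One simple way to combine predictions from several evolved agents is majority voting, sketched below. This is a generic illustration of ensemble scoring, not HyperAgents' specific combination method; each `agent` is assumed to be a callable that returns a prediction for a task.

```python
from collections import Counter

def ensemble_predict(agents, task):
    # Each evolved agent votes on an answer; the most common prediction wins.
    votes = [agent(task) for agent in agents]
    answer, _count = Counter(votes).most_common(1)[0]
    return answer
```

Because individual generations make partially independent errors, the majority vote can outperform any single agent in the ensemble.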

HyperAgents executes untrusted, model-generated code. Always run it inside the provided Docker sandbox. See the Safety page for important guidance before running any experiments.
