Quickstart
Install dependencies, configure API keys, and run your first evolution loop in minutes
Core Concepts
Understand the meta-agent, task-agent, and evolutionary archive design
Domains
Explore the supported benchmark domains: paper review, games, robotics, math, and coding
API Reference
Full reference for AgentSystem, MetaAgent, TaskAgent, and LLM utilities
How it works
Set up your environment
Configure API keys for OpenAI, Anthropic, or Gemini models. Install Python dependencies and build the Docker container used for sandboxed evaluation.
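A typical setup sequence might look like the following. The environment variable names follow each provider's usual convention, and the Docker image tag is a placeholder, not the project's actual tag:

```shell
# Provider API keys -- set only the ones you plan to use.
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GEMINI_API_KEY="..."

# Install Python dependencies.
pip install -r requirements.txt

# Build the Docker image used for sandboxed evaluation
# (image tag is hypothetical; check the repo for the real one).
docker build -t agent-sandbox .
```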
Initialize agents
Run setup_initial.sh to bootstrap initial agent evaluations on your chosen domain. This creates the baseline scores that the evolution loop will try to beat.
Run the evolution loop
Launch generate_loop.py with your target domain. The meta-agent proposes code changes, the modified task agent is evaluated in Docker, and the archive is updated with the new generation.
Key capabilities
Self-referential improvement
The meta-agent can modify any part of the codebase — including its own agent logic — enabling truly open-ended self-improvement
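As a rough sketch, one generation of this self-improvement loop picks a parent from the archive, mutates it, evaluates the child, and archives the result. All names here are hypothetical, and the LLM proposal and Docker evaluation are stubbed out:

```python
import random

def propose_patch(parent_code: str) -> str:
    """Stub for the meta-agent: in the real system an LLM proposes a code
    change to any file, including the meta-agent's own logic."""
    return parent_code + "\n# generation-level tweak"

def evaluate_in_sandbox(agent_code: str) -> float:
    """Stub for sandboxed evaluation: really this runs the task agent in
    Docker against a benchmark domain and returns a score."""
    return random.random()

def run_generation(archive: list[dict]) -> list[dict]:
    """One evolution step: select a parent, mutate, evaluate, archive."""
    parent = max(archive, key=lambda a: a["score"])  # greedy parent selection
    child_code = propose_patch(parent["code"])
    child = {"code": child_code, "score": evaluate_in_sandbox(child_code)}
    return archive + [child]

# Seed with the baseline from setup_initial.sh, then evolve three generations.
archive = [{"code": "# baseline agent", "score": 0.1}]
for _ in range(3):
    archive = run_generation(archive)
print(len(archive))  # 4 agents: the baseline plus three children
```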
Multi-domain benchmarks
Evaluate across paper review, web search, BALROG games, Genesis robotics, IMO math, and SWE-bench coding
Safe sandboxed execution
All model-generated code runs inside isolated Docker containers to prevent accidental system damage
Archive-based selection
Maintains a quality-diversity archive of all evolved agents; configurable parent selection strategies determine which lineage to evolve next
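One way such an archive might be organized, as a minimal sketch: keep the best agent per behavior niche, and expose pluggable parent-selection strategies. The class, niche keys, and strategy names are illustrative, not the project's actual API:

```python
import random

class Archive:
    """Quality-diversity archive: retains the best agent in each niche."""

    def __init__(self):
        self.niches = {}  # niche key -> (score, agent_id)

    def add(self, agent_id: str, niche: str, score: float) -> None:
        best = self.niches.get(niche)
        if best is None or score > best[0]:
            self.niches[niche] = (score, agent_id)

    def select_parent(self, strategy: str = "best") -> str:
        if strategy == "best":      # exploit: highest score overall
            return max(self.niches.values())[1]
        if strategy == "uniform":   # explore: any niche, equal probability
            return random.choice(list(self.niches.values()))[1]
        raise ValueError(f"unknown strategy: {strategy}")

archive = Archive()
archive.add("gen0", niche="short-code", score=0.10)
archive.add("gen1", niche="short-code", score=0.25)  # displaces gen0
archive.add("gen2", niche="tool-heavy", score=0.15)
print(archive.select_parent("best"))  # gen1
```

Greedy selection converges fastest but risks stagnation; sampling niches uniformly preserves diversity at the cost of slower hill-climbing.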
Multi-LLM support
Plug in GPT-4o, Claude, Gemini, or any LiteLLM-compatible model as the backbone for both meta and task agents
Ensemble scoring
Combine predictions from multiple evolved agents for stronger performance beyond any single agent generation
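For classification-style domains such as paper review, the simplest version of this is a majority vote over the predictions of several evolved agents. A sketch with toy stand-in agents (the real agents would be full evolved programs):

```python
from collections import Counter

def ensemble_predict(agents, task) -> str:
    """Return the most common prediction across an ensemble of agents."""
    votes = Counter(agent(task) for agent in agents)
    return votes.most_common(1)[0][0]

# Toy "agents" standing in for three different evolved generations.
agents = [
    lambda task: "accept",
    lambda task: "accept",
    lambda task: "reject",
]
print(ensemble_predict(agents, task="paper-123"))  # accept
```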