
API Keys

HyperAgents uses LiteLLM to route requests to multiple LLM providers. You need API keys for whichever providers you plan to use. Create a .env file in the repository root:
```shell
# .env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=AIza...
```
Each key is needed only for the providers you actually call. Note, however, that the default model is openai/gpt-4o, so OPENAI_API_KEY is required unless you override --model.
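The mapping from a model string's provider prefix to the key it needs can be sanity-checked in plain Python. This is an illustrative sketch; `check_key` is a hypothetical helper, not a function in the repository:

```python
import os

# Hypothetical helper (not in the HyperAgents repo): return True when the
# API key for a LiteLLM model string's provider is set in the environment.
def check_key(model: str) -> bool:
    key_name = {
        "openai": "OPENAI_API_KEY",
        "anthropic": "ANTHROPIC_API_KEY",
        "gemini": "GEMINI_API_KEY",
    }.get(model.split("/", 1)[0])
    return key_name is not None and bool(os.environ.get(key_name))

if not check_key("openai/gpt-4o"):  # the default model
    print("Warning: OPENAI_API_KEY is not set; pass --model to use another provider.")
```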

System Requirements

HyperAgents requires Python 3.12 and several system-level packages. These instructions are for Fedora/RHEL-based systems using dnf.
```shell
# Python 3.12 development headers
sudo dnf install -y python3.12-devel

# Build tools and domain-specific dependencies
sudo dnf install -y graphviz graphviz-devel cmake ninja-build bzip2-devel zlib-devel ncurses-devel libffi-devel
```
HyperAgents executes model-generated code inside Docker containers. Docker must be installed and the current user must have permission to run Docker commands before proceeding.

Python Environment

```shell
# Create and activate a virtual environment
python3.12 -m venv venv_nat
source venv_nat/bin/activate

# Install runtime and dev dependencies
pip install -r requirements.txt
pip install -r requirements_dev.txt

# Build the Docker image used for sandboxed evaluation
docker build --network=host -t hyperagents .

# Initialize the starting agent checkpoints
bash ./setup_initial.sh
```

Python Dependencies

requirements.txt — Runtime

| Package | Version | Purpose |
| --- | --- | --- |
| requests | 2.32.4 | HTTP client for API calls |
| dotenv | 0.9.9 | Loads .env file into environment variables |
| tqdm | 4.67.1 | Progress bars for evaluation loops |
| backoff | 2.2.1 | Exponential retry on LLM API failures |
| matplotlib | 3.10.3 | Plotting generation progress |
| docker | 7.1.0 | Python SDK for managing Docker containers |
| datasets | 3.6.0 | HuggingFace datasets (used by several domains) |
| GitPython | 3.1.44 | Git operations for patch management |
| litellm | 1.74.9 | Unified LLM API across OpenAI, Anthropic, Gemini |
| pandas | 2.3.2 | Result aggregation and analysis |
| sympy | 1.14.0 | Symbolic math (used by imo_grading / imo_proof) |
| hydra-core | 1.3.2 | Config management for Balrog domains |
| gym / gymnasium | 0.23.0 / 1.2.0 | RL environment interfaces for Balrog |
| rsl-rl-lib | 2.2.4 | Reinforcement learning training for Genesis domain |
| tensorboard | 2.20.0 | Training metrics logging for Genesis |
| Genesis | git | Physics simulation engine for robotics domain |
| Minigrid | git | Grid-world environments for balrog_babyai |
| minihack | git | NetHack-based environments for balrog_minihack |
| baba-is-ai | git | Baba Is You environment for balrog_babaisai |

requirements_dev.txt — Analysis & Visualization

| Package | Version | Purpose |
| --- | --- | --- |
| networkx | 3.5 | Archive graph construction and traversal |
| pygraphviz | 1.14 | Rendering archive lineage graphs (requires the graphviz system package) |
| plotly | 6.1.2 | Interactive HTML plots |
| scikit-learn | 1.7.0 | Score-proportional parent selection utilities |
The requirements_dev.txt packages are needed only if you intend to run the analysis scripts in analysis/; they are not required for running generate_loop.py.

Supported LLM Models

Models are defined as constants in agent/llm.py and passed to LiteLLM. The string format is provider/model-id.
| Constant | Model Identifier | Provider |
| --- | --- | --- |
| CLAUDE_MODEL | anthropic/claude-sonnet-4-5-20250929 | Anthropic |
| CLAUDE_HAIKU_MODEL | anthropic/claude-3-haiku-20240307 | Anthropic (4096 token limit) |
| CLAUDE_35NEW_MODEL | anthropic/claude-3-5-sonnet-20241022 | Anthropic |
| OPENAI_MODEL | openai/gpt-4o | OpenAI (default) |
| OPENAI_MINI_MODEL | openai/gpt-4o-mini | OpenAI |
| OPENAI_O3_MODEL | openai/o3 | OpenAI |
| OPENAI_O3MINI_MODEL | openai/o3-mini | OpenAI |
| OPENAI_O4MINI_MODEL | openai/o4-mini | OpenAI |
| OPENAI_GPT52_MODEL | openai/gpt-5.2 | OpenAI |
| OPENAI_GPT5_MODEL | openai/gpt-5 | OpenAI (no temperature param) |
| OPENAI_GPT5MINI_MODEL | openai/gpt-5-mini | OpenAI (no temperature param) |
| GEMINI_3_MODEL | gemini/gemini-3-pro-preview | Google |
| GEMINI_MODEL | gemini/gemini-2.5-pro | Google |
| GEMINI_FLASH_MODEL | gemini/gemini-2.5-flash | Google |
gpt-5 and gpt-5-mini do not accept a temperature parameter — LiteLLM will skip it automatically. All GPT-5 family models use max_completion_tokens instead of max_tokens. Claude Haiku is capped at 4096 output tokens regardless of the global MAX_TOKENS setting.
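These per-model quirks can be folded into a small helper when constructing a completion call. The sketch below is illustrative only (`completion_kwargs` is not a function in the repo); it builds the keyword arguments you would pass to a LiteLLM-style completion call:

```python
MAX_TOKENS = 16_384  # global token limit, mirroring MAX_TOKENS in agent/llm.py

NO_TEMPERATURE = {"openai/gpt-5", "openai/gpt-5-mini"}

# Hypothetical helper (not in the repo): build completion kwargs that
# respect the per-model quirks described above.
def completion_kwargs(model: str, temperature: float = 0.7) -> dict:
    kwargs: dict = {"model": model}
    # GPT-5 family models take max_completion_tokens instead of max_tokens.
    if model.startswith("openai/gpt-5"):
        kwargs["max_completion_tokens"] = MAX_TOKENS
    else:
        kwargs["max_tokens"] = MAX_TOKENS
    # gpt-5 and gpt-5-mini reject the temperature parameter entirely.
    if model not in NO_TEMPERATURE:
        kwargs["temperature"] = temperature
    return kwargs
```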

Setting the Model

The meta-agent model is configured via the --model argument in run_meta_agent.py. Pass the full LiteLLM model identifier:
```shell
python run_meta_agent.py --model anthropic/claude-3-5-sonnet-20241022 ...
```
When running via generate_loop.py, the model argument is forwarded automatically. For the polyglot domain specifically, the loop hardcodes claude-3-5-sonnet-20241022 for fair comparison with the DGM baseline. The global token limit is 16,384 (MAX_TOKENS in agent/llm.py). Failed API calls are retried with exponential backoff for up to 600 seconds, with a maximum interval of 60 seconds between retries.
