## API Keys
HyperAgents uses LiteLLM to route requests to multiple LLM providers. You need API keys for whichever providers you plan to use.
Create a `.env` file in the repository root:

```env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=AIza...
```
Each key is optional: set only the keys for the providers you plan to use. Note, however, that the default model is openai/gpt-4o, so `OPENAI_API_KEY` is required unless you override `--model`.
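The `.env` format is plain `KEY=value` lines. As a rough illustration of the format only (HyperAgents loads the file via the `dotenv` package, not this code), a minimal parser looks like:

```python
def load_env_file(text):
    """Parse KEY=value lines from a .env-style string into a dict.

    Minimal sketch: skips blank lines and '#' comments. Real loaders
    such as python-dotenv also handle quoting and interpolation.
    """
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

# Only the keys you actually set end up in the environment.
sample = "OPENAI_API_KEY=sk-test\n# comment\nGEMINI_API_KEY=AIza-test\n"
keys = load_env_file(sample)
```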
## System Requirements
HyperAgents requires Python 3.12 and several system-level packages. These instructions are for Fedora/RHEL-based systems using dnf.
```bash
# Python 3.12 development headers
sudo dnf install -y python3.12-devel

# Build tools and domain-specific dependencies
sudo dnf install -y graphviz graphviz-devel cmake ninja-build bzip2-devel zlib-devel ncurses-devel libffi-devel
```
HyperAgents executes model-generated code inside Docker containers. Docker must be installed and the current user must have permission to run Docker commands before proceeding.
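Before proceeding, it is worth confirming that Docker is reachable without sudo. A small standard-library check (illustrative only; running `docker run hello-world` in a shell works just as well):

```python
import shutil
import subprocess

def docker_available():
    """Return True if a docker client binary is on PATH."""
    return shutil.which("docker") is not None

def docker_usable():
    """Return True if the Docker daemon answers for the current user.

    `docker info` exits non-zero when the daemon is down or when the
    user lacks permission (e.g. is not in the `docker` group).
    """
    if not docker_available():
        return False
    result = subprocess.run(["docker", "info"], capture_output=True, text=True)
    return result.returncode == 0
```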
## Python Environment
```bash
# Create and activate a virtual environment
python3.12 -m venv venv_nat
source venv_nat/bin/activate

# Install runtime and dev dependencies
pip install -r requirements.txt
pip install -r requirements_dev.txt

# Build the Docker image used for sandboxed evaluation
docker build --network=host -t hyperagents .

# Initialize the starting agent checkpoints
bash ./setup_initial.sh
```
## Python Dependencies
### requirements.txt — Runtime
| Package | Version | Purpose |
|---|---|---|
| requests | 2.32.4 | HTTP client for API calls |
| dotenv | 0.9.9 | Loads `.env` file into environment variables |
| tqdm | 4.67.1 | Progress bars for evaluation loops |
| backoff | 2.2.1 | Exponential retry on LLM API failures |
| matplotlib | 3.10.3 | Plotting generation progress |
| docker | 7.1.0 | Python SDK for managing Docker containers |
| datasets | 3.6.0 | HuggingFace datasets (used by several domains) |
| GitPython | 3.1.44 | Git operations for patch management |
| litellm | 1.74.9 | Unified LLM API across OpenAI, Anthropic, Gemini |
| pandas | 2.3.2 | Result aggregation and analysis |
| sympy | 1.14.0 | Symbolic math (used by imo_grading / imo_proof) |
| hydra-core | 1.3.2 | Config management for Balrog domains |
| gym / gymnasium | 0.23.0 / 1.2.0 | RL environment interfaces for Balrog |
| rsl-rl-lib | 2.2.4 | Reinforcement learning training for Genesis domain |
| tensorboard | 2.20.0 | Training metrics logging for Genesis |
| Genesis | git | Physics simulation engine for robotics domain |
| Minigrid | git | Grid-world environments for balrog_babyai |
| minihack | git | NetHack-based environments for balrog_minihack |
| baba-is-ai | git | Baba Is You environment for balrog_babaisai |
### requirements_dev.txt — Analysis & Visualization
| Package | Version | Purpose |
|---|---|---|
| networkx | 3.5 | Archive graph construction and traversal |
| pygraphviz | 1.14 | Rendering archive lineage graphs (requires graphviz system package) |
| plotly | 6.1.2 | Interactive HTML plots |
| scikit-learn | 1.7.0 | Score-proportional parent selection utilities |
The `requirements_dev.txt` packages are only needed if you intend to run the analysis scripts in `analysis/`. They are not required for running `generate_loop.py`.
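Both requirements files pin exact versions, so it can be useful to verify that the active environment matches the pins. The helper below is a hypothetical convenience using the standard library's `importlib.metadata`, not part of the repository:

```python
from importlib.metadata import version, PackageNotFoundError

def check_pins(pins):
    """Compare installed distribution versions against exact pins.

    pins: dict mapping distribution name -> expected version string.
    Returns a dict of mismatches: name -> (expected, found or None),
    where None means the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches

# Spot-check a few of the runtime pins from the table above:
report = check_pins({"requests": "2.32.4", "litellm": "1.74.9"})
```

An empty result means every checked pin matches; otherwise the dict tells you what to reinstall.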
## Supported LLM Models
Models are defined as constants in `agent/llm.py` and passed to LiteLLM. The string format is `provider/model-id`.
| Constant | Model Identifier | Provider |
|---|---|---|
| CLAUDE_MODEL | anthropic/claude-sonnet-4-5-20250929 | Anthropic |
| CLAUDE_HAIKU_MODEL | anthropic/claude-3-haiku-20240307 | Anthropic (4096 token limit) |
| CLAUDE_35NEW_MODEL | anthropic/claude-3-5-sonnet-20241022 | Anthropic |
| OPENAI_MODEL | openai/gpt-4o | OpenAI (default) |
| OPENAI_MINI_MODEL | openai/gpt-4o-mini | OpenAI |
| OPENAI_O3_MODEL | openai/o3 | OpenAI |
| OPENAI_O3MINI_MODEL | openai/o3-mini | OpenAI |
| OPENAI_O4MINI_MODEL | openai/o4-mini | OpenAI |
| OPENAI_GPT52_MODEL | openai/gpt-5.2 | OpenAI |
| OPENAI_GPT5_MODEL | openai/gpt-5 | OpenAI (no temperature param) |
| OPENAI_GPT5MINI_MODEL | openai/gpt-5-mini | OpenAI (no temperature param) |
| GEMINI_3_MODEL | gemini/gemini-3-pro-preview | Google |
| GEMINI_MODEL | gemini/gemini-2.5-pro | Google |
| GEMINI_FLASH_MODEL | gemini/gemini-2.5-flash | Google |
`gpt-5` and `gpt-5-mini` do not accept a `temperature` parameter; LiteLLM skips it automatically. All GPT-5 family models use `max_completion_tokens` instead of `max_tokens`. Claude Haiku is capped at 4096 output tokens regardless of the global `MAX_TOKENS` setting.
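These per-model quirks amount to a small shim applied before each LiteLLM call. The helper below sketches that logic; it is illustrative, not the actual code in `agent/llm.py`:

```python
MAX_TOKENS = 16_384  # global limit, per agent/llm.py

# Only these two models reject the temperature parameter.
NO_TEMPERATURE = {"openai/gpt-5", "openai/gpt-5-mini"}

def build_request_kwargs(model, temperature=0.7):
    """Assemble request kwargs honoring the per-model quirks above."""
    kwargs = {"model": model}
    if model not in NO_TEMPERATURE:
        kwargs["temperature"] = temperature
    if model.startswith("openai/gpt-5"):
        # GPT-5 family uses max_completion_tokens, not max_tokens.
        kwargs["max_completion_tokens"] = MAX_TOKENS
    elif model == "anthropic/claude-3-haiku-20240307":
        # Haiku is hard-capped at 4096 output tokens.
        kwargs["max_tokens"] = 4096
    else:
        kwargs["max_tokens"] = MAX_TOKENS
    return kwargs
```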
## Setting the Model
The meta-agent model is configured via the `--model` argument in `run_meta_agent.py`. Pass the full LiteLLM model identifier:

```bash
python run_meta_agent.py --model anthropic/claude-3-5-sonnet-20241022 ...
```
When running via `generate_loop.py`, the model argument is forwarded automatically. For the polyglot domain specifically, the loop hardcodes `claude-3-5-sonnet-20241022` for fair comparison with the DGM baseline.
The global token limit is 16,384 (`MAX_TOKENS` in `agent/llm.py`). Failed API calls are retried with exponential backoff for up to 600 seconds in total, with at most 60 seconds between retries.
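This retry policy corresponds to an exponential schedule with a 60-second cap per interval and a 600-second total budget (roughly `backoff.expo` with `max_value=60` and `max_time=600`). As a standalone illustration of that schedule, without the `backoff` library:

```python
def backoff_schedule(base=1, cap=60, budget=600):
    """Yield exponential retry delays (1, 2, 4, ...) capped at `cap`
    seconds each, stopping once the cumulative delay would exceed
    `budget` seconds. Illustrative only; the project uses the
    `backoff` package for its actual retries.
    """
    delay, elapsed = base, 0
    while elapsed + delay <= budget:
        yield delay
        elapsed += delay
        delay = min(delay * 2, cap)

# Delays double until hitting the 60 s cap, then repeat until the
# 600 s budget is exhausted.
delays = list(backoff_schedule())
```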