The Two-Agent Hierarchy
Two Python classes carry the load:

| Class | File | Responsibility |
|---|---|---|
| MetaAgent | meta_agent.py | Proposes code changes to the entire repository |
| TaskAgent | task_agent.py | Solves individual domain tasks and returns predictions |
Both classes build on the AgentSystem base class (agent/base_agent.py) and share the same chat_with_agent call loop (agent/llm_withtools.py). The key difference is what each one receives as input and which tools it is given:
- MetaAgent.forward() gets repo_path and eval_path and runs with tools_available='all' (bash + editor).
- TaskAgent.forward() gets a domain inputs dict and runs with no tools by default; the meta-agent is expected to change that over time.
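A toy model of this split, with no LLM involved. The classes and echo_chat below are hypothetical stand-ins, and the injected chat function replaces chat_with_agent, whose real signature is not shown on this page. The sketch only demonstrates that both agents share one call path and differ in their inputs and tool lists.

```python
from typing import Any, Callable

# Hypothetical stand-in for the shared call loop: (prompt, tools) -> reply.
ChatFn = Callable[[str, list[str]], str]


class SketchAgentSystem:
    """Toy base class: holds the shared chat function and this agent's tool list."""

    def __init__(self, chat: ChatFn, tools: list[str]) -> None:
        self.chat = chat
        self.tools = tools


class SketchMetaAgent(SketchAgentSystem):
    def __init__(self, chat: ChatFn) -> None:
        # Mirrors tools_available='all': bash plus a file editor.
        super().__init__(chat, tools=["bash", "editor"])

    def forward(self, repo_path: str, eval_path: str) -> str:
        prompt = f"Improve the code in {repo_path}. Past results are in {eval_path}."
        return self.chat(prompt, self.tools)


class SketchTaskAgent(SketchAgentSystem):
    def __init__(self, chat: ChatFn) -> None:
        # No tools by default; a later generation of the meta-agent may add some.
        super().__init__(chat, tools=[])

    def forward(self, inputs: dict[str, Any]) -> str:
        prompt = f"Solve this task and return a prediction: {inputs}"
        return self.chat(prompt, self.tools)


def echo_chat(prompt: str, tools: list[str]) -> str:
    # Fake LLM: just reports which tools it was handed.
    return f"tools={tools}"
```

Because the chat function is injected, the same call path serves both agents; a real LLM client would plug in where echo_chat is passed.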
The Evolutionary Archive
Every generation that produces a runnable diff is registered in outputs/<run_id>/archive.jsonl. Each line is a JSON snapshot.
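JSON Lines keeps the archive append-only: each snapshot occupies one line, so recording a new generation never rewrites earlier ones. A minimal sketch, assuming snapshots are plain dicts; the helper names and the example field are hypothetical and not the schema written by update_and_save_archive().

```python
import json
from pathlib import Path


def append_archive_snapshot(archive_path: Path, snapshot: dict) -> None:
    """Append one JSON object as a single line (JSON Lines format)."""
    archive_path.parent.mkdir(parents=True, exist_ok=True)
    with archive_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(snapshot) + "\n")


def load_latest_snapshot(archive_path: Path) -> dict:
    """Return the most recent snapshot, i.e. the last non-empty line."""
    lines = [
        line
        for line in archive_path.read_text(encoding="utf-8").splitlines()
        if line.strip()
    ]
    return json.loads(lines[-1])
```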
update_and_save_archive() in utils/gl_utils.py is called after every generate() call. Each generation node also has a gen_<id>/metadata.json file that records:
- parent_genid: which node this generation branched from
- prev_patch_files: cumulative lineage of .diff files
- curr_patch_files: the diff produced in this generation
- run_eval / run_full_eval: whether evaluation was executed
- valid_parent: whether this node can be selected as a future parent
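One reading of the lineage fields, offered as an assumption: a child's prev_patch_files is its parent's prev_patch_files followed by the parent's curr_patch_files, so replaying the list in order rebuilds the parent's code before the child's own diff is applied. GenMetadata, child_metadata, and the patch paths are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class GenMetadata:
    parent_genid: Optional[str]
    prev_patch_files: list[str] = field(default_factory=list)  # lineage up to the parent
    curr_patch_files: list[str] = field(default_factory=list)  # this generation's diff
    run_eval: bool = False
    run_full_eval: bool = False
    valid_parent: bool = False


def child_metadata(parent_id: str, parent: GenMetadata, new_patch: str) -> GenMetadata:
    """Inherit the parent's full lineage (its history plus its own diff)."""
    return GenMetadata(
        parent_genid=parent_id,
        prev_patch_files=parent.prev_patch_files + parent.curr_patch_files,
        curr_patch_files=[new_patch],
    )


def to_json(meta: GenMetadata) -> str:
    # Shape of a metadata.json-style record for this toy dataclass.
    return json.dumps(asdict(meta), indent=2)
```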
Evaluation scores are written to gen_<id>/<domain>_eval/report.json and read back by get_score() during parent selection.
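A sketch of how stored scores and the valid_parent flag could combine into a pool of candidate parents. The report.json key (overall_score) is an assumption because the report schema is not documented here, and read_score and eligible_parents are hypothetical helpers, not get_score() itself.

```python
import json
from pathlib import Path
from typing import Optional

SCORE_KEY = "overall_score"  # hypothetical key; the real report.json schema is not shown


def read_score(output_dir: Path, genid: str, domain: str) -> Optional[float]:
    """Return the score in gen_<id>/<domain>_eval/report.json, or None if absent."""
    report = output_dir / f"gen_{genid}" / f"{domain}_eval" / "report.json"
    if not report.exists():
        return None
    return json.loads(report.read_text(encoding="utf-8")).get(SCORE_KEY)


def eligible_parents(output_dir: Path, genids: list[str], domain: str) -> dict[str, float]:
    """Map each generation that is a valid parent and has a score to that score."""
    scores: dict[str, float] = {}
    for genid in genids:
        meta_path = output_dir / f"gen_{genid}" / "metadata.json"
        if not meta_path.exists():
            continue  # generation never finished or was not registered
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        score = read_score(output_dir, genid, domain)
        if meta.get("valid_parent") and score is not None:
            scores[genid] = score
    return scores
```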
Docker Sandboxing
Every generate() call (in generate_loop.py) spins up a fresh Docker container and:

- Applies all accumulated .diff patch files from the parent’s lineage (apply_diffs_container).
- Runs run_meta_agent.py (6-hour timeout) to let the meta-agent edit the code.
- Runs domains/harness.py (5-hour timeout per domain) to evaluate the modified task_agent.py.
- Copies outputs back to the host with copy_from_container.
- Runs git reset --hard and git clean -fd to restore the container to a pristine state before teardown.
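The ordering and timeouts above can be captured as a plan without touching Docker. This sketch assumes patches are applied with git apply and invents a --domain flag for the harness; both are placeholders, not the actual arguments used by apply_diffs_container or domains/harness.py.

```python
from dataclasses import dataclass
from typing import Optional

HOUR = 60 * 60  # seconds


@dataclass
class Step:
    description: str
    command: list[str]  # empty list marks a host-side action with no container command
    timeout_s: Optional[int] = None  # None means no explicit timeout


def plan_generation(lineage_patches: list[str], domains: list[str]) -> list[Step]:
    """Return the steps of one generate() call, in execution order."""
    # 1. Replay the parent's lineage, oldest patch first (git apply is an assumption).
    steps = [Step(f"apply {p}", ["git", "apply", p]) for p in lineage_patches]
    # 2. Let the meta-agent edit the repository, bounded at 6 hours.
    steps.append(Step("run meta-agent", ["python", "run_meta_agent.py"], 6 * HOUR))
    # 3. Evaluate the modified task agent, one 5-hour run per domain.
    for domain in domains:
        # "--domain" is a hypothetical flag used only for illustration.
        cmd = ["python", "domains/harness.py", "--domain", domain]
        steps.append(Step(f"evaluate {domain}", cmd, 5 * HOUR))
    # 4. Copy results out before anything is reset (host-side copy_from_container).
    steps.append(Step("copy outputs to host", []))
    # 5. Restore a pristine tree before teardown.
    steps.append(Step("reset tracked files", ["git", "reset", "--hard"]))
    steps.append(Step("remove untracked files", ["git", "clean", "-fd"]))
    return steps
```

Executing such a plan would typically pass timeout_s to subprocess.run(..., timeout=...), which raises subprocess.TimeoutExpired when a step overruns.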
How It All Fits Together
1. Snapshot the repo: setup_initial_gen copies the repo into gen_initial/<REPO_NAME>, strips excluded directories (outputs, analysis, baselines), commits it, and records the commit hash as root_commit.
2. Meta-agent proposes changes: inside a Docker container, MetaAgent.forward(repo_path, eval_path) runs an LLM with bash and editor tools. It inspects previous scores, reads the codebase, and emits a git diff saved as model_patch.diff.
3. Task-agent is evaluated: the patched repo is handed to domains/harness.py, which loads TaskAgent via load_task_agent() and runs it in parallel across the evaluation dataset. Results are written to report.json.
4. Archive is updated: update_and_save_archive() appends the new generation ID to archive.jsonl. The generation’s score and metadata are stored in gen_<id>/.
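Step 3 fans a single task agent out over the whole dataset. A minimal sketch of that pattern, assuming each example is a dict with inputs and label keys and that scoring is exact match; evaluate_parallel and the report fields are hypothetical, not the harness's real report format.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Any, Callable

Predictor = Callable[[dict[str, Any]], str]


def evaluate_parallel(
    predict: Predictor,
    dataset: list[dict[str, Any]],
    report_path: Path,
    max_workers: int = 8,
) -> dict[str, Any]:
    """Run predict on every example concurrently, score exact matches, write a report."""
    inputs = [example["inputs"] for example in dataset]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so predictions line up with labels.
        predictions = list(pool.map(predict, inputs))
    correct = sum(pred == ex["label"] for pred, ex in zip(predictions, dataset))
    report = {
        "accuracy": correct / len(dataset) if dataset else 0.0,
        "predictions": predictions,
    }
    report_path.parent.mkdir(parents=True, exist_ok=True)
    report_path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return report
```

A thread pool suits this workload when each prediction mostly waits on an LLM API call rather than burning CPU.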