Overview
utils/gl_utils.py contains the helper functions that power the HyperAgents evolutionary loop. The loop iteratively generates new agent variants, evaluates them, and selects parents for the next generation based on scores stored in report.json files.
Key concepts:
- genid: a generation identifier. It is either the string "initial" or an integer >= 0.
- output_dir: the root directory for a run. Each generation lives in {output_dir}/gen_{genid}/.
- archive: an ordered list of genids that have been evaluated, persisted as a JSONL file at {output_dir}/archive.jsonl.
- domain: a benchmark domain string such as "search_arena", "balrog_babyai", "genesis_locomotion", or "polyglot".
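The path conventions above can be sketched in a few lines. This is a minimal illustration rather than code from utils/gl_utils.py: the helper names gen_dir and archive_path are hypothetical, and the validation rule simply restates the definition of genid given above.

```python
from pathlib import Path
from typing import Union

GenId = Union[str, int]  # "initial" or an integer >= 0


def gen_dir(output_dir: str, genid: GenId) -> Path:
    """Return {output_dir}/gen_{genid} (hypothetical helper, not part of gl_utils)."""
    is_initial = genid == "initial"
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_index = isinstance(genid, int) and not isinstance(genid, bool) and genid >= 0
    if not (is_initial or is_index):
        raise ValueError(f"invalid genid: {genid!r}")
    return Path(output_dir) / f"gen_{genid}"


def archive_path(output_dir: str) -> Path:
    """Return the location of the JSONL archive for a run."""
    return Path(output_dir) / "archive.jsonl"
```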
Scoring
get_score
Reads the raw score for a single generation from its report.json file.
Signature
Parameters
Domain name string. Used to build the eval directory name and to look up the score key in report.json.
Path to the run’s root output directory.
Generation identifier: "initial" or an integer.
Evaluation split. "train" maps to {domain}_eval/; any other value (e.g. "val", "test") maps to {domain}_eval_{split}/.
Return Value
The numeric score read from report.json under the key returned by get_domain_score_key(domain). Returns None if the file is missing, the key is absent, or the value is NaN.
Special cases:
- Balrog domains: the score is divided by 100.0 (normalised to [0, 1]). Returns None if no environments ran.
- Polyglot domains: returns None if no tasks completed without error.
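The split-to-directory mapping and the None cases described for get_score can be sketched as follows. This is a hedged illustration, not the library's implementation: the function names are hypothetical, the score key is passed in explicitly instead of calling get_domain_score_key, the exact location of report.json inside the generation directory is an assumption, and the Balrog and Polyglot special cases are left out.

```python
import json
import math
import os
from typing import Optional


def eval_dir_name(domain: str, split: str = "train") -> str:
    """'train' maps to {domain}_eval, any other split to {domain}_eval_{split}."""
    return f"{domain}_eval" if split == "train" else f"{domain}_eval_{split}"


def read_score(output_dir: str, genid, domain: str, score_key: str,
               split: str = "train") -> Optional[float]:
    """Read one score, returning None for a missing file, missing key, or NaN."""
    # Assumption: report.json sits directly inside the eval directory.
    report = os.path.join(output_dir, f"gen_{genid}",
                          eval_dir_name(domain, split), "report.json")
    try:
        with open(report) as f:
            data = json.load(f)
    except FileNotFoundError:
        return None
    value = data.get(score_key)
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    return value
```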
get_saved_score
Returns a score adjusted for whether a full or staged evaluation was run, and optionally returns the ensemble or max of agent and ensemble scores.
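The two adjustments this function makes (choosing among agent, ensemble, and max, and scaling staged-evaluation scores) can be sketched as below. The helper names and parameters are hypothetical. The sketch takes the full-eval flag and the staged fraction as plain arguments rather than reading metadata or calling get_domain_stagedeval_frac, and treating a missing score as None in the "max" case is an assumption.

```python
from typing import Optional


def combine_scores(agent: Optional[float], ensemble: Optional[float],
                   score_type: str = "agent") -> Optional[float]:
    """Pick the agent score, the ensemble score, or the higher of the two."""
    if score_type == "agent":
        return agent
    if score_type == "ensemble":
        return ensemble
    if score_type == "max":
        # Assumption: a missing score (None) is ignored rather than treated as 0.
        available = [s for s in (agent, ensemble) if s is not None]
        return max(available) if available else None
    raise ValueError(f"unknown score type: {score_type!r}")


def adjust_for_staged_eval(score: Optional[float], ran_full_eval: bool,
                           staged_frac: float) -> Optional[float]:
    """Scale a staged-eval score so it is comparable to full-eval scores."""
    if score is None or ran_full_eval:
        return score
    return score * staged_frac
```

Whether the library scales before or after taking the maximum is not stated here, so the two steps are kept as separate functions.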
Signature
Parameters
Domain name string.
Path to the run’s root output directory.
Generation identifier.
Evaluation split.
Which score to return:
- "agent": the direct agent score from report.json.
- "ensemble": the ensemble score from report_ensemble_{domain}_{split}.json.
- "max": the higher of the agent and ensemble scores.
Return Value
The requested score. If the generation’s metadata does not have run_full_eval: true (or the genid is "initial"), the raw score is multiplied by get_domain_stagedeval_frac(domain) to make it comparable to full-eval scores.
Archive Management
update_and_save_archive
Appends a new generation to the in-memory archive list and persists a JSONL snapshot to disk.
Signature
Parameters
Run root directory. The archive is written to {output_dir}/archive.jsonl.
Current archive (list of genids). The new node is appended in place.
The genid of the newly evaluated generation to append.
Return Value
The updated archive list (same object as the input, with new_node appended). Each line appended to archive.jsonl is a JSON object of the form {"current_genid": ..., "archive": [...]}.
load_archive_data
Loads one or all snapshots from an archive.jsonl file.
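The JSONL snapshot format shared by update_and_save_archive and load_archive_data can be sketched as a round trip. This is an illustration under assumptions, not the library code: the function names are hypothetical, and setting current_genid to the newly appended node is a guess about what that field holds.

```python
import json
import os
import tempfile


def append_snapshot(archive_file: str, archive: list, new_node) -> list:
    """Append new_node in place and write one JSON snapshot line."""
    archive.append(new_node)
    # Assumption: current_genid records the node that was just appended.
    snapshot = {"current_genid": new_node, "archive": archive}
    with open(archive_file, "a") as f:
        f.write(json.dumps(snapshot) + "\n")
    return archive


def load_snapshots(archive_file: str, last_only: bool = True):
    """Return the last snapshot, or all snapshots, from a JSONL archive."""
    if not os.path.exists(archive_file):
        raise FileNotFoundError(archive_file)
    with open(archive_file) as f:
        entries = [json.loads(line) for line in f if line.strip()]
    return entries[-1] if last_only else entries


def demo():
    """Write two snapshots to a temporary file and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "archive.jsonl")
        archive = append_snapshot(path, ["initial"], 0)
        append_snapshot(path, archive, 1)
        return load_snapshots(path), load_snapshots(path, last_only=False)
```

Because every line is a full snapshot, the last line alone is enough to resume a run, while the full list preserves the archive's history.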
Signature
Parameters
Absolute path to the archive.jsonl file.
When True, returns only the last JSON entry (the most recent archive snapshot). When False, returns all entries as a list.
Return Value
When last_only=True, a single dict {"current_genid": ..., "archive": [...]}. When last_only=False, a list of such dicts, one per line in the file. Raises FileNotFoundError if the file does not exist.
Parent Selection
select_parent
Chooses the next parent generation from the current archive using one of several scoring-based strategies.
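To give a feel for the probabilistic strategies listed in this section ("score_prop" and "score_child_prop"), here is a minimal sketch of sigmoid-weighted sampling. The function names, the steepness value, the choice of the mean score as the sigmoid midpoint, and the 1 / (1 + children) penalty are all assumptions made for illustration. The actual formula in select_parent may differ.

```python
import math
import random
from typing import Dict, Optional


def sigmoid_weights(scores: Dict, steepness: float = 10.0) -> Dict:
    """Map each candidate score to a weight in (0, 1) via a sigmoid."""
    # Assumption: the sigmoid is centred on the mean candidate score.
    midpoint = sum(scores.values()) / len(scores)
    return {g: 1.0 / (1.0 + math.exp(-steepness * (s - midpoint)))
            for g, s in scores.items()}


def pick_parent(scores: Dict, child_counts: Optional[Dict] = None,
                rng: Optional[random.Random] = None):
    """Sample one genid with probability proportional to its weight."""
    rng = rng or random.Random()
    weights = sigmoid_weights(scores)
    if child_counts is not None:
        # Assumed penalty: nodes with more children are sampled less often.
        weights = {g: w / (1 + child_counts.get(g, 0))
                   for g, w in weights.items()}
    genids = list(weights)
    return rng.choices(genids, weights=[weights[g] for g in genids], k=1)[0]
```

Passing child_counts switches the sketch from a score_prop-style draw to a score_child_prop-style draw. Passing a seeded random.Random makes the selection reproducible.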
Signature
Parameters
Ordered list of all genids in the current run.
Run root directory used to locate metadata and score files.
List of domain names. Each domain’s score is read for the best available split ("val" if supported, otherwise "train"). The per-domain scores are averaged to produce a single candidate score.
Parent selection strategy:
| Value | Behaviour |
|---|---|
| "random" | Uniform random choice from valid candidates |
| "latest" | Most recently added valid candidate |
| "best" | Candidate with the highest average score |
| "score_prop" | Probabilistic, weighted by sigmoid-normalised scores |
| "score_child_prop" | Like score_prop but down-weights nodes with many children |
Return Value
The selected parent’s generation identifier. Only generations with all domain scores available and valid_parent: true in their metadata are eligible. If no eligible candidates exist, falls back to the first node in the archive with score 0.0.
Container Patch Application
apply_diffs_container
Applies a sequence of git patch files to a running Docker container, staging and committing the result.
Signature
Parameters
A running Docker container object obtained from docker.from_env().containers.get(...).
Ordered list of absolute local paths to .diff patch files. Applied sequentially via patch -p1.
Name of the repository directory inside the container (the working directory for all git operations).
Whether to log container exec output via the thread-local logger.
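This function also filters changes under domains/ out of each patch before applying it. A minimal sketch of such a filter is shown below. It is not the library's implementation: the function name is hypothetical, and it assumes git-style patches in which every file section starts with a "diff --git a/<path> b/<path>" header and paths contain no spaces.

```python
def strip_domain_changes(patch_text: str, excluded_prefix: str = "domains/") -> str:
    """Drop every file section of a git-style patch whose path starts with excluded_prefix."""
    kept = []
    keep = True  # lines before the first header are kept unchanged
    for line in patch_text.splitlines(keepends=True):
        if line.startswith("diff --git "):
            # Header shape assumed: diff --git a/<path> b/<path>
            parts = line.split()
            path = parts[2][2:] if len(parts) >= 4 and parts[2].startswith("a/") else ""
            keep = not path.startswith(excluded_prefix)
        if keep:
            kept.append(line)
    return "".join(kept)
```

The filtered text could then be written to a file and applied with patch -p1 from the repository directory inside the container.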
Return Value
The git commit hash after applying all patches. If any patches introduced changes, a new commit is created; otherwise the existing HEAD hash is returned.
Changes to domains/ are automatically filtered out of each patch before it is applied to avoid contaminating the container’s domain configuration.
get_patch_files
Returns the full ordered list of patch files needed to reconstruct a generation’s lineage from the initial state.
Signature
Parameters
Run root directory.
Generation whose full patch lineage to retrieve.
Return Value
Combined list of prev_patch_files (ancestor patches) and curr_patch_files (this generation’s patches) read from {output_dir}/gen_{genid}/metadata.json. Returns an empty list if the metadata file does not exist.
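The lookup described above can be sketched in a few lines. The function name patch_lineage is hypothetical, and treating a missing prev_patch_files or curr_patch_files key as an empty list is an assumption. Only the file location and the two key names come from the description above.

```python
import json
import os
from typing import List


def patch_lineage(output_dir: str, genid) -> List[str]:
    """Return ancestor patches followed by this generation's own patches."""
    metadata_path = os.path.join(output_dir, f"gen_{genid}", "metadata.json")
    if not os.path.exists(metadata_path):
        return []
    with open(metadata_path) as f:
        metadata = json.load(f)
    # Assumption: an absent key means "no patches" rather than an error.
    return metadata.get("prev_patch_files", []) + metadata.get("curr_patch_files", [])
```

Ancestor patches come first, so the returned list is already in the order in which the patches would need to be applied.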