
Overview

utils/gl_utils.py contains the helper functions that power the HyperAgents evolutionary loop. The loop iteratively generates new agent variants, evaluates them, and selects parents for the next generation based on scores stored in report.json files. Key concepts:
  • genid — a generation identifier. It is either the string "initial" or an integer >= 0.
  • output_dir — root directory for a run. Each generation lives in {output_dir}/gen_{genid}/.
  • archive — an ordered list of genids that have been evaluated, persisted as a JSONL file at {output_dir}/archive.jsonl.
  • domain — a benchmark domain string such as "search_arena", "balrog_babyai", "genesis_locomotion", or "polyglot".

Scoring

get_score

Reads the raw score for a single generation from its report.json file.
from utils.gl_utils import get_score

score = get_score("search_arena", "/runs/exp1", genid=3, split="val")

Signature

def get_score(domain: str, output_dir: str, genid, split: str = "train") -> float | None

Parameters

domain
str
required
Domain name string. Used to build the eval directory name and to look up the score key in report.json.
output_dir
str
required
Path to the run’s root output directory.
genid
str | int
required
Generation identifier — "initial" or an integer.
split
str
default:"train"
Evaluation split. "train" maps to {domain}_eval/, any other value (e.g. "val", "test") maps to {domain}_eval_{split}/.

Return Value

score
float | None
The numeric score read from report.json under the key returned by get_domain_score_key(domain). Returns None if the file is missing, the key is absent, or the value is NaN. Special cases:
  • Balrog domains: score is divided by 100.0 (normalised to [0, 1]). Returns None if no environments ran.
  • Polyglot domains: returns None if no tasks completed without error.
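
The lookup described above can be sketched as a minimal reimplementation. This is for illustration only: the score key f"{domain}_score" used here is an assumption standing in for whatever get_domain_score_key(domain) actually returns, and the Balrog/Polyglot special cases are omitted.

```python
import json
import math
import os

def get_score_sketch(domain, output_dir, genid, split="train"):
    """Illustrative sketch of get_score's file lookup and None handling."""
    # "train" maps to {domain}_eval/; any other split to {domain}_eval_{split}/
    eval_dir = f"{domain}_eval" if split == "train" else f"{domain}_eval_{split}"
    report_path = os.path.join(output_dir, f"gen_{genid}", eval_dir, "report.json")
    if not os.path.exists(report_path):
        return None  # missing report -> no score
    with open(report_path) as f:
        report = json.load(f)
    score = report.get(f"{domain}_score")  # assumed key name
    if score is None or (isinstance(score, float) and math.isnan(score)):
        return None  # absent key or NaN -> None
    return float(score)
```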

get_saved_score

Returns a score adjusted for whether a full or staged evaluation was run. Depending on type, it returns the agent score, the ensemble score, or the maximum of the two.
from utils.gl_utils import get_saved_score

score = get_saved_score("search_arena", "/runs/exp1", genid=3, split="val", type="max")

Signature

def get_saved_score(
    domain: str,
    output_dir: str,
    genid,
    split: str = "train",
    type: str = "agent",
) -> float | None

Parameters

domain
str
required
Domain name string.
output_dir
str
required
Path to the run’s root output directory.
genid
str | int
required
Generation identifier.
split
str
default:"train"
Evaluation split.
type
'agent' | 'ensemble' | 'max'
default:"agent"
Which score to return:
  • "agent" — the direct agent score from report.json.
  • "ensemble" — the ensemble score from report_ensemble_{domain}_{split}.json.
  • "max" — the higher of agent and ensemble scores.

Return Value

score
float | None
The requested score. If the generation’s metadata does not have run_full_eval: true (or the genid is "initial"), the raw score is multiplied by get_domain_stagedeval_frac(domain) to make it comparable to full-eval scores.
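
The two adjustments described above can be sketched in isolation. These helpers are illustrative, not the real implementation: staged_frac stands in for get_domain_stagedeval_frac(domain), and the raw scores are passed in directly rather than read from report files.

```python
def adjust_staged_score(raw_score, ran_full_eval, staged_frac):
    """Scale a staged (partial) evaluation score by the domain's staged
    fraction so it is comparable to full-eval scores."""
    if raw_score is None:
        return None
    return raw_score if ran_full_eval else raw_score * staged_frac

def pick_score(agent, ensemble, type="agent"):
    """Sketch of the `type` switch: agent score, ensemble score, or the
    higher of the two (ignoring a missing side)."""
    if type == "agent":
        return agent
    if type == "ensemble":
        return ensemble
    available = [s for s in (agent, ensemble) if s is not None]
    return max(available) if available else None
```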

Archive Management

update_and_save_archive

Appends a new generation to the in-memory archive list and persists a JSONL snapshot to disk.
from utils.gl_utils import update_and_save_archive

archive = update_and_save_archive("/runs/exp1", archive=["initial", 0, 1], new_node=2)

Signature

def update_and_save_archive(output_dir: str, archive: list, new_node) -> list

Parameters

output_dir
str
required
Run root directory. The archive is written to {output_dir}/archive.jsonl.
archive
list
required
Current archive (list of genids). The new node is appended in place.
new_node
str | int
required
The genid of the newly evaluated generation to append.

Return Value

archive
list
The updated archive list (same object as the input, with new_node appended). Each line appended to archive.jsonl has the structure:
{"current_genid": <new_node>, "archive": [<all genids so far>]}

load_archive_data

Loads one or all snapshots from an archive.jsonl file.
from utils.gl_utils import load_archive_data

# Latest snapshot only (default)
data = load_archive_data("/runs/exp1/archive.jsonl")
print(data["archive"])  # list of all genids

# All snapshots
all_data = load_archive_data("/runs/exp1/archive.jsonl", last_only=False)

Signature

def load_archive_data(filepath: str, last_only: bool = True) -> dict | list[dict]

Parameters

filepath
str
required
Absolute path to the archive.jsonl file.
last_only
bool
default:"true"
When True, returns only the last JSON entry (the most recent archive snapshot). When False, returns all entries as a list.

Return Value

archive_data
dict | list[dict]
When last_only=True, a single dict {"current_genid": ..., "archive": [...]}. When last_only=False, a list of such dicts, one per line in the file. Raises FileNotFoundError if the file does not exist.
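
The loader's behaviour can be sketched in a few lines (illustrative only; the real function may differ in details such as blank-line handling):

```python
import json

def load_archive_data_sketch(filepath, last_only=True):
    """One dict per JSONL line; the last line is the most recent snapshot.
    open() raises FileNotFoundError if the file is missing, matching the
    documented behaviour."""
    with open(filepath) as f:
        entries = [json.loads(line) for line in f if line.strip()]
    return entries[-1] if last_only else entries
```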

Parent Selection

select_parent

Chooses the next parent generation from the current archive using one of several scoring-based strategies.
from utils.gl_utils import select_parent

parent_genid = select_parent(
    archive=["initial", 0, 1, 2],
    output_dir="/runs/exp1",
    domains=["search_arena"],
    method="best",
)

Signature

def select_parent(
    archive: list,
    output_dir: str,
    domains: list[str],
    method: str = "best",
) -> str | int

Parameters

archive
list
required
Ordered list of all genids in the current run.
output_dir
str
required
Run root directory used to locate metadata and score files.
domains
list[str]
required
List of domain names. Each domain’s score is read for the best available split ("val" if supported, otherwise "train"). The per-domain scores are averaged to produce a single candidate score.
method
str
default:"best"
Parent selection strategy:
  • "random" — uniform random choice from valid candidates.
  • "latest" — most recently added valid candidate.
  • "best" — candidate with the highest average score.
  • "score_prop" — probabilistic, weighted by sigmoid-normalised scores.
  • "score_child_prop" — like "score_prop", but down-weights nodes with many children.

Return Value

genid
str | int
The selected parent’s generation identifier. Only generations with all domain scores available and valid_parent: true in their metadata are eligible. If no eligible candidates exist, falls back to the first node in the archive with score 0.0.
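
The "best" strategy and the fallback can be sketched as follows. This is a simplified illustration: the hypothetical avg_scores parameter maps each genid to its averaged per-domain score (None marking an ineligible node), whereas the real function reads scores and valid_parent metadata from disk itself.

```python
def select_best_parent(archive, avg_scores):
    """Pick the eligible candidate with the highest average score, or
    fall back to the first archive node when none is eligible."""
    candidates = [g for g in archive if avg_scores.get(g) is not None]
    if not candidates:
        return archive[0]  # fallback described above (treated as score 0.0)
    return max(candidates, key=lambda g: avg_scores[g])
```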

Container Patch Application

apply_diffs_container

Applies a sequence of git patch files to a running Docker container, staging and committing the result.
from utils.gl_utils import apply_diffs_container

commit_hash = apply_diffs_container(container, patch_files=["/tmp/parent.diff"])

Signature

def apply_diffs_container(
    container,
    patch_files: list[str],
    repo_name: str = REPO_NAME,
    verbose: bool = True,
) -> str

Parameters

container
docker.models.containers.Container
required
A running Docker container object obtained from docker.from_env().containers.get(...).
patch_files
list[str]
required
Ordered list of absolute local paths to .diff patch files. Applied sequentially via patch -p1.
repo_name
str
default:"hyperagents"
Name of the repository directory inside the container (the working directory for all git operations).
verbose
bool
default:"true"
Whether to log container exec output via the thread-local logger.

Return Value

commit_hash
str
The git commit hash after applying all patches. If any patch introduced changes, a new commit is created; otherwise the existing HEAD hash is returned. Changes to domains/ are automatically filtered out of each patch before it is applied, to avoid contaminating the container's domain configuration.
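
The domains/ filtering mentioned above can be sketched on its own. This is an illustrative helper, not the real implementation; it assumes standard "diff --git a/<path> b/<path>" section headers and drops every file section under the excluded prefix.

```python
def filter_patch(patch_text, excluded_prefix="domains/"):
    """Drop file sections of a git patch whose path starts with the
    excluded prefix, keeping everything else byte-for-byte."""
    kept, skipping = [], False
    for line in patch_text.splitlines(keepends=True):
        if line.startswith("diff --git "):
            # header looks like: diff --git a/<path> b/<path>
            path = line.rsplit(" b/", 1)[-1].strip()
            skipping = path.startswith(excluded_prefix)
        if not skipping:
            kept.append(line)
    return "".join(kept)
```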

get_patch_files

Returns the full ordered list of patch files needed to reconstruct a generation’s lineage from the initial state.
from utils.gl_utils import get_patch_files

patches = get_patch_files("/runs/exp1", genid=3)
# ['/runs/exp1/gen_0/patches/model_patch.diff',
#  '/runs/exp1/gen_1/patches/model_patch.diff',
#  '/runs/exp1/gen_3/patches/model_patch.diff']

Signature

def get_patch_files(output_dir: str, genid) -> list[str]

Parameters

output_dir
str
required
Run root directory.
genid
str | int
required
Generation whose full patch lineage to retrieve.

Return Value

patch_files
list[str]
Combined list of prev_patch_files (ancestor patches) and curr_patch_files (this generation’s patches) read from {output_dir}/gen_{genid}/metadata.json. Returns an empty list if the metadata file does not exist.
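
The lineage lookup can be sketched from the description above (illustrative; assumes the two metadata keys named in the return-value description):

```python
import json
import os

def get_patch_files_sketch(output_dir, genid):
    """Concatenate ancestor patches (prev_patch_files) and this
    generation's patches (curr_patch_files) from gen_{genid}/metadata.json;
    return an empty list if the metadata file is absent."""
    meta_path = os.path.join(output_dir, f"gen_{genid}", "metadata.json")
    if not os.path.exists(meta_path):
        return []
    with open(meta_path) as f:
        meta = json.load(f)
    return meta.get("prev_patch_files", []) + meta.get("curr_patch_files", [])
```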
