Generate Loop Utilities

Overview

utils/gl_utils.py contains the helper functions that power the HyperAgents evolutionary loop. The loop iteratively generates new agent variants, evaluates them, and selects parents for the next generation based on scores stored in report.json files. Key concepts:

genid — a generation identifier. It is either the string "initial" or an integer >= 0.
output_dir — root directory for a run. Each generation lives in {output_dir}/gen_{genid}/.
archive — an ordered list of genids that have been evaluated, persisted as a JSONL file at {output_dir}/archive.jsonl.
domain — a benchmark domain string such as "search_arena", "balrog_babyai", "genesis_locomotion", or "polyglot".
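The directory layout described above can be sketched with two small path helpers. This is only an illustration: `gen_dir` and `archive_path` are hypothetical names introduced here, not functions exported by utils/gl_utils.py.

```python
import os


def gen_dir(output_dir: str, genid) -> str:
    """Directory for one generation: {output_dir}/gen_{genid} (hypothetical helper)."""
    return os.path.join(output_dir, f"gen_{genid}")


def archive_path(output_dir: str) -> str:
    """Archive file location: {output_dir}/archive.jsonl (hypothetical helper)."""
    return os.path.join(output_dir, "archive.jsonl")


# genid may be the string "initial" or a non-negative integer.
print(gen_dir("/runs/exp1", "initial"))
print(gen_dir("/runs/exp1", 3))
print(archive_path("/runs/exp1"))
```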

Scoring

`get_score`

Reads the raw score for a single generation from its report.json file.

from utils.gl_utils import get_score

score = get_score("search_arena", "/runs/exp1", genid=3, split="val")

Signature

def get_score(domain: str, output_dir: str, genid, split: str = "train") -> float | None

Parameters

domain

str

required

Domain name string. Used to build the eval directory name and to look up the score key in report.json.

output_dir

str

required

Path to the run’s root output directory.

genid

str | int

required

Generation identifier — "initial" or an integer.

split

str

default:"train"

Evaluation split. "train" maps to {domain}_eval/, any other value (e.g. "val", "test") maps to {domain}_eval_{split}/.

Return Value

score

float | None

The numeric score read from report.json under the key returned by get_domain_score_key(domain). Returns None if the file is missing, the key is absent, or the value is NaN.

Special cases:

Balrog domains: score is divided by 100.0 (normalised to [0, 1]). Returns None if no environments ran.
Polyglot domains: returns None if no tasks completed without error.
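The split-to-directory mapping and the None conditions described above can be sketched roughly as follows. This is not the library's implementation: `eval_dir_name` and `read_score` are hypothetical helpers, the key `"accuracy"` is a placeholder for whatever get_domain_score_key(domain) returns, the location of report.json inside a temporary directory is made up for the demo, and the Balrog and Polyglot special cases are left out.

```python
import json
import math
import os
import tempfile


def eval_dir_name(domain: str, split: str = "train") -> str:
    """'train' maps to {domain}_eval; any other split maps to {domain}_eval_{split}."""
    return f"{domain}_eval" if split == "train" else f"{domain}_eval_{split}"


def read_score(report_path: str, score_key: str):
    """Return the value under score_key, or None when no usable score exists."""
    if not os.path.exists(report_path):
        return None  # report file is missing
    with open(report_path) as f:
        report = json.load(f)
    value = report.get(score_key)
    if value is None:
        return None  # key is absent
    if isinstance(value, float) and math.isnan(value):
        return None  # NaN is treated as "no score"
    return value


# Demo against a throwaway directory; the path layout is an assumption.
with tempfile.TemporaryDirectory() as tmp:
    report_path = os.path.join(tmp, eval_dir_name("search_arena", "val"), "report.json")
    os.makedirs(os.path.dirname(report_path))
    with open(report_path, "w") as f:
        json.dump({"accuracy": 0.62}, f)
    print(read_score(report_path, "accuracy"))     # 0.62
    print(read_score(report_path, "missing_key"))  # None
```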

`get_saved_score`

Returns a generation's score, scaled when only a staged evaluation was run so that it is comparable to full-eval scores. The type argument selects the agent score, the ensemble score, or the higher of the two.

from utils.gl_utils import get_saved_score

score = get_saved_score("search_arena", "/runs/exp1", genid=3, split="val", type="max")

Signature

def get_saved_score(
    domain: str,
    output_dir: str,
    genid,
    split: str = "train",
    type: str = "agent",
) -> float | None

Parameters

domain

str

required

Domain name string.

output_dir

str

required

Path to the run’s root output directory.

genid

str | int

required

Generation identifier.

split

str

default:"train"

Evaluation split.

type

'agent' | 'ensemble' | 'max'

default:"agent"

Which score to return:

"agent" — the direct agent score from report.json.
"ensemble" — the ensemble score from report_ensemble_{domain}_{split}.json.
"max" — the higher of agent and ensemble scores.

Return Value

score

float | None

The requested score. If the generation’s metadata does not have run_full_eval: true (or the genid is "initial"), the raw score is multiplied by get_domain_stagedeval_frac(domain) to make it comparable to full-eval scores.
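The two steps described here, choosing which score to report and scaling staged-eval results, can be sketched as below. `pick_score` and `adjust_for_staged_eval` are hypothetical helpers, the fraction 0.25 is an invented stand-in for get_domain_stagedeval_frac(domain), and the handling of a missing (None) score under "max" is an assumption rather than documented behaviour.

```python
def pick_score(agent, ensemble, score_type: str = "agent"):
    """Choose between the agent and ensemble scores; None means 'not available'."""
    if score_type == "agent":
        return agent
    if score_type == "ensemble":
        return ensemble
    if score_type == "max":
        # Assumption: ignore whichever score is missing.
        available = [s for s in (agent, ensemble) if s is not None]
        return max(available) if available else None
    raise ValueError(f"unknown score type: {score_type!r}")


def adjust_for_staged_eval(score, run_full_eval: bool, staged_frac: float):
    """Scale a staged-eval score by staged_frac; leave full-eval scores unchanged."""
    if score is None or run_full_eval:
        return score
    return score * staged_frac


raw = pick_score(agent=0.4, ensemble=0.6, score_type="max")
print(raw)                                                               # 0.6
print(adjust_for_staged_eval(raw, run_full_eval=True, staged_frac=0.25))  # 0.6
print(adjust_for_staged_eval(raw, run_full_eval=False, staged_frac=0.25)) # 0.15
```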

Archive Management

`update_and_save_archive`

Appends a new generation to the in-memory archive list and persists a JSONL snapshot to disk.

from utils.gl_utils import update_and_save_archive

archive = update_and_save_archive("/runs/exp1", archive=["initial", 0, 1], new_node=2)

Signature

def update_and_save_archive(output_dir: str, archive: list, new_node) -> list

Parameters

output_dir

str

required

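Run root directory. The archive is written to {output_dir}/archive.jsonl.

archive

list

required

The current in-memory archive: an ordered list of genids.

new_node

str | int

required

The genid of the generation to append to the archive.

Return Value

archive

list

The archive list with new_node appended.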
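A rough sketch of the append-and-persist step is shown below. It assumes each snapshot is one JSON line of the form {"current_genid": ..., "archive": [...]} (the shape documented for load_archive_data), that current_genid is the node just added, and that snapshots are appended rather than overwritten. `append_and_snapshot` is a hypothetical stand-in, not the real function.

```python
import json
import os
import tempfile


def append_and_snapshot(output_dir: str, archive: list, new_node) -> list:
    """Append new_node to archive and write one JSONL snapshot line to disk."""
    archive.append(new_node)
    snapshot = {"current_genid": new_node, "archive": archive}
    # Assumption: one snapshot per line, appended to the existing file.
    with open(os.path.join(output_dir, "archive.jsonl"), "a") as f:
        f.write(json.dumps(snapshot) + "\n")
    return archive


with tempfile.TemporaryDirectory() as run_dir:
    archive = append_and_snapshot(run_dir, ["initial"], 0)
    archive = append_and_snapshot(run_dir, archive, 1)
    print(archive)  # ['initial', 0, 1]
```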

`load_archive_data`

Loads one or all snapshots from an archive.jsonl file.

from utils.gl_utils import load_archive_data

# Latest snapshot only (default)
data = load_archive_data("/runs/exp1/archive.jsonl")
print(data["archive"])  # list of all genids

# All snapshots
all_data = load_archive_data("/runs/exp1/archive.jsonl", last_only=False)

Signature

def load_archive_data(filepath: str, last_only: bool = True) -> dict | list[dict]

Parameters

filepath

str

required

Absolute path to the archive.jsonl file.

last_only

bool

default:True

When True, returns only the last JSON entry (the most recent archive snapshot). When False, returns all entries as a list.

Return Value

archive_data

dict | list[dict]

When last_only=True, a single dict {"current_genid": ..., "archive": [...]}. When last_only=False, a list of such dicts, one per line in the file. Raises FileNotFoundError if the file does not exist.
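The last_only behaviour can be illustrated with a small JSONL reader. `read_snapshots` is a hypothetical helper written for this example; it assumes blank lines can be skipped and does not handle an empty file, which would raise IndexError when last_only is True.

```python
import json
import os
import tempfile


def read_snapshots(filepath: str, last_only: bool = True):
    """Parse a JSONL archive file; return the last snapshot or all of them."""
    with open(filepath) as f:  # raises FileNotFoundError if the file is missing
        snapshots = [json.loads(line) for line in f if line.strip()]
    return snapshots[-1] if last_only else snapshots


with tempfile.TemporaryDirectory() as run_dir:
    path = os.path.join(run_dir, "archive.jsonl")
    with open(path, "w") as f:
        f.write(json.dumps({"current_genid": 0, "archive": ["initial", 0]}) + "\n")
        f.write(json.dumps({"current_genid": 1, "archive": ["initial", 0, 1]}) + "\n")
    print(read_snapshots(path)["archive"])            # ['initial', 0, 1]
    print(len(read_snapshots(path, last_only=False)))  # 2
```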

Parent Selection

`select_parent`

Chooses the next parent generation from the current archive using one of several scoring-based strategies.

from utils.gl_utils import select_parent

parent_genid = select_parent(
    archive=["initial", 0, 1, 2],
    output_dir="/runs/exp1",
    domains=["search_arena"],
    method="best",
)

Signature

def select_parent(
    archive: list,
    output_dir: str,
    domains: list[str],
    method: str = "best",
) -> str | int

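Parameters

archive

list

required

The current archive: an ordered list of evaluated genids.

output_dir

str

required

Run root directory.

domains

list[str]

required

Domains whose scores are averaged to rank candidates.

method

str

default:"best"

Selection strategy:

| Value | Behaviour |
| --- | --- |
| `"random"` | Uniform random choice from valid candidates |
| `"latest"` | Most recently added valid candidate |
| `"best"` | Candidate with the highest average score |
| `"score_prop"` | Probabilistic, weighted by sigmoid-normalised scores |
| `"score_child_prop"` | Like `score_prop` but down-weights nodes with many children |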

Return Value

genid

str | int

The selected parent’s generation identifier. Only generations with all domain scores available and valid_parent: true in their metadata are eligible. If no eligible candidates exist, falls back to the first node in the archive with score 0.0.
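To make the strategies more concrete, here is a sketch of four of them operating on a dictionary of already-averaged scores for eligible candidates. `sigmoid_weights` and `pick_parent` are hypothetical names, and the sigmoid midpoint and steepness are invented values, since the actual normalisation constants are not documented here. The "score_child_prop" strategy is omitted because its child-count penalty is not specified.

```python
import math
import random


def sigmoid_weights(scores: dict, midpoint: float = 0.5, steepness: float = 10.0) -> dict:
    """Map each candidate's average score to a selection probability."""
    raw = {g: 1.0 / (1.0 + math.exp(-steepness * (s - midpoint)))
           for g, s in scores.items()}
    total = sum(raw.values())
    return {g: w / total for g, w in raw.items()}


def pick_parent(scores: dict, method: str = "best", rng=random):
    """scores maps genid -> average score, in archive order, eligible candidates only."""
    candidates = list(scores)
    if method == "random":
        return rng.choice(candidates)
    if method == "latest":
        return candidates[-1]  # dicts preserve insertion order
    if method == "best":
        return max(candidates, key=scores.get)
    if method == "score_prop":
        weights = sigmoid_weights(scores)
        return rng.choices(candidates, weights=[weights[g] for g in candidates])[0]
    raise ValueError(f"method not covered by this sketch: {method!r}")


scores = {"initial": 0.2, 0: 0.5, 1: 0.9}
print(pick_parent(scores, "best"))    # 1
print(pick_parent(scores, "latest"))  # 1
print(pick_parent(scores, "score_prop", rng=random.Random(0)))
```

Higher-scoring candidates receive larger weights under "score_prop", but lower-scoring ones keep a non-zero chance of being chosen, which preserves some exploration.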

Container Patch Application

`apply_diffs_container`

Applies a sequence of git patch files to a running Docker container, staging and committing the result.

from utils.gl_utils import apply_diffs_container

commit_hash = apply_diffs_container(container, patch_files=["/tmp/parent.diff"])

Signature

def apply_diffs_container(
    container,
    patch_files: list[str],
    repo_name: str = REPO_NAME,
    verbose: bool = True,
) -> str

Parameters

container

docker.models.containers.Container

required

A running Docker container object obtained from docker.from_env().containers.get(...).

patch_files

list[str]

required

Ordered list of absolute local paths to .diff patch files. Applied sequentially via patch -p1.

repo_name

str

default:"hyperagents"

Name of the repository directory inside the container (the working directory for all git operations).

verbose

bool

default:True

Whether to log container exec output via the thread-local logger.

Return Value

commit_hash

str

The git commit hash after applying all patches. If any patch introduced changes, a new commit is created; otherwise the existing HEAD hash is returned.

Changes to domains/ are automatically filtered out of each patch before it is applied to avoid contaminating the container’s domain configuration.
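One way the domains/ filtering could work is to drop every per-file section of a git-style diff whose path starts with that prefix. The sketch below is an assumption about the approach, not the function's actual code; `drop_domain_changes` is a hypothetical helper, and it does not handle paths containing spaces.

```python
def drop_domain_changes(patch_text: str, prefix: str = "domains/") -> str:
    """Remove per-file sections of a git diff whose path starts with prefix."""
    kept, keep_section = [], True
    for line in patch_text.splitlines(keepends=True):
        if line.startswith("diff --git "):
            # Header looks like: diff --git a/<path> b/<path>
            parts = line.split()
            path = parts[2][2:] if len(parts) >= 4 else ""
            keep_section = not path.startswith(prefix)
        if keep_section:
            kept.append(line)
    return "".join(kept)


sample = (
    "diff --git a/agent.py b/agent.py\n"
    "--- a/agent.py\n"
    "+++ b/agent.py\n"
    "@@ -1 +1 @@\n"
    "-x = 1\n"
    "+x = 2\n"
    "diff --git a/domains/cfg.py b/domains/cfg.py\n"
    "--- a/domains/cfg.py\n"
    "+++ b/domains/cfg.py\n"
    "@@ -1 +1 @@\n"
    "-y = 1\n"
    "+y = 2\n"
)
print(drop_domain_changes(sample))
```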

`get_patch_files`

Returns the full ordered list of patch files needed to reconstruct a generation’s lineage from the initial state.

from utils.gl_utils import get_patch_files

patches = get_patch_files("/runs/exp1", genid=3)
# ['/runs/exp1/gen_0/patches/model_patch.diff',
#  '/runs/exp1/gen_1/patches/model_patch.diff',
#  '/runs/exp1/gen_3/patches/model_patch.diff']

Signature

def get_patch_files(output_dir: str, genid) -> list[str]

Parameters

output_dir

str

required

Run root directory.

genid

str | int

required

Generation whose full patch lineage to retrieve.

Return Value

patch_files

list[str]

Combined list of prev_patch_files (ancestor patches) and curr_patch_files (this generation’s patches) read from {output_dir}/gen_{genid}/metadata.json. Returns an empty list if the metadata file does not exist.
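The lookup described above amounts to reading two lists from metadata.json and concatenating them. The following sketch mirrors that description; `lineage_patches` is a hypothetical name, and treating a missing key as an empty list is an assumption.

```python
import json
import os
import tempfile


def lineage_patches(output_dir: str, genid) -> list:
    """Ancestor patches followed by this generation's own patches."""
    metadata_path = os.path.join(output_dir, f"gen_{genid}", "metadata.json")
    if not os.path.exists(metadata_path):
        return []  # no metadata file: nothing to apply
    with open(metadata_path) as f:
        metadata = json.load(f)
    # Assumption: a missing key is treated as an empty list.
    return metadata.get("prev_patch_files", []) + metadata.get("curr_patch_files", [])


with tempfile.TemporaryDirectory() as run_dir:
    os.makedirs(os.path.join(run_dir, "gen_3"))
    with open(os.path.join(run_dir, "gen_3", "metadata.json"), "w") as f:
        json.dump({"prev_patch_files": ["p0.diff", "p1.diff"],
                   "curr_patch_files": ["p3.diff"]}, f)
    print(lineage_patches(run_dir, 3))  # ['p0.diff', 'p1.diff', 'p3.diff']
    print(lineage_patches(run_dir, 7))  # []
```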
