TaskAgent is the workhorse that actually solves problems. It is the target of every meta-agent improvement cycle: each generation’s model_patch.diff may change task_agent.py in any way — new tools, different prompts, chain-of-thought strategies, or entirely new architectures. Its performance on the evaluation dataset is the fitness signal that drives evolution.
Class Definition
task_agent.py
What forward() Does
forward() takes a single inputs dict, builds an instruction string, calls the LLM, and extracts a JSON response.
| Parameter | Type | Description |
|---|---|---|
inputs | dict | Domain-specific task data; must contain at least "domain" key |
| Return value | Type | Description |
|---|---|---|
prediction | str | The extracted response value from the JSON output, or "None" on failure |
new_msg_history | list | Full LLM message history for the interaction (useful for debugging) |
TaskAgent calls chat_with_agent without tools (tools_available=[] default), so the LLM reasons from its context window alone. The meta-agent can upgrade this to add tools, multi-turn reasoning, or external API calls.
JSON Response Schema
The base prompt instructs the LLM to respond inside a<json> block:
forward() uses extract_jsons from utils/common.py, which scans new_msg_history[-1]['text'] for <json>...</json> blocks and parses them. It takes the last valid JSON object that contains a "response" key. If parsing fails for any reason, prediction falls back to the string "None" and the error is logged — the harness continues running other samples.
Domain Input Dict Examples
The exact shape ofinputs is determined by each domain’s format_input_dict() function in domains/<domain>/utils.py. The dict always contains at least "domain". Typical examples:
- search_arena
- paper_review
- imo_grading
The
domain key is read at the top of forward() but not actually used in the base implementation’s instruction string — the entire inputs dict is embedded verbatim. Meta-agent improvements often use domain to branch into domain-specific prompting strategies.How the Harness Loads and Runs TaskAgent
The evaluation harness (domains/harness.py) dynamically imports TaskAgent and runs it in parallel across the dataset:
Dynamic import
load_task_agent(agent_path) loads the agent from a file path (e.g. ./task_agent.py) or a module string:domains/harness.py (load_task_agent)
task_agent.py — exactly what the meta-agent patched.Per-sample worker
Each dataset row is dispatched to a A fresh
ThreadPoolExecutor worker:domains/harness.py (run_agent)
TaskAgent instance is created for every sample (thread-safe by design).How the Meta-Agent Improves TaskAgent
The meta-agent receivesrepo_path pointing at the live repo inside the Docker container. Using the bash and editor tools, it can make any change to task_agent.py. Common improvement patterns include:
Adding tools to the task-agent
Adding tools to the task-agent
The baseline
TaskAgent calls chat_with_agent with no tools. A meta-agent improvement might pass tools_available='all' or a specific list, enabling the task-agent to run bash commands or search the web during inference:Changing the prompt strategy
Changing the prompt strategy
The meta-agent can rewrite the
instruction string to use chain-of-thought, few-shot examples, domain-specific personas, or structured reasoning steps tailored to the domain type.Changing the model
Changing the model
AgentSystem.__init__ accepts model from the harness utility module (domains/<domain>/utils.py). A meta-agent can change which model is used per domain or add model routing based on task difficulty.Adding helper functions or modules
Adding helper functions or modules
The meta-agent can create entirely new Python files alongside
task_agent.py and import them. For example, adding a retrieval module, a symbolic reasoning step, or a custom JSON parser.Compilation Verification
Before evaluation runs, the loop verifies that the patchedtask_agent.py is importable:
utils/gl_utils.py
generate_loop.py marks run_eval=False in the metadata and skips evaluation for that generation. The generation still gets added to the archive but is not a valid parent for future generations.
Meta-Agent
The agent that modifies task_agent.py between generations.
Evolution Loop
How generations are orchestrated and scored.