Overview
TaskAgent (defined in task_agent.py) is the per-task solver agent. Given a dictionary of domain-specific inputs, it constructs a natural-language instruction, calls the LLM via chat_with_agent, and extracts the response field from a JSON reply. The harness runs many TaskAgent instances concurrently via ThreadPoolExecutor, one per dataset row.
TaskAgent extends AgentSystem and adds no new constructor parameters.
Source
task_agent.py
Constructor
TaskAgent inherits the constructor from AgentSystem unchanged.
TaskAgent per question, using a per-question chat history path:
AgentSystem for full parameter details.
forward
inputs dict into an instruction, calls the LLM without tools (the default tools_available=[]), then parses the last assistant message for a JSON object containing a "response" key.
Parameters
A dictionary of task inputs. Must contain at minimum:Example — The exact keys beyond
paper_review domain:domain are determined by the domain’s format_input_dict function (e.g. domains/paper_review/utils.py).Return Value
Returns a 2-tuple.The value of
response extracted from the last LLM message’s JSON block.
Returns the string "None" (not Python None) in two cases:- The LLM’s final message contains no parseable JSON.
- The parsed JSON does not contain a
"response"key. - Any exception is raised during extraction.
The full conversation history as a list of message dicts, each with
"role"
("user" or "assistant") and "text" keys. The last element is always
the final assistant response. Useful for debugging or post-hoc analysis of
the agent’s reasoning.JSON Response Schema
The agent is instructed to reply using the following schema, wrapped in<json> tags so the parser can find it:
extract_jsons scans new_msg_history[-1]['text'] for all <json>...</json> blocks and returns the last one whose value can be parsed. The response field value is returned verbatim — it can be a string, number, list, or nested object depending on the domain.
TaskAgent.forward calls chat_with_agent with the default
tools_available=[], meaning no tools are loaded and the agent must reason
entirely from the information in inputs. This is intentional: task agents
are stateless solvers, not autonomous actors. Use MetaAgent
when you need tool access.How the Harness Loads TaskAgent
domains/harness.py dynamically loads TaskAgent from a file path or importable module path at runtime using importlib:
domains/harness.py
--agent_path ./task_agent.py (a file) or --agent_path mypackage.task_agent (a module), and the harness will find the TaskAgent class either way.
Parallel Execution
The harness runs oneTaskAgent.forward call per dataset row, batched across a ThreadPoolExecutor:
domains/harness.py
run_agent, which constructs a fresh TaskAgent instance with a unique chat_history_file path (chat_history_{question_id}.md). Because ThreadLoggerManager keys loggers by (thread_id, log_file), concurrent agents write to separate files without any locking overhead beyond the initial logger creation.
The default num_workers is 5. Override with --num_workers on the CLI.