Function signature
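The signature itself did not survive in this extract. Below is a sketch reconstructed from the parameter descriptions that follow; the function name `run`, every parameter name except `messaging_enabled` and `auto_eval` (which the descriptions mention verbatim), and all default values are assumptions, not the actual API:

```python
def run(
    run_name: str,                        # experiment name; logs go to logs/{run_name}/
    subset: str | None = None,            # predefined task subset, e.g. "lite"
    repo: str | None = None,              # filter tasks by repository name
    task_id: str | None = None,           # filter to a single task
    features: list[int] | None = None,    # specific feature pair, e.g. [1, 2]
    model: str = "gpt-4o",                # LLM model identifier
    agent: str = "mini_swe_agent",        # agent framework
    max_concurrent: int = 4,              # parallel task limit
    force_rerun: bool = False,            # rerun even if results exist
    redis_url: str | None = None,         # required when messaging_enabled=True
    mode: str = "coop",                   # "coop" or "solo"
    git_enabled: bool = False,            # git collaboration (coop mode only)
    messaging_enabled: bool = False,      # agent-to-agent messaging via send_message
    auto_eval: bool = True,               # evaluate runs after completion
    max_eval_concurrent: int = 4,         # parallel evaluation limit
    backend: str = "docker",              # "modal", "docker", or "gcp"
    agent_config: str | None = None,      # optional agent-specific config file
) -> None:
    ...
```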
Parameters
- Experiment name used for organizing logs. Creates a directory at `logs/{run_name}/`.
- Use a predefined task subset (e.g., `"lite"` for quick testing). Subsets are defined in `dataset/subsets/`.
- Filter tasks by repository name (e.g., `"llama_index_task"`). Runs only tasks from this repository.
- Filter to a specific task ID. Useful for debugging individual tasks.
- Specific feature pair to run (e.g., `[1, 2]`). If not specified, runs all feature combinations.
- LLM model identifier. Supports OpenAI models (e.g., `"gpt-4o"`), Anthropic models (e.g., `"claude-3-5-sonnet-20241022"`), and Vertex AI models (e.g., `"vertex_ai/gemini-3-flash-preview"`).
- Agent framework to use. Available agents: `"mini_swe_agent"`, `"swe_agent"`, `"openhands"`, `"mini_swe_agent_v2"`.
- Maximum number of tasks to run in parallel.
- If `True`, reruns tasks even if results already exist.
- Redis server URL for agent communication in cooperative mode. Required when `messaging_enabled=True`.
- Execution mode: `"coop"` (two agents collaborate on different features) or `"solo"` (a single agent implements both features).
- Enable git collaboration features (push, pull, merge). Only applies to cooperative mode.
- Enable agent-to-agent messaging via the `send_message` command. Requires Redis in cooperative mode.
- Automatically evaluate runs after completion. Results are saved to `eval.json`.
- Maximum number of parallel evaluations when `auto_eval=True`.
- Execution backend: `"modal"`, `"docker"`, or `"gcp"`.
- Path to an agent-specific configuration file (optional).
Basic usage
Run a single task
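The code example for this section was lost in extraction. A sketch of what a single-task invocation might look like; the function name `run`, the parameter names, and the task ID value are all hypothetical:

```python
# Run one task by ID, useful for debugging (task ID is illustrative)
run(run_name="debug-single", task_id="llama_index_task_3")
```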
Run all tasks in a subset
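A sketch of running a predefined subset, assuming a `run()` entry point and a `subset` keyword (names are guesses based on the parameter descriptions above):

```python
# "lite" is the quick-testing subset defined in dataset/subsets/
run(run_name="lite-sweep", subset="lite")
```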
Run in solo mode
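A sketch of solo mode, where a single agent implements both features; `run()` and its parameter names are assumed:

```python
# mode="solo": one agent implements both features instead of two collaborating
run(run_name="solo-baseline", subset="lite", mode="solo")
```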
Advanced usage
Custom agent and backend
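A sketch of selecting a non-default agent framework, backend, and model; function and parameter names are hypothetical, the string values come from the parameter descriptions above:

```python
run(
    run_name="openhands-gcp",
    agent="openhands",                      # one of the listed agent frameworks
    backend="gcp",                          # "modal", "docker", or "gcp"
    model="claude-3-5-sonnet-20241022",
)
```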
Enable git collaboration
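A sketch of a cooperative run with git collaboration and messaging enabled; names are assumed, and note that per the parameter descriptions a Redis URL is required once messaging is on:

```python
run(
    run_name="git-coop",
    mode="coop",
    git_enabled=True,                       # push/pull/merge between agents
    messaging_enabled=True,                 # send_message command
    redis_url="redis://localhost:6379",     # required when messaging_enabled=True
)
```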
Force rerun with custom concurrency
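A sketch of forcing a rerun over existing results with a higher parallelism limit; function and parameter names are guesses:

```python
# Rerun tasks even if results exist, with up to 8 tasks in flight
run(run_name="lite-sweep", subset="lite", force_rerun=True, max_concurrent=8)
```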
Skip automatic evaluation
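A sketch of disabling the post-run evaluation step (so no `eval.json` is produced automatically); names are assumed:

```python
# Evaluate later with evaluate() instead of during the run
run(run_name="no-eval", subset="lite", auto_eval=False)
```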
Output structure
Results are saved to `logs/{run_name}/{setting}/{repo}/{task_id}/{features}/`.
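As an illustration only, a run might resolve the path template to something like the following; every concrete segment here is hypothetical, and only the `eval.json` file is named by this document (written when automatic evaluation is enabled):

```
logs/
└── lite-sweep/              # {run_name}
    └── coop/                # {setting}
        └── llama_index/     # {repo}
            └── task_3/      # {task_id}
                └── 1_2/     # {features}
                    └── eval.json
```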
Return value
This function returns `None`. Results are saved to the logs directory and printed to the console.
Related functions
- `evaluate()` - Evaluate completed runs
- `discover_tasks()` - Query available tasks