The cooperbench run command executes agents on benchmark tasks. It supports both cooperative (multi-agent) and solo (single-agent) settings.

Usage

cooperbench run [options]

Basic examples

Run with auto-generated name

cooperbench run --setting solo -s lite
Auto-generates a name like: solo-msa-gemini-3-flash-lite

Run with custom name

cooperbench run -n my-experiment --setting solo -r llama_index_task
Runs experiment named my-experiment on llama_index repository tasks.

Multi-agent cooperative setting

cooperbench run -n team-experiment --setting coop -s lite --git
Runs two agents cooperatively with git collaboration enabled.

Parameters

Experiment identification

-n, --name
string
Experiment name. If not provided, one is auto-generated from the run settings.
Format: {setting}-{agent_short}-{git?}-{model}-{subset?}-{repo?}-{task?}
Examples: solo-msa-gemini-3-flash, coop-sw-git-gpt-4o-lite

Task filtering

-s, --subset
string
Use a predefined task subset. See dataset/subsets/ for available subsets.
Common values: lite, full
-r, --repo
string
Filter by repository name.
Examples: llama_index_task, dspy_task
-t, --task
integer
Filter by a specific task ID.
Examples: 8394, 1234
-f, --features
string
Specific feature pair to run, comma-separated.
Example: 1,2 (runs features 1 and 2)

Model and agent

-m, --model
string
default:"vertex_ai/gemini-3-flash-preview"
LLM model to use. Supports any LiteLLM-compatible model string.
Examples:
  • vertex_ai/gemini-3-flash-preview
  • gpt-4o
  • claude-3-5-sonnet-20241022
  • anthropic/claude-opus-4-20250514
-a, --agent
string
default:"mini_swe_agent"
Agent framework to use.
Options:
  • mini_swe_agent - Lightweight SWE-agent (default)
  • swe_agent - Full SWE-agent
  • openhands - OpenHands agent
  • Custom agents via COOPERBENCH_EXTERNAL_AGENTS environment variable
--agent-config
string
Path to an agent-specific configuration file. The format depends on the agent.
Example: --agent-config configs/custom.yaml
See Agent configuration for details.

Execution settings

--setting
choice
default:"coop"
Benchmark setting.
Options:
  • coop - Two agents working cooperatively (default)
  • solo - Single agent working alone
-c, --concurrency
integer
default:"30"
Number of parallel tasks to run.
--backend
choice
default:"modal"
Execution backend.
Options:
  • modal - Modal cloud platform (default)
  • docker - Local Docker containers
  • gcp - Google Cloud Platform VMs
See Backends for configuration details.

Collaboration features

--redis
string
default:"redis://localhost:6379"
Redis URL for inter-agent communication in cooperative mode.
--git
flag
Enable git collaboration, allowing agents to push, pull, and merge via a shared remote.
When enabled, each agent gets its own branch and can share code via a “team” remote.
--no-messaging
flag
Disable the send_message command for agent communication.

Evaluation

--no-auto-eval
flag
Disable automatic evaluation after task completion.
By default, evaluation runs automatically after all tasks finish.
--eval-concurrency
integer
default:"10"
Number of parallel evaluations for auto-eval.

Other options

--force
flag
Force rerun even if results already exist in logs/ directory.

Filtering examples

Run specific subset

cooperbench run -s lite

Run specific repository

cooperbench run -r llama_index_task

Run specific task

cooperbench run -t 8394

Run specific feature pair

cooperbench run -t 8394 -f 1,2
Runs task 8394, features 1 and 2 only.

Combine filters

cooperbench run -s lite -r dspy_task --setting solo
Runs lite subset, dspy repository only, solo mode.

Backend examples

Run on Modal (cloud)

cooperbench run --backend modal
Default cloud execution on Modal.

Run locally with Docker

cooperbench run --backend docker
Runs tasks in local Docker containers.

Run on GCP

cooperbench run --backend gcp
Requires prior configuration with cooperbench config gcp.

Model examples

OpenAI models

cooperbench run -m gpt-4o
cooperbench run -m gpt-4o-mini
Requires OPENAI_API_KEY environment variable.

Anthropic models

cooperbench run -m claude-3-5-sonnet-20241022
cooperbench run -m anthropic/claude-opus-4-20250514
Requires ANTHROPIC_API_KEY environment variable.

Google models

cooperbench run -m vertex_ai/gemini-3-flash-preview
cooperbench run -m gemini/gemini-pro
Requires GEMINI_API_KEY for gemini/ models, or GCP authentication (e.g. Application Default Credentials) for vertex_ai/ models.

Collaborative settings

Git collaboration

cooperbench run --setting coop --git
Enables shared git remote. Agents can:
  • Push to team remote: git push team agent_0
  • Fetch teammates’ work: git fetch team
  • View changes: git log team/agent_1
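
The workflow above can be reproduced locally without cooperbench. The sketch below is an illustration under assumptions: a bare “team” repository with one branch per agent (agent_0, agent_1), with all paths and commit messages invented here, not the literal setup cooperbench creates.

```shell
# Create a bare shared remote, then act as agent_0 in its own checkout.
git init --bare team.git
git init agent0
cd agent0
git -c user.name=agent -c user.email=agent@example.com \
    commit --allow-empty -m "agent_0: initial work"
git remote add team ../team.git
git push team HEAD:agent_0   # publish this agent's branch to the team remote
git fetch team               # pick up teammates' branches
git branch -r                # lists team/agent_0
cd ..
```

A second agent would clone or add the same team.git remote and push to its own agent_1 branch, then inspect the teammate's work with git log team/agent_0.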

Messaging only

cooperbench run --setting coop
Agents use send_message command to coordinate.

No collaboration

cooperbench run --setting coop --no-messaging
Agents work independently and their changes are merged at the end.

Output

Results are saved to:
logs/{experiment_name}/
  task_{task_id}_feature_{f1}_{f2}/
    agent_0/
      trajectory.jsonl
      patch.diff
    agent_1/
      trajectory.jsonl  
      patch.diff
    eval.json
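
Results can then be located with standard tools. To make the pattern concrete, this sketch fabricates the layout above with placeholder files and walks it; the experiment name "demo" and the task/feature IDs are illustrative, and a real run writes these files itself under logs/{experiment_name}/.

```shell
# Fabricate the documented layout with empty placeholder files.
mkdir -p logs/demo/task_8394_feature_1_2/agent_0 \
         logs/demo/task_8394_feature_1_2/agent_1
echo '{}' > logs/demo/task_8394_feature_1_2/eval.json
touch logs/demo/task_8394_feature_1_2/agent_0/patch.diff \
      logs/demo/task_8394_feature_1_2/agent_1/patch.diff

# Locate every agent patch under the experiment directory.
find logs/demo -name patch.diff | sort
# logs/demo/task_8394_feature_1_2/agent_0/patch.diff
# logs/demo/task_8394_feature_1_2/agent_1/patch.diff
```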

Auto-generated names

When --name is omitted, names follow this pattern:
{setting}-{agent_short}-{git?}-{model}-{subset?}-{repo?}-{task?}
Examples:
  • solo-msa-gemini-3-flash - Solo, mini_swe_agent, Gemini 3 Flash
  • solo-msa-gemini-3-flash-lite - Same but lite subset
  • coop-sw-git-gpt-4o-dspy-8394 - Coop, swe_agent, git enabled, GPT-4o, dspy repo, task 8394
Agent shorthands:
  • msa = mini_swe_agent
  • sw = swe_agent
  • oh = openhands
See cooperbench/agents/__init__.py for full list.
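
The naming pattern can be sketched as a small shell function. This is an illustrative reimplementation, not the actual cooperbench code: optional parts are skipped when empty, mirroring the {git?} and {subset?} placeholders above.

```shell
# Hypothetical reconstruction of the auto-naming scheme for illustration only.
auto_name() {
  local setting=$1 agent=$2 git=$3 model=$4 subset=$5
  local name="$setting-$agent"
  if [ -n "$git" ]; then name="$name-git"; fi
  name="$name-$model"
  if [ -n "$subset" ]; then name="$name-$subset"; fi
  echo "$name"
}

auto_name solo msa "" gemini-3-flash lite   # prints solo-msa-gemini-3-flash-lite
auto_name coop sw git gpt-4o ""             # prints coop-sw-git-gpt-4o
```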