The `cooperbench run` command executes agents on benchmark tasks. It supports both cooperative (multi-agent) and solo (single-agent) settings.
## Usage
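A general invocation shape, as a sketch. Apart from `--name` and `--agent-config`, which appear on this page, the flag names are assumptions inferred from the parameter descriptions below:

```shell
cooperbench run \
  [--name <experiment-name>] \
  [--model <model-string>] \
  [--setting <coop|solo>] \
  [--agent <agent>] \
  [--subset <subset>]
```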
## Basic examples
### Run with auto-generated name

The run is saved under a generated name such as `solo-msa-gemini-3-flash-lite`.
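A sketch of such a run. The flags `--setting`, `--agent`, `--model`, and `--subset` are hypothetical names inferred from the parameters documented below:

```shell
# No --name given; the experiment name is derived from the settings
# (flag names are assumptions, not confirmed by this page):
cooperbench run \
  --setting solo \
  --agent mini_swe_agent \
  --model vertex_ai/gemini-3-flash \
  --subset lite
```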
### Run with custom name

Runs the experiment `my-experiment` on `llama_index` repository tasks.
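A sketch of this invocation; `--repo` is an assumed flag name based on the repository filter described under Task filtering:

```shell
# --repo is a hypothetical flag name for the repository filter
cooperbench run \
  --name my-experiment \
  --repo llama_index_task
```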
### Multi-agent cooperative setting
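A cooperative run might look like the following sketch; `--setting`, `--agent`, `--model`, and `--git` are assumed flag names based on the parameters and auto-generated name components described below:

```shell
# Two agents cooperating, with git collaboration enabled
# (flag names are assumptions, not confirmed by this page):
cooperbench run \
  --setting coop \
  --agent swe_agent \
  --model gpt-4o \
  --git
```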
## Parameters

### Experiment identification
`--name` - Experiment name. If not provided, auto-generated from settings.
  - Format: `{setting}-{agent}-{model}-{subset}-{repo}-{task}`
  - Examples: `solo-msa-gemini-3-flash`, `coop-sw-git-gpt-4o-lite`

### Task filtering
- Use a predefined task subset. See `dataset/subsets/` for available subsets. Common values: `lite`, `full`
- Filter by repository name. Examples: `llama_index_task`, `dspy_task`
- Filter by specific task ID. Examples: `8394`, `1234`
- Specific feature pair to run, comma-separated. Example: `1,2` (runs features 1 and 2)

### Model and agent
- LLM model to use. Supports any LiteLLM-compatible model string. Examples: `vertex_ai/gemini-3-flash-preview`, `gpt-4o`, `claude-3-5-sonnet-20241022`, `anthropic/claude-opus-4-20250514`
- Agent framework to use. Options:
  - `mini_swe_agent` - Lightweight SWE-agent (default)
  - `swe_agent` - Full SWE-agent
  - `openhands` - OpenHands agent
  - Custom agents via the `COOPERBENCH_EXTERNAL_AGENTS` environment variable
- Path to an agent-specific configuration file. The format depends on the agent. Example: `--agent-config configs/custom.yaml`. See Agent configuration for details.

### Execution settings
- Benchmark setting. Options:
  - `coop` - Two agents working cooperatively (default)
  - `solo` - Single agent working alone
- Number of parallel tasks to run. Default: `30`
- Execution backend. Options:
  - `modal` - Modal cloud platform (default)
  - `docker` - Local Docker containers
  - `gcp` - Google Cloud Platform VMs
### Collaboration features
- Redis URL for inter-agent communication in cooperative mode. Default: `redis://localhost:6379`
- Enable git collaboration. Agents can push, pull, and merge via a shared remote. When enabled, each agent gets its own branch and can share code via a "team" remote.
- Disable the `send_message` command for agent communication.

### Evaluation
- Disable automatic evaluation after task completion. By default, evaluation runs automatically after all tasks finish.
- Number of parallel evaluations for auto-eval. Default: `10`
### Other options
- Force a rerun even if results already exist in the `logs/` directory.

## Filtering examples
### Run specific subset

### Run specific repository

### Run specific task

### Run specific feature pair

### Combine filters
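The filters above might be invoked as in the following sketch. The flag names (`--subset`, `--repo`, `--task`, `--features`) are assumptions inferred from the parameter descriptions, not confirmed by this page:

```shell
# Run specific subset
cooperbench run --subset lite

# Run specific repository
cooperbench run --repo dspy_task

# Run specific task
cooperbench run --task 8394

# Run specific feature pair
cooperbench run --task 8394 --features 1,2

# Combine filters
cooperbench run --subset lite --repo llama_index_task --task 1234
```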
## Backend examples
### Run on Modal (cloud)

### Run locally with Docker

### Run on GCP

Requires GCP setup via `cooperbench config gcp`.
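Selecting a backend might look like the following sketch; `--backend` is an assumed flag name for the execution backend parameter above:

```shell
# Modal cloud platform (default)
cooperbench run --backend modal

# Local Docker containers
cooperbench run --backend docker

# GCP VMs (after running `cooperbench config gcp`)
cooperbench run --backend gcp
```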
## Model examples

### OpenAI models

Requires the `OPENAI_API_KEY` environment variable.

### Anthropic models

Requires the `ANTHROPIC_API_KEY` environment variable.

### Google models

Requires `GEMINI_API_KEY` or GCP authentication for Vertex AI.
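A sketch of runs against each provider, using the model strings listed under the model parameter; `--model` is an assumed flag name:

```shell
# OpenAI (requires OPENAI_API_KEY)
export OPENAI_API_KEY=your-key
cooperbench run --model gpt-4o

# Anthropic (requires ANTHROPIC_API_KEY)
export ANTHROPIC_API_KEY=your-key
cooperbench run --model claude-3-5-sonnet-20241022

# Google via Vertex AI (requires GCP authentication)
cooperbench run --model vertex_ai/gemini-3-flash-preview
```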
## Collaborative settings

### Git collaboration
- Push to the `team` remote: `git push team agent_0`
- Fetch teammates' work: `git fetch team`
- View changes: `git log team/agent_1`
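Put together, an agent's git workflow with collaboration enabled looks like this (the push, fetch, and log commands are those listed above; the merge step follows from the push/pull/merge capability described under Collaboration features):

```shell
# From inside an agent's workspace, with git collaboration enabled:
git push team agent_0     # publish this agent's branch to the shared "team" remote
git fetch team            # fetch teammates' branches
git log team/agent_1      # inspect a teammate's changes
git merge team/agent_1    # merge a teammate's work into the current branch
```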
### Messaging only

Agents use the `send_message` command to coordinate.
### No collaboration
## Output

Results are saved to:

## Auto-generated names
When `--name` is omitted, names follow this pattern:

- `solo-msa-gemini-3-flash` - Solo, mini_swe_agent, Gemini 3 Flash
- `solo-msa-gemini-3-flash-lite` - Same, but with the lite subset
- `coop-sw-git-gpt-4o-dspy-8394` - Coop, swe_agent, git enabled, GPT-4o, dspy repo, task 8394
Agent abbreviations:

- `msa` = mini_swe_agent
- `sw` = swe_agent
- `oh` = openhands
See `cooperbench/agents/__init__.py` for the full list.