The `cooperbench run` command executes agents on benchmark tasks. It supports both cooperative (multi-agent) and solo (single-agent) settings.
## Usage
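A general invocation shape, as a sketch. Apart from `--name` and `--agent-config`, which appear on this page, the flag names are assumptions inferred from the parameter descriptions below:

```shell
cooperbench run \
  [--name <experiment-name>] \
  [--model <model-string>] \
  [--setting <coop|solo>] \
  [--agent <agent>] \
  [--subset <subset>]
```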
## Basic examples
### Run with auto-generated name

The run is saved under a generated name such as `solo-msa-gemini-3-flash-lite`.
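A sketch of such a run. The flags `--setting`, `--agent`, `--model`, and `--subset` are hypothetical names inferred from the parameters documented below:

```shell
# No --name given; the experiment name is derived from the settings
# (flag names are assumptions, not confirmed by this page):
cooperbench run \
  --setting solo \
  --agent mini_swe_agent \
  --model vertex_ai/gemini-3-flash \
  --subset lite
```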
### Run with custom name

Runs the experiment `my-experiment` on `llama_index` repository tasks.
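A sketch of this invocation; `--repo` is an assumed flag name based on the repository filter described under Task filtering:

```shell
# --repo is a hypothetical flag name for the repository filter
cooperbench run \
  --name my-experiment \
  --repo llama_index_task
```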
### Multi-agent cooperative setting
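A cooperative run might look like the following sketch; `--setting`, `--agent`, `--model`, and `--git` are assumed flag names based on the parameters and auto-generated name components described below:

```shell
# Two agents cooperating, with git collaboration enabled
# (flag names are assumptions, not confirmed by this page):
cooperbench run \
  --setting coop \
  --agent swe_agent \
  --model gpt-4o \
  --git
```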
## Parameters

### Experiment identification
`--name` - Experiment name. If not provided, auto-generated from settings.
  - Format: `{setting}-{agent}-{model}-{subset}-{repo}-{task}`
  - Examples: `solo-msa-gemini-3-flash`, `coop-sw-git-gpt-4o-lite`

### Task filtering
- Use a predefined task subset. See `dataset/subsets/` for available subsets. Common values: `lite`, `full`
- Filter by repository name. Examples: `llama_index_task`, `dspy_task`
- Filter by specific task ID. Examples: `8394`, `1234`
- Specific feature pair to run, comma-separated. Example: `1,2` (runs features 1 and 2)

### Model and agent
- LLM model to use. Supports any LiteLLM-compatible model string. Examples: `vertex_ai/gemini-3-flash-preview`, `gpt-4o`, `claude-3-5-sonnet-20241022`, `anthropic/claude-opus-4-20250514`
- Agent framework to use. Options:
  - `mini_swe_agent` - Lightweight SWE-agent (default)
  - `swe_agent` - Full SWE-agent
  - `openhands` - OpenHands agent
  - Custom agents via the `COOPERBENCH_EXTERNAL_AGENTS` environment variable
- Path to an agent-specific configuration file. The format depends on the agent. Example: `--agent-config configs/custom.yaml`. See Agent configuration for details.

### Execution settings
- Benchmark setting. Options:
  - `coop` - Two agents working cooperatively (default)
  - `solo` - Single agent working alone
- Number of parallel tasks to run. Default: `30`
- Execution backend. Options:
  - `modal` - Modal cloud platform (default)
  - `docker` - Local Docker containers
  - `gcp` - Google Cloud Platform VMs
### Collaboration features
- Redis URL for inter-agent communication in cooperative mode. Default: `redis://localhost:6379`
- Enable git collaboration. Agents can push, pull, and merge via a shared remote. When enabled, each agent gets its own branch and can share code via a "team" remote.
- Disable the `send_message` command for agent communication.

### Evaluation
- Disable automatic evaluation after task completion. By default, evaluation runs automatically after all tasks finish.
- Number of parallel evaluations for auto-eval. Default: `10`
### Other options
- Force a rerun even if results already exist in the `logs/` directory.

## Filtering examples
### Run specific subset

### Run specific repository

### Run specific task

### Run specific feature pair

### Combine filters
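The filters above might be invoked as in the following sketch. The flag names (`--subset`, `--repo`, `--task`, `--features`) are assumptions inferred from the parameter descriptions, not confirmed by this page:

```shell
# Run specific subset
cooperbench run --subset lite

# Run specific repository
cooperbench run --repo dspy_task

# Run specific task
cooperbench run --task 8394

# Run specific feature pair
cooperbench run --task 8394 --features 1,2

# Combine filters
cooperbench run --subset lite --repo llama_index_task --task 1234
```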
## Backend examples
### Run on Modal (cloud)

### Run locally with Docker

### Run on GCP

Requires GCP setup via `cooperbench config gcp`.
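Selecting a backend might look like the following sketch; `--backend` is an assumed flag name for the execution backend parameter above:

```shell
# Modal cloud platform (default)
cooperbench run --backend modal

# Local Docker containers
cooperbench run --backend docker

# GCP VMs (after running `cooperbench config gcp`)
cooperbench run --backend gcp
```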
## Model examples

### OpenAI models

Requires the `OPENAI_API_KEY` environment variable.

### Anthropic models

Requires the `ANTHROPIC_API_KEY` environment variable.

### Google models

Requires `GEMINI_API_KEY` or GCP authentication for Vertex AI.
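A sketch of runs against each provider, using the model strings listed under the model parameter; `--model` is an assumed flag name:

```shell
# OpenAI (requires OPENAI_API_KEY)
export OPENAI_API_KEY=your-key
cooperbench run --model gpt-4o

# Anthropic (requires ANTHROPIC_API_KEY)
export ANTHROPIC_API_KEY=your-key
cooperbench run --model claude-3-5-sonnet-20241022

# Google via Vertex AI (requires GCP authentication)
cooperbench run --model vertex_ai/gemini-3-flash-preview
```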
## Collaborative settings

### Git collaboration
- Push to the `team` remote: `git push team agent_0`
- Fetch teammates' work: `git fetch team`
- View changes: `git log team/agent_1`
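Put together, an agent's git workflow with collaboration enabled looks like this (the push, fetch, and log commands are those listed above; the merge step follows from the push/pull/merge capability described under Collaboration features):

```shell
# From inside an agent's workspace, with git collaboration enabled:
git push team agent_0     # publish this agent's branch to the shared "team" remote
git fetch team            # fetch teammates' branches
git log team/agent_1      # inspect a teammate's changes
git merge team/agent_1    # merge a teammate's work into the current branch
```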
### Messaging only

Agents use the `send_message` command to coordinate.
### No collaboration
## Output

Results are saved to:

## Auto-generated names
When `--name` is omitted, names follow this pattern:

- `solo-msa-gemini-3-flash` - Solo, mini_swe_agent, Gemini 3 Flash
- `solo-msa-gemini-3-flash-lite` - Same, but with the lite subset
- `coop-sw-git-gpt-4o-dspy-8394` - Coop, swe_agent, git enabled, GPT-4o, dspy repo, task 8394
Agent abbreviations:

- `msa` = mini_swe_agent
- `sw` = swe_agent
- `oh` = openhands
See `cooperbench/agents/__init__.py` for the full list.