## Quick start
Run a simple experiment with default settings. This will:

- Run in solo mode (single agent per task)
- Use the "lite" subset of tasks
- Auto-generate an experiment name like `solo-msa-gemini-3-flash-lite`
- Save results to `logs/`
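As a sketch, assuming the CLI is invoked as `cooperbench run` (an illustrative name, not confirmed by this page; check your installation's `--help` for the real interface), the default run would be:

```shell
# Hypothetical invocation: solo mode on the "lite" subset,
# auto-generated experiment name, results written under logs/
cooperbench run
```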
## Basic usage
### Choose a setting
CooperBench supports two evaluation settings:

- **Cooperative (`coop`)**: two agents collaborate on implementing two features
- **Solo (`solo`)**: a single agent implements both features independently
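Assuming a hypothetical `--setting` flag (the flag name is an illustration, not taken from this page), the two settings might be selected like this:

```shell
# Two agents collaborate on implementing two features
cooperbench run --setting coop

# A single agent implements both features independently
cooperbench run --setting solo
```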
### Select tasks to run
Filter tasks using subsets, repositories, or task IDs:
- Subset
- Repository
- Task ID
- Feature pair
Use predefined task collections. Common subsets:

- `lite` - Small subset for quick testing
- `dev` - Development subset
- Custom subsets in `dataset/subsets/`
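These filters might be combined along the following lines; the command and flag names (`--subset`, `--repo`, `--task-id`, `--features`) are assumptions for illustration only:

```shell
# Quick smoke test on the small predefined subset
cooperbench run --subset lite

# All tasks from one repository
cooperbench run --repo <repository-name>

# One specific task, or one feature pair within it
cooperbench run --task-id <task-id>
cooperbench run --task-id <task-id> --features <feature-a>,<feature-b>
```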
## Command reference
### Basic options
- Experiment name. Auto-generated if not provided.
- Evaluation setting: `coop` (collaborative) or `solo` (independent)
- Use a predefined task subset from `dataset/subsets/`
- Filter by repository name
- Filter by a specific task ID
- Specific feature pair to run (comma-separated)
### Model and agent
- LLM model to use. Supports any LiteLLM-compatible model.
- Agent framework to use
- Path to an agent-specific configuration file
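A model and agent selection might look like the following; `--model`, `--agent`, and `--agent-config` are assumed flag names, and the placeholders stand for values specific to your setup:

```shell
# Any LiteLLM-compatible model identifier should be accepted
cooperbench run --model <litellm-model-id> \
  --agent <agent-framework> \
  --agent-config path/to/agent-config.yaml
```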
### Concurrency

- Number of tasks to run in parallel
### Collaboration features

- Enable git-based collaboration (agents can push/pull/merge). Only available in cooperative mode; requires git server setup.
- Disable inter-agent messaging
- Redis URL for inter-agent communication
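Sketched with assumed flag names (`--git`, `--no-messaging`, `--redis-url` are illustrative, not confirmed by this page):

```shell
# Git-based collaboration; cooperative mode only, needs a git server
cooperbench run --setting coop --git

# Cooperative run with inter-agent messaging turned off
cooperbench run --setting coop --no-messaging

# Point inter-agent messaging at a specific Redis instance
cooperbench run --setting coop --redis-url redis://localhost:6379
```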
### Backend selection
### Evaluation

- Disable automatic evaluation after task completion. By default, tasks are evaluated automatically as they complete.
- Number of parallel evaluations for auto-eval
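Assuming hypothetical `--no-eval` and `--eval-workers` flags, the evaluation behavior might be tuned like this:

```shell
# Skip the evaluation that normally runs as each task completes
cooperbench run --no-eval

# Raise the number of parallel auto-evaluations
cooperbench run --eval-workers 8
```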
### Other options

- Force rerun even if results already exist
## Examples
### Single task with detailed output
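A minimal sketch, assuming a `--task-id` filter (the command and flag name are illustrative):

```shell
# Run a single task; its agent output appears in detail
cooperbench run --task-id <task-id>
```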
Run one task to see detailed agent output.

### Cooperative experiment
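A cooperative run over a task collection might look like this, assuming `--setting` and `--subset` flags (illustrative names only):

```shell
# Two agents per task, over the small predefined subset
cooperbench run --setting coop --subset lite
```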
Run multiple tasks with two agents collaborating.

### Solo with git collaboration
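Sketched with assumed flag names (`--setting`, `--git` are illustrations, not confirmed by this page):

```shell
# Solo mode with git features enabled for the single agent
cooperbench run --setting solo --git
```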
Enable git features in solo mode.

### Specific model and high concurrency
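One way this might look, with assumed `--model` and `--workers` flags; `<claude-model-id>` stands for whatever Claude identifier your LiteLLM setup accepts:

```shell
# Run many tasks in parallel against a Claude model
cooperbench run --model <claude-model-id> --workers 16
```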
Run with Claude and high parallelism.

### Filter by repository
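Assuming a `--repo` filter flag (an illustrative name):

```shell
# Restrict the run to tasks drawn from one repository
cooperbench run --repo <repository-name>
```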
Run all tasks from a single repository.

## Output structure
Results are saved to `logs/{experiment-name}/`.
### Result files
## Next steps
- **Evaluation** - Learn how to evaluate your experiment results
- **Backends** - Choose the right execution backend for your needs
- **Custom agents** - Implement your own agent framework
- **GCP setup** - Set up Google Cloud Platform backend