CooperBench supports three execution backends: Modal (cloud), Docker (local), and GCP (Google Cloud Platform). Each has different tradeoffs for cost, scale, and setup complexity.

Quick comparison

Modal

Best for: Quick starts, medium-scale experiments
  • ✓ Minimal setup
  • ✓ Auto-scaling
  • ✓ No infrastructure management
  • ✗ External service dependency
  • ✗ Limited customization

Docker

Best for: Local testing, debugging
  • ✓ No cloud account needed
  • ✓ Fast iteration
  • ✓ Complete control
  • ✗ Limited parallelism
  • ✗ Resource constrained

GCP

Best for: Large-scale runs, custom requirements
  • ✓ High scalability
  • ✓ Cost control
  • ✓ Full customization
  • ✗ More complex setup
  • ✗ Requires GCP account

Detailed comparison

| Feature          | Modal         | Docker        | GCP              |
|------------------|---------------|---------------|------------------|
| Setup time       | 2 minutes     | 5 minutes     | 10 minutes       |
| Account required | Modal         | None          | Google Cloud     |
| Parallelism      | 50-100        | 5-10          | 100+             |
| Cost model       | Modal credits | Free (local)  | Pay-as-you-go    |
| Customization    | Limited       | Full          | Full             |
| Data locality    | External      | Local         | Your GCP project |
| VM control       | None          | Local machine | Full control     |
Modal backend

Modal provides serverless infrastructure for running agent sandboxes.

Setup

1. Install Modal

pip install cooperbench[modal]

2. Authenticate

modal setup

This opens a browser to create or link your Modal account.

3. Run experiments

cooperbench run --backend modal -s lite

When to use Modal

  • Quick experimentation: Get started in minutes
  • Medium-scale runs: 10-100 tasks
  • No infrastructure: Don’t want to manage VMs
  • Auto-scaling: Automatic resource management

Pricing

Modal charges based on:
  • Compute time (per second)
  • Memory usage
  • Network egress
See Modal pricing for current rates.
Modal offers free credits for new users. Perfect for trying CooperBench!

Docker backend

Docker runs agent sandboxes locally on your machine.

Setup

1. Install Docker

Download from docker.com, then verify the installation:

docker --version

2. Install CooperBench

pip install cooperbench[docker]

3. Run experiments

cooperbench run --backend docker -s lite --concurrency 5
Limit concurrency to 5-10 to avoid overwhelming your machine.

When to use Docker

  • Local testing: Debug single tasks
  • Development: Iterate on agent code
  • No cloud account: Work completely offline
  • Full control: Inspect containers, modify images

Resource requirements

Recommended specs:
  • CPU: 8+ cores (for concurrency 5-10)
  • RAM: 16GB+ (2GB per concurrent task)
  • Disk: 50GB+ (for Docker images)
Example resource usage:
# Single task
1 CPU core, 2GB RAM, ~5-30 minutes

# Concurrency 5
5 CPU cores, 10GB RAM

# Concurrency 10
10 CPU cores, 20GB RAM
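The rule of thumb above (roughly one CPU core and 2GB of RAM per concurrent task) can be turned into a quick sizing check. A minimal sketch; the helper name is illustrative, not part of CooperBench:

```python
# Rough local-resource estimate for the Docker backend:
# ~1 CPU core and ~2 GB RAM per concurrent task (illustrative helper).
def docker_resources(concurrency: int) -> tuple[int, int]:
    """Return (cpu_cores, ram_gb) needed for a given concurrency."""
    return concurrency * 1, concurrency * 2

print(docker_resources(5))   # concurrency 5  -> (5, 10)
print(docker_resources(10))  # concurrency 10 -> (10, 20)
```

Compare the result against your machine's actual cores and memory before raising `--concurrency`.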

Configuration

Docker backend settings in agent config:
backend: docker

# Optional: Docker-specific settings
docker:
  # Custom network for git collaboration
  network: cooperbench-net

  # Resource limits per container
  cpus: 2.0
  memory: 4g
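The `cpus` and `memory` settings above correspond to Docker's standard `--cpus` and `--memory` container flags. A sketch of how such a config could map onto a `docker run` invocation; the dict shape and helper are illustrative, not CooperBench internals:

```python
# Translate Docker backend settings into `docker run` arguments.
# The cfg dict mirrors the YAML above; the helper is illustrative.
def docker_run_args(cfg: dict) -> list[str]:
    args = ["docker", "run", "--detach"]
    if "network" in cfg:
        args += ["--network", cfg["network"]]   # custom network for git collaboration
    if "cpus" in cfg:
        args.append(f"--cpus={cfg['cpus']}")    # CPU limit per container
    if "memory" in cfg:
        args.append(f"--memory={cfg['memory']}")  # RAM limit per container
    return args

cfg = {"network": "cooperbench-net", "cpus": 2.0, "memory": "4g"}
print(" ".join(docker_run_args(cfg)))
# docker run --detach --network cooperbench-net --cpus=2.0 --memory=4g
```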

GCP backend

GCP runs agent sandboxes on Google Cloud VMs with Batch API for evaluation.

Setup

See the complete GCP setup guide for detailed instructions.
1. Install dependencies

pip install cooperbench[gcp]

2. Run the configuration wizard

cooperbench config gcp

The wizard handles:
  • gcloud authentication
  • Project selection
  • Region/zone setup
  • API validation

3. Run experiments

cooperbench run --backend gcp -s lite
cooperbench eval --backend gcp -n my-experiment --concurrency 100

When to use GCP

  • Large-scale runs: 100-1000+ tasks
  • High parallelism: 50-100+ concurrent tasks
  • Cost control: Use your own GCP credits and quotas
  • Data locality: Keep data in your GCP project
  • Custom VMs: Use custom VM images, networks, etc.

Architecture

Agent execution

┌──────────────────────────────────────┐
│ CooperBench (Local)                  │
│   ↓ SSH via gcloud compute ssh      │
├──────────────────────────────────────┤
│ GCP VM (Container-Optimized OS)      │
│   ├─ Docker Container (Agent)        │
│   │   └─ Agent code execution        │
│   └─ Commands via docker exec        │
└──────────────────────────────────────┘
Each cooperbench run task gets:
  • Dedicated VM (e2-medium: 2 vCPU, 4GB RAM)
  • Container-Optimized OS
  • Docker container with task environment
  • Auto-cleanup after completion

Evaluation (Batch API)

┌────────────────────────────────────────┐
│ CooperBench (Local)                    │
│   ↓ Submit Batch job                   │
├────────────────────────────────────────┤
│ GCP Batch (Managed)                    │
│   ├─ Task 1 (VM 1)                     │
│   ├─ Task 2 (VM 1)                     │
│   ├─ Task 3 (VM 2)                     │
│   └─ Task N (VM M)                     │
│         ↓ Results                      │
├────────────────────────────────────────┤
│ GCS Bucket                             │
│   └─ Results, patches, manifests       │
└────────────────────────────────────────┘
Evaluation uses GCP Batch for efficiency:
  • Single job submission for all tasks
  • Parallel execution across VMs
  • Automatic scheduling and cleanup
  • Results stored in GCS

Cost estimation

Using default settings in us-central1:

Agent execution

VM: e2-medium (2 vCPU, 4GB RAM)
Cost: ~$0.03/hour
Typical task: 5-30 minutes
Per task: $0.0025 - $0.015

Evaluation

VM: 4 vCPU, 16GB RAM per batch worker
Cost: ~$0.15/hour per VM
Parallelism: 50 VMs = ~$7.50/hour
Typical job: 10-30 minutes for 500 tasks
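The per-task and fleet figures above follow directly from the hourly rates. A back-of-the-envelope check, using the approximate us-central1 prices quoted in this section:

```python
# Back-of-the-envelope GCP cost check using the rates above.
E2_MEDIUM_PER_HOUR = 0.03  # agent VM (e2-medium), ~$/hour
EVAL_VM_PER_HOUR = 0.15    # 4 vCPU / 16GB batch worker, ~$/hour

# Agent execution: one task runs 5-30 minutes on a dedicated e2-medium.
low = E2_MEDIUM_PER_HOUR * (5 / 60)    # ~$0.0025
high = E2_MEDIUM_PER_HOUR * (30 / 60)  # ~$0.015
print(f"per task: ${low:.4f} - ${high:.3f}")

# Evaluation: 50 batch workers running in parallel.
fleet_per_hour = 50 * EVAL_VM_PER_HOUR  # ~$7.50/hour
print(f"50-VM eval fleet: ${fleet_per_hour:.2f}/hour")
```

Actual bills also include disk, network, and any startup overhead, so treat these as lower bounds.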

Example costs

  • Small run (10 tasks): $0.05 - $0.15
  • Medium run (100 tasks): $0.50 - $1.50
  • Large run (1000 tasks): $5.00 - $10.00
GCP free tier includes $300 credit for new users.

Configuration

GCP backend settings:
backend: gcp

# GCP-specific configuration
project_id: my-project
zone: us-central1-a
machine_type: e2-medium

# Optional: Custom VM image (for faster startup)
vm_image_family: cooperbench-eval

# Optional: Custom VPC network (for git collaboration)
git_network: cooperbench-vpc

Choosing a backend

Decision flowchart

Starting CooperBench?
  ├─ Yes → Use Modal (fastest setup)
  └─ No ↓

Running < 50 tasks?
  ├─ Yes → Use Modal or Docker
  └─ No ↓

Need high parallelism (50+)?
  ├─ Yes → Use GCP
  └─ No ↓

Need custom infrastructure?
  ├─ Yes → Use GCP or Docker
  └─ No → Use Modal
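The flowchart above reduces to a few ordered conditionals. A sketch of that decision logic; the function and flag names are illustrative:

```python
# The decision flowchart above as a small function (names illustrative).
def choose_backend(new_user: bool, num_tasks: int,
                   high_parallelism: bool, custom_infra: bool) -> str:
    if new_user:
        return "modal"             # fastest setup
    if num_tasks < 50:
        return "modal or docker"   # small runs fit either
    if high_parallelism:
        return "gcp"               # 50+ concurrent tasks
    if custom_infra:
        return "gcp or docker"     # custom images, networks, etc.
    return "modal"

print(choose_backend(new_user=True, num_tasks=10,
                     high_parallelism=False, custom_infra=False))  # modal
print(choose_backend(new_user=False, num_tasks=500,
                     high_parallelism=True, custom_infra=False))   # gcp
```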

Use case recommendations

Use Modal
pip install cooperbench[modal]
modal setup
cooperbench run --backend modal -s lite
Setup time: 2 minutes
Use Docker
pip install cooperbench[docker]
cooperbench run --backend docker -r llama_index_task -t 8394 -f 1,2
Best for:
  • Detailed debugging
  • Fast iteration
  • Inspecting agent behavior
Use GCP
pip install cooperbench[gcp]
cooperbench config gcp
cooperbench run --backend gcp -s full --concurrency 50
cooperbench eval --backend gcp -n my-experiment --concurrency 100
Benefits:
  • High parallelism
  • Cost-effective at scale
  • Complete control
Use GCP or Docker

Docker for local:
backend: docker
docker:
  network: my-custom-network
  image: custom/agent-env:latest
GCP for cloud:
backend: gcp
vm_image_family: my-custom-image
git_network: my-vpc
Use Docker (free, local) or GCP (efficient at scale):
# Use spot instances (not yet supported)
# Use custom VM images to reduce startup time
# Use lifecycle policies to auto-delete old data

Switching backends

You can mix backends across experiments:
# Run experiments with Modal
cooperbench run --backend modal -s lite -n experiment-1

# Evaluate with GCP for better parallelism
cooperbench eval --backend gcp -n experiment-1 --concurrency 100
Experiments and evaluations use independent backends. Run with one backend, evaluate with another!

Next steps

GCP setup

Complete guide to configuring GCP backend

Running experiments

Learn how to run experiments with different backends

Evaluation

Understand the evaluation process