CooperBench supports three execution backends: Modal (cloud), Docker (local), and GCP (Google Cloud Platform). Each has different tradeoffs for cost, scale, and setup complexity.

Quick comparison

Modal

Best for: Quick starts, medium-scale experiments
  • ✓ Minimal setup
  • ✓ Auto-scaling
  • ✓ No infrastructure management
  • ✗ External service dependency
  • ✗ Limited customization

Docker

Best for: Local testing, debugging
  • ✓ No cloud account needed
  • ✓ Fast iteration
  • ✓ Complete control
  • ✗ Limited parallelism
  • ✗ Resource constrained

GCP

Best for: Large-scale runs, custom requirements
  • ✓ High scalability
  • ✓ Cost control
  • ✓ Full customization
  • ✗ More complex setup
  • ✗ Requires GCP account

Detailed comparison

| Feature          | Modal         | Docker        | GCP              |
|------------------|---------------|---------------|------------------|
| Setup time       | 2 minutes     | 5 minutes     | 10 minutes       |
| Account required | Modal         | None          | Google Cloud     |
| Parallelism      | 50-100        | 5-10          | 100+             |
| Cost model       | Modal credits | Free (local)  | Pay-as-you-go    |
| Customization    | Limited       | Full          | Full             |
| Data locality    | External      | Local         | Your GCP project |
| VM control       | None          | Local machine | Full control     |
Modal backend

Modal provides serverless infrastructure for running agent sandboxes.

Setup

1. Install Modal

pip install cooperbench[modal]

2. Authenticate

modal setup

This opens a browser to create or link your Modal account.

3. Run experiments

cooperbench run --backend modal -s lite

When to use Modal

  • Quick experimentation: Get started in minutes
  • Medium-scale runs: 10-100 tasks
  • No infrastructure: Don’t want to manage VMs
  • Auto-scaling: Automatic resource management

Pricing

Modal charges based on:
  • Compute time (per second)
  • Memory usage
  • Network egress
See Modal pricing for current rates.
Modal offers free credits for new users. Perfect for trying CooperBench!

Docker backend

Docker runs agent sandboxes locally on your machine.

Setup

1. Install Docker

Download from docker.com, then verify the installation:

docker --version

2. Install CooperBench

pip install cooperbench[docker]

3. Run experiments

cooperbench run --backend docker -s lite --concurrency 5
Limit concurrency to 5-10 to avoid overwhelming your machine.

When to use Docker

  • Local testing: Debug single tasks
  • Development: Iterate on agent code
  • No cloud account: Work completely offline
  • Full control: Inspect containers, modify images

Resource requirements

Recommended specs:
  • CPU: 8+ cores (for concurrency 5-10)
  • RAM: 16GB+ (2GB per concurrent task)
  • Disk: 50GB+ (for Docker images)
Example resource usage:
# Single task
1 CPU core, 2GB RAM, ~5-30 minutes

# Concurrency 5
5 CPU cores, 10GB RAM

# Concurrency 10
10 CPU cores, 20GB RAM
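The rule of thumb above (roughly one CPU core and 2GB of RAM per concurrent task) can be turned into a quick sizing check. A minimal sketch; the helper name is illustrative, not part of CooperBench:

```python
# Rough local-resource estimate for the Docker backend:
# ~1 CPU core and ~2 GB RAM per concurrent task (illustrative helper).
def docker_resources(concurrency: int) -> tuple[int, int]:
    """Return (cpu_cores, ram_gb) needed for a given concurrency."""
    return concurrency * 1, concurrency * 2

print(docker_resources(5))   # concurrency 5  -> (5, 10)
print(docker_resources(10))  # concurrency 10 -> (10, 20)
```

Compare the result against your machine's actual cores and memory before raising `--concurrency`.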

Configuration

Docker backend settings in agent config:
backend: docker

# Optional: Docker-specific settings
docker:
  # Custom network for git collaboration
  network: cooperbench-net

  # Resource limits per container
  cpus: 2.0
  memory: 4g
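The `cpus` and `memory` settings above correspond to Docker's standard `--cpus` and `--memory` container flags. A sketch of how such a config could map onto a `docker run` invocation; the dict shape and helper are illustrative, not CooperBench internals:

```python
# Translate Docker backend settings into `docker run` arguments.
# The cfg dict mirrors the YAML above; the helper is illustrative.
def docker_run_args(cfg: dict) -> list[str]:
    args = ["docker", "run", "--detach"]
    if "network" in cfg:
        args += ["--network", cfg["network"]]   # custom network for git collaboration
    if "cpus" in cfg:
        args.append(f"--cpus={cfg['cpus']}")    # CPU limit per container
    if "memory" in cfg:
        args.append(f"--memory={cfg['memory']}")  # RAM limit per container
    return args

cfg = {"network": "cooperbench-net", "cpus": 2.0, "memory": "4g"}
print(" ".join(docker_run_args(cfg)))
# docker run --detach --network cooperbench-net --cpus=2.0 --memory=4g
```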

GCP backend

GCP runs agent sandboxes on Google Cloud VMs with Batch API for evaluation.

Setup

See the complete GCP setup guide for detailed instructions.
1. Install dependencies

pip install cooperbench[gcp]

2. Run the configuration wizard

cooperbench config gcp

The wizard handles:
  • gcloud authentication
  • Project selection
  • Region/zone setup
  • API validation

3. Run experiments

cooperbench run --backend gcp -s lite
cooperbench eval --backend gcp -n my-experiment --concurrency 100

When to use GCP

  • Large-scale runs: 100-1000+ tasks
  • High parallelism: 50-100+ concurrent tasks
  • Cost control: Use your own GCP credits and quotas
  • Data locality: Keep data in your GCP project
  • Custom VMs: Use custom VM images, networks, etc.

Architecture

Agent execution

┌──────────────────────────────────────┐
│ CooperBench (Local)                  │
│   ↓ SSH via gcloud compute ssh      │
├──────────────────────────────────────┤
│ GCP VM (Container-Optimized OS)      │
│   ├─ Docker Container (Agent)        │
│   │   └─ Agent code execution        │
│   └─ Commands via docker exec        │
└──────────────────────────────────────┘
Each cooperbench run task gets:
  • Dedicated VM (e2-medium: 2 vCPU, 4GB RAM)
  • Container-Optimized OS
  • Docker container with task environment
  • Auto-cleanup after completion

Evaluation (Batch API)

┌────────────────────────────────────────┐
│ CooperBench (Local)                    │
│   ↓ Submit Batch job                   │
├────────────────────────────────────────┤
│ GCP Batch (Managed)                    │
│   ├─ Task 1 (VM 1)                     │
│   ├─ Task 2 (VM 1)                     │
│   ├─ Task 3 (VM 2)                     │
│   └─ Task N (VM M)                     │
│         ↓ Results                      │
├────────────────────────────────────────┤
│ GCS Bucket                             │
│   └─ Results, patches, manifests       │
└────────────────────────────────────────┘
Evaluation uses GCP Batch for efficiency:
  • Single job submission for all tasks
  • Parallel execution across VMs
  • Automatic scheduling and cleanup
  • Results stored in GCS

Cost estimation

Using default settings in us-central1:

Agent execution

VM: e2-medium (2 vCPU, 4GB RAM)
Cost: ~$0.03/hour
Typical task: 5-30 minutes
Per task: $0.0025 - $0.015

Evaluation

VM: 4 vCPU, 16GB RAM per batch worker
Cost: ~$0.15/hour per VM
Parallelism: 50 VMs = ~$7.50/hour
Typical job: 10-30 minutes for 500 tasks
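The per-task and fleet figures above follow directly from the hourly rates. A back-of-the-envelope check, using the approximate us-central1 prices quoted in this section:

```python
# Back-of-the-envelope GCP cost check using the rates above.
E2_MEDIUM_PER_HOUR = 0.03  # agent VM (e2-medium), ~$/hour
EVAL_VM_PER_HOUR = 0.15    # 4 vCPU / 16GB batch worker, ~$/hour

# Agent execution: one task runs 5-30 minutes on a dedicated e2-medium.
low = E2_MEDIUM_PER_HOUR * (5 / 60)    # ~$0.0025
high = E2_MEDIUM_PER_HOUR * (30 / 60)  # ~$0.015
print(f"per task: ${low:.4f} - ${high:.3f}")

# Evaluation: 50 batch workers running in parallel.
fleet_per_hour = 50 * EVAL_VM_PER_HOUR  # ~$7.50/hour
print(f"50-VM eval fleet: ${fleet_per_hour:.2f}/hour")
```

Actual bills also include disk, network, and any startup overhead, so treat these as lower bounds.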

Example costs

  • Small run (10 tasks): $0.05 - $0.15
  • Medium run (100 tasks): $0.50 - $1.50
  • Large run (1000 tasks): $5.00 - $10.00
GCP free tier includes $300 credit for new users.

Configuration

GCP backend settings:
backend: gcp

# GCP-specific configuration
project_id: my-project
zone: us-central1-a
machine_type: e2-medium

# Optional: Custom VM image (for faster startup)
vm_image_family: cooperbench-eval

# Optional: Custom VPC network (for git collaboration)
git_network: cooperbench-vpc

Choosing a backend

Decision flowchart

Starting CooperBench?
  ├─ Yes → Use Modal (fastest setup)
  └─ No ↓

Running < 50 tasks?
  ├─ Yes → Use Modal or Docker
  └─ No ↓

Need high parallelism (50+)?
  ├─ Yes → Use GCP
  └─ No ↓

Need custom infrastructure?
  ├─ Yes → Use GCP or Docker
  └─ No → Use Modal
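The flowchart above reduces to a few ordered conditionals. A sketch of that decision logic; the function and flag names are illustrative:

```python
# The decision flowchart above as a small function (names illustrative).
def choose_backend(new_user: bool, num_tasks: int,
                   high_parallelism: bool, custom_infra: bool) -> str:
    if new_user:
        return "modal"             # fastest setup
    if num_tasks < 50:
        return "modal or docker"   # small runs fit either
    if high_parallelism:
        return "gcp"               # 50+ concurrent tasks
    if custom_infra:
        return "gcp or docker"     # custom images, networks, etc.
    return "modal"

print(choose_backend(new_user=True, num_tasks=10,
                     high_parallelism=False, custom_infra=False))  # modal
print(choose_backend(new_user=False, num_tasks=500,
                     high_parallelism=True, custom_infra=False))   # gcp
```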

Use case recommendations

Use Modal
pip install cooperbench[modal]
modal setup
cooperbench run --backend modal -s lite
Setup time: 2 minutes
Use Docker
pip install cooperbench[docker]
cooperbench run --backend docker -r llama_index_task -t 8394 -f 1,2
Best for:
  • Detailed debugging
  • Fast iteration
  • Inspecting agent behavior
Use GCP
pip install cooperbench[gcp]
cooperbench config gcp
cooperbench run --backend gcp -s full --concurrency 50
cooperbench eval --backend gcp -n my-experiment --concurrency 100
Benefits:
  • High parallelism
  • Cost-effective at scale
  • Complete control
Use GCP or Docker

Docker for local:
backend: docker
docker:
  network: my-custom-network
  image: custom/agent-env:latest
GCP for cloud:
backend: gcp
vm_image_family: my-custom-image
git_network: my-vpc
Use Docker (free, local) or GCP (efficient at scale):
# Use spot instances (not yet supported)
# Use custom VM images to reduce startup time
# Use lifecycle policies to auto-delete old data

Switching backends

You can mix backends across experiments:
# Run experiments with Modal
cooperbench run --backend modal -s lite -n experiment-1

# Evaluate with GCP for better parallelism
cooperbench eval --backend gcp -n experiment-1 --concurrency 100
Experiments and evaluations use independent backends. Run with one backend, evaluate with another!

Next steps

GCP setup

Complete guide to configuring GCP backend

Running experiments

Learn how to run experiments with different backends

Evaluation

Understand the evaluation process