Quick comparison
Modal
Best for: Quick starts, medium-scale experiments
- ✓ Minimal setup
- ✓ Auto-scaling
- ✓ No infrastructure management
- ✗ External service dependency
- ✗ Limited customization
Docker
Best for: Local testing, debugging
- ✓ No cloud account needed
- ✓ Fast iteration
- ✓ Complete control
- ✗ Limited parallelism
- ✗ Resource constrained
GCP
Best for: Large-scale runs, custom requirements
- ✓ High scalability
- ✓ Cost control
- ✓ Full customization
- ✗ More complex setup
- ✗ Requires GCP account
Detailed comparison
| Feature | Modal | Docker | GCP |
|---|---|---|---|
| Setup time | 2 minutes | 5 minutes | 10 minutes |
| Account required | Modal | None | Google Cloud |
| Parallelism (concurrent tasks) | 50-100 | 5-10 | 100+ |
| Cost model | Modal credits | Free (local) | Pay-as-you-go |
| Customization | Limited | Full | Full |
| Data locality | External | Local | Your GCP project |
| VM control | None | Local machine | Full control |
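To make the parallelism row concrete, the sketch below estimates how many sequential waves a run needs on each backend. It is only an illustration: `MAX_PARALLELISM` and `estimate_waves` are hypothetical names, not part of CooperBench, the limits are the upper ends of the ranges in the table (with 100 as a conservative stand-in for GCP's "100+"), and it assumes every task takes about the same time.

```python
import math

# Upper ends of the parallelism ranges from the comparison table.
# GCP is listed as "100+", so 100 is a conservative stand-in (assumption).
MAX_PARALLELISM = {"modal": 100, "docker": 10, "gcp": 100}


def estimate_waves(num_tasks: int, backend: str) -> int:
    """Return how many sequential waves are needed to run num_tasks."""
    if num_tasks < 0:
        raise ValueError("num_tasks must be non-negative")
    return math.ceil(num_tasks / MAX_PARALLELISM[backend])


if __name__ == "__main__":
    for name in MAX_PARALLELISM:
        print(name, estimate_waves(500, name))
```

Under these assumptions a 500-task run needs roughly ten times as many waves on Docker as on either cloud backend, which is why Docker is positioned for local testing rather than large runs.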
Modal backend
Modal provides serverless infrastructure for running agent sandboxes.
Setup
When to use Modal
Good for:
- Quick experimentation: Get started in minutes
- Medium-scale runs: 10-100 tasks
- No infrastructure: Don’t want to manage VMs
- Auto-scaling: Automatic resource management
Pricing
Modal charges based on:
- Compute time (per second)
- Memory usage
- Network egress
Docker backend
Docker runs agent sandboxes locally on your machine.
Setup
When to use Docker
Good for:
- Local testing: Debug single tasks
- Development: Iterate on agent code
- No cloud account: Work completely offline
- Full control: Inspect containers, modify images
Resource requirements
Recommended specs:
- CPU: 8+ cores (for concurrency 5-10)
- RAM: 16GB+ (2GB per concurrent task)
- Disk: 50GB+ (for Docker images)
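As a rough illustration of the RAM guideline, the sketch below turns "2GB per concurrent task" into a concurrency estimate. The function `max_docker_concurrency` and its `reserved_gb` parameter are hypothetical and not part of CooperBench; the only figure taken from this page is the 2GB-per-task rule, and CPU and disk limits are ignored.

```python
RAM_PER_TASK_GB = 2  # from the recommended specs: 2GB per concurrent task


def max_docker_concurrency(ram_gb: float, reserved_gb: float = 0.0) -> int:
    """Estimate how many tasks fit in RAM at once.

    reserved_gb is a hypothetical allowance for the OS and other
    processes; it is an assumption, not a documented setting.
    """
    usable_gb = max(ram_gb - reserved_gb, 0)
    return int(usable_gb // RAM_PER_TASK_GB)


if __name__ == "__main__":
    print(max_docker_concurrency(16))                 # whole machine
    print(max_docker_concurrency(16, reserved_gb=4))  # leave headroom
```

With the recommended 16GB this lands inside the 5-10 concurrency range quoted for Docker, whether or not some memory is held back for the host.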
Configuration
Docker backend settings in agent config:
GCP backend
GCP runs agent sandboxes on Google Cloud VMs and uses the Batch API for evaluation.
Setup
See the complete GCP setup guide for detailed instructions.
Run configuration wizard
The wizard covers:
- gcloud authentication
- Project selection
- Region/zone setup
- API validation
When to use GCP
Good for:
- Large-scale runs: 100-1000+ tasks
- High parallelism: 50-100+ concurrent tasks
- Cost control: Use your own GCP credits and quotas
- Data locality: Keep data in your GCP project
- Custom VMs: Use custom VM images, networks, etc.
Architecture
Agent execution
Each task started by cooperbench run gets:
- Dedicated VM (e2-medium: 2 vCPU, 4GB RAM)
- Container-Optimized OS
- Docker container with task environment
- Auto-cleanup after completion
Evaluation (Batch API)
- Single job submission for all tasks
- Parallel execution across VMs
- Automatic scheduling and cleanup
- Results stored in GCS
Cost estimation
Using default settings in us-central1:
Agent execution
Evaluation
Example costs
- Small run (10 tasks): $0.15
- Medium run (100 tasks): $1.50
- Large run (1000 tasks): $10.00
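The example figures above imply a per-task cost that drops at the largest scale. The sketch below simply divides each listed total by its task count; `EXAMPLE_COSTS` and `cost_per_task` are hypothetical names, and the numbers are copied from the list above rather than from any pricing API.

```python
# Example totals from the list above, keyed by number of tasks.
EXAMPLE_COSTS = {10: 0.15, 100: 1.50, 1000: 10.00}


def cost_per_task(total_cost: float, num_tasks: int) -> float:
    """Return the average cost of one task for a run."""
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    return total_cost / num_tasks


if __name__ == "__main__":
    for tasks, total in EXAMPLE_COSTS.items():
        print(f"{tasks} tasks: {cost_per_task(total, tasks):.3f} per task")
```

The small and medium examples work out to about 0.015 per task, while the large example works out to about 0.010 per task.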
Configuration
GCP backend settings:
Choosing a backend
Decision flowchart
Use case recommendations
I want to try CooperBench quickly
Use Modal
Setup time: 2 minutes
I want to debug a single task locally
Use Docker
Best for:
- Detailed debugging
- Fast iteration
- Inspecting agent behavior
I want to evaluate 1000+ tasks
Use GCP
Benefits:
- High parallelism
- Cost-effective at scale
- Complete control
I need custom Docker images or networks
Use GCP or Docker: Docker for local runs, GCP for cloud runs.
I want to minimize costs
Use Docker (free, local), or GCP (efficient at scale).
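The recommendations above can be condensed into a small decision helper. This is a sketch only: `recommend_backend` and its parameters are hypothetical and not part of CooperBench, and the 100-task threshold is an assumption based on the "medium-scale" (10-100 tasks) and "large-scale" (100-1000+ tasks) ranges on this page.

```python
def recommend_backend(
    num_tasks: int,
    debugging: bool = False,
    has_cloud_account: bool = True,
) -> str:
    """Suggest a backend name following this page's use-case guidance."""
    # Local debugging, or no cloud account at all: Docker is the only fit.
    if debugging or not has_cloud_account:
        return "docker"
    # Beyond the medium-scale range, GCP's parallelism pays off.
    if num_tasks > 100:
        return "gcp"
    # Quick starts and medium-scale experiments.
    return "modal"


if __name__ == "__main__":
    print(recommend_backend(1))                   # quick first try
    print(recommend_backend(1, debugging=True))   # debug one task locally
    print(recommend_backend(1000))                # large evaluation
```

The helper deliberately ignores cost and customization needs; for those cases the Docker and GCP recommendations above still apply.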
Switching backends
You can mix backends across experiments.
Next steps
GCP setup
Complete guide to configuring GCP backend
Running experiments
Learn how to run experiments with different backends
Evaluation
Understand the evaluation process