Overview
To add a custom agent:- Implement the
AgentRunnerinterface - Define how your agent executes tasks - Register your agent - Use the
@registerdecorator to make it available - Run experiments - Use
--agent your-agent-nameto run benchmarks
Agent interface
Custom agents must implement theAgentRunner protocol:
AgentResult
Your agent must return anAgentResult object:
Implementing a custom agent
Basic example
Here’s a minimal custom agent:Complete example (mini-swe-agent)
Here’s how the built-inmini_swe_agent is implemented:
Registering your agent
Using the decorator
The simplest way is to use the@register decorator:
External registration
For agents in separate packages, use theCOOPERBENCH_EXTERNAL_AGENTS environment variable:
Multiple agents
Register multiple agents by separating module paths with commas:Running your agent
Once registered, use the--agent flag:
Agent configuration
Config file
Provide agent-specific configuration via--agent-config:
Config dictionary
Or pass config directly (for programmatic use):Collaboration features
Inter-agent messaging
In cooperative mode, agents can send messages via Redis:Git collaboration
Agents can share code via git:Environment backends
Choose the execution environment for your agent:Modal (cloud)
Docker (local)
GCP (Google Cloud)
Best practices
Track LLM costs accurately
Track LLM costs accurately
Return accurate cost tracking in This enables accurate cost reporting in benchmark results.
AgentResult.cost:Save conversation history
Save conversation history
Store the full agent conversation in This enables debugging and analysis of agent behavior.
AgentResult.messages:Generate clean git patches
Generate clean git patches
Ensure patches only contain meaningful changes:
Handle errors gracefully
Handle errors gracefully
Catch exceptions and return error information:This prevents entire benchmark runs from failing due to single task errors.
Clean up resources
Clean up resources
Always cleanup environments, even on error:This prevents resource leaks and hanging containers.
Examples
Minimal agent
Simplest possible agent:Agent with LLM
Agent that uses an LLM:Next steps
Running experiments
Learn how to run your custom agent on CooperBench
Evaluation
Understand how agents are evaluated
Backends
Choose the right execution backend