Overview
The Codenames AI Benchmark provides abstract base classes for creating custom agents. You can implement your own hint givers (spymasters) and guessers (field operatives) using any strategy or LLM provider.

Agent Architecture
All agents inherit from two base classes defined in agents/base.py:
- HintGiver: Sees all card colors, provides hints
- Guesser: Only sees board words, makes guesses based on hints
Base Classes
HintGiver Base Class
The HintGiver abstract class defines the interface for spymaster agents:
agents/base.py
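The listing itself is not reproduced above, so here is a minimal sketch of what the abstract interface plausibly looks like. The exact get_hint signature is an assumption; reset() and process_result() are the hooks named under State Management below.

```python
from abc import ABC, abstractmethod


class HintGiver(ABC):
    """Spymaster: sees every card's color and produces one-word hints."""

    @abstractmethod
    def get_hint(self, board: dict[str, str]) -> tuple[str, int]:
        """Given a word -> color mapping, return a (hint_word, count) pair.

        The hint word must be a single word that is not on the board;
        the count must be a positive integer.
        """

    def reset(self) -> None:
        """Clear per-game state so the same instance can be reused."""

    def process_result(self, result: dict) -> None:
        """Optional hook: observe the outcome of the last turn."""
```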
Guesser Base Class
The Guesser abstract class defines the interface for field operative agents:
agents/base.py
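Similarly, a minimal sketch of the guesser interface, with an assumed get_guesses signature:

```python
from abc import ABC, abstractmethod


class Guesser(ABC):
    """Field operative: sees only the board words, never the colors."""

    @abstractmethod
    def get_guesses(self, board_words: list[str], hint: str, count: int) -> list[str]:
        """Return up to count + 1 unrevealed words, best guess first.

        An empty list passes the turn.
        """

    def reset(self) -> None:
        """Clear per-game state between games."""

    def process_result(self, result: dict) -> None:
        """Optional hook: learn from the revealed colors of past guesses."""
```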
Creating a Simple Agent
Here’s a complete example of a simple random agent implementation:
agents/random_agents.py
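The file's contents are not shown above; a self-contained sketch of what such random agents might look like (written standalone here so it runs on its own — in the repo these would subclass HintGiver and Guesser from agents/base.py):

```python
import random


class RandomHintGiver:  # in the repo: subclasses agents.base.HintGiver
    """Baseline spymaster: picks an arbitrary hint word from a fixed pool."""

    HINT_POOL = ["animal", "water", "music", "travel", "science"]

    def get_hint(self, board: dict[str, str]) -> tuple[str, int]:
        # Count remaining team words so the returned count is always legal.
        remaining = sum(1 for color in board.values() if color == "red")
        return random.choice(self.HINT_POOL), max(1, min(2, remaining))


class RandomGuesser:  # in the repo: subclasses agents.base.Guesser
    """Baseline operative: guesses random unrevealed words, ignoring the hint."""

    def get_guesses(self, board_words: list[str], hint: str, count: int) -> list[str]:
        return random.sample(board_words, k=min(count, len(board_words)))
```

Random agents like these make useful baselines: any LLM agent should beat them comfortably.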
LLM-Based Agents with BAML
For LLM-based agents, use the BAML framework, which handles prompt templating and structured outputs:
agents/llm/baml_agents.py
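BAML generates a typed client from your prompt definitions; rather than guess at the repo's generated function names, this sketch injects the model call as a plain callable. All names here are illustrative, not the benchmark's actual API.

```python
from typing import Callable


class LLMHintGiver:  # in the repo: subclasses agents.base.HintGiver
    """Spymaster backed by an LLM call injected as a plain callable.

    `llm_fn` stands in for a BAML-generated client function that
    returns a structured (hint_word, count) pair.
    """

    def __init__(self, llm_fn: Callable[[str], tuple[str, int]]):
        self.llm_fn = llm_fn

    def get_hint(self, board: dict[str, str]) -> tuple[str, int]:
        targets = [w for w, c in board.items() if c == "red"]
        avoid = [w for w, c in board.items() if c != "red"]
        prompt = (
            f"Give a one-word Codenames hint linking some of {targets} "
            f"while avoiding {avoid}."
        )
        word, count = self.llm_fn(prompt)
        # Defensive post-processing: keep one token, clamp the count.
        return word.strip().split()[0], max(1, int(count))
```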
Agent Guidelines
Hint Validation
- Return single-word hints (no spaces)
- Provide positive integer counts
- Avoid board words in hints
- Consider bomb avoidance
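The first three rules can be enforced with a small helper; this is a sketch, not repo code:

```python
def is_valid_hint(hint: str, count: int, board_words: list[str]) -> bool:
    """Check a hint against the basic Codenames legality rules."""
    if not hint or " " in hint.strip():
        return False  # empty or multi-word hints are illegal
    if count < 1:
        return False  # counts must be positive integers
    if hint.upper() in {w.upper() for w in board_words}:
        return False  # hints may not repeat a board word
    return True
```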
Guess Strategy
- Return 1 to (hint_count + 1) guesses
- Order by confidence (best first)
- Only guess unrevealed words
- Empty list passes the turn
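These limits can be enforced on a confidence-ranked candidate list; the helper below is illustrative:

```python
def clip_guesses(ranked: list[tuple[str, float]], count: int,
                 revealed: set[str]) -> list[str]:
    """Order candidates by confidence and enforce the guess limits above."""
    # Best-first order.
    ranked = sorted(ranked, key=lambda wc: wc[1], reverse=True)
    # Unrevealed words only.
    words = [w for w, _ in ranked if w not in revealed]
    # At most hint_count + 1 guesses (the bonus guess rule).
    return words[: count + 1]
```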
State Management
- Implement reset() for reusable agents
- Use process_result() for learning
- Track history in instance variables
- Clean up between games
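A minimal pattern using these hooks; the result dictionary's schema is an assumption:

```python
class StatefulMixin:
    """Illustrative state-management pattern for reusable agents."""

    def __init__(self):
        self.history: list[dict] = []

    def reset(self) -> None:
        # Clean up between games so one instance can play many games.
        self.history.clear()

    def process_result(self, result: dict) -> None:
        # Schema assumed here, e.g. {"guess": "LION", "color": "red"}.
        self.history.append(result)
```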
Error Handling
- Handle LLM API failures gracefully
- Validate input parameters
- Log errors for debugging
- Return valid fallback responses
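One way to follow these rules is a wrapper that validates the model's output, logs failures, and returns an always-legal fallback. Names here are illustrative:

```python
import logging

logger = logging.getLogger(__name__)

FALLBACK_HINT = ("thing", 1)  # always-legal hint used when the model fails


def hint_with_fallback(get_hint_fn, board):
    """Call a hint function, validating its output and absorbing failures."""
    try:
        hint, count = get_hint_fn(board)
        if not hint or " " in hint or count < 1:
            raise ValueError(f"invalid hint from model: {hint!r}, {count}")
        return hint, count
    except Exception:
        # Catches LLM API errors as well as our own validation error.
        logger.exception("Hint generation failed; using fallback")
        return FALLBACK_HINT
```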
Testing Your Agent
Create a simple test script to verify your agent:
test_agent.py
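A sketch of such a smoke test; MyHintGiver is a stub standing in for your own class:

```python
# test_agent.py -- minimal smoke test; MyHintGiver is a placeholder stub.


class MyHintGiver:  # replace with an import of your real agent class
    def get_hint(self, board):
        return "animal", 2


def test_hint_is_legal():
    board = {"LION": "red", "OPERA": "blue", "BOMB": "bomb", "DESK": "neutral"}
    hint, count = MyHintGiver().get_hint(board)
    assert " " not in hint and count >= 1
    assert hint.upper() not in board  # hint must not repeat a board word


if __name__ == "__main__":
    test_hint_is_legal()
    print("hint checks passed")
```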
Using Custom Agents in Benchmarks
Once your agent is implemented, use it in the orchestrator:
run_custom_benchmark.py
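The orchestrator's constructor and method names are not reproduced on this page, so rather than guess at them, this self-contained loop illustrates how a hint giver and guesser plug together:

```python
# Illustrative wiring only: a minimal turn loop standing in for the
# benchmark's real orchestrator.
class EchoHintGiver:
    def get_hint(self, board):
        reds = [w for w, c in board.items() if c == "red"]
        return "animal", len(reds)


class FirstWordGuesser:
    def get_guesses(self, board_words, hint, count):
        return board_words[:count]


def play_one_turn(board, hint_giver, guesser):
    hint, count = hint_giver.get_hint(board)
    guesses = guesser.get_guesses(list(board), hint, count)
    # Score: how many guesses hit our own (red) words.
    return sum(1 for g in guesses if board[g] == "red")
```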
Advanced Techniques
Stateful Agents
Track game state for adaptive strategies.

Multi-Model Ensembles
Combine multiple LLMs for better performance.

Next Steps
- Prompt Engineering: optimize your LLM prompts for better performance
- Analysis Metrics: measure and analyze agent performance