Quick Start
Get up and running with the Codenames AI Benchmark in minutes. This guide will have you watching AI models compete in a strategic word game.
Time to complete: ~5 minutes
What you'll do: Install dependencies, set up one API key, and run a demo game
Prerequisites
- Python 3.8 or higher
- pip (Python package manager)
- At least one API key from: OpenAI, Anthropic, Google, xAI, DeepSeek, or OpenRouter
Installation & Setup
Install dependencies
- baml-py==0.211.2 - Structured LLM outputs
- openai>=1.0.0 - OpenAI, xAI, and DeepSeek client
- anthropic>=0.18.0 - Claude models
- google-generativeai>=0.3.0 - Gemini models
- python-dotenv>=1.0.0 - Environment variable loading
- Analysis tools: pandas, numpy, matplotlib, seaborn, scipy
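To confirm that the packages listed above are actually present in the active environment, a small check along these lines may help. It is a sketch that relies only on the standard library's `importlib.metadata`; the `REQUIRED` list and the `installed_versions` helper are illustrative names, not part of the benchmark.

```python
from importlib import metadata

# Distribution names copied from the dependency list above.
REQUIRED = ["baml-py", "openai", "anthropic", "google-generativeai", "python-dotenv"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as not installed points to an incomplete install or to a virtual environment that is not activated.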
Configure API keys
Copy the example environment file (.env.example) to .env, then edit .env and add at least one API key:
- OpenRouter: Many free models, best for testing
- OpenAI: GPT-5, GPT-4.1, reasoning models
- Anthropic: Claude Sonnet 4.5, Haiku, Opus
- Google: Gemini 2.5 Pro, Flash variants
- xAI: Grok 4, Grok 3 models
- DeepSeek: DeepSeek V3.2 Chat, Reasoner
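Before running a game, it can be useful to see which providers have a key set. The sketch below inspects environment variables only. `OPENROUTER_API_KEY` is the one variable name this guide mentions; every other name in `PROVIDER_KEYS` is an assumption based on common provider conventions and should be checked against `.env.example`. The `configured_providers` helper is hypothetical.

```python
import os

# Only OPENROUTER_API_KEY is named in this guide. The remaining variable
# names are assumptions; confirm them against .env.example.
PROVIDER_KEYS = {
    "OpenRouter": "OPENROUTER_API_KEY",
    "OpenAI": "OPENAI_API_KEY",        # assumed name
    "Anthropic": "ANTHROPIC_API_KEY",  # assumed name
    "Google": "GOOGLE_API_KEY",        # assumed name
    "xAI": "XAI_API_KEY",              # assumed name
    "DeepSeek": "DEEPSEEK_API_KEY",    # assumed name
}


def configured_providers(env=None):
    """Return the providers whose key variable is set to a non-blank value."""
    env = os.environ if env is None else env
    return [
        provider
        for provider, variable in PROVIDER_KEYS.items()
        if env.get(variable, "").strip()
    ]


if __name__ == "__main__":
    # Values in .env are only visible here once they have been loaded into
    # the process environment (for example with python-dotenv's load_dotenv()).
    found = configured_providers()
    print("Configured providers:", ", ".join(found) if found else "none")
```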
Run the demo game
The demo is pre-configured to use free OpenRouter models. You'll see verbose output showing:
- Game setup and board state (25 words)
- Each hint given by spymasters
- Each guess made by field operatives
- Turn-by-turn results
- Final game outcome and statistics
Games typically take 1-2 minutes to complete as AI models think through their moves.
Understanding the Output
Game Setup Phase
Turn-by-Turn Gameplay
Final Results
Customize Your Game
Edit the Players class in demo_simple_game.py to try different models:
demo_simple_game.py
Available Free Models (OpenRouter)
Frontier Models
Run Programmatically
Create your own game script:
custom_game.py
Troubleshooting
"No API keys found" error
- Make sure you copied .env.example to .env
- Verify your API key is valid and not expired
- Check that the key name matches exactly (e.g., OPENROUTER_API_KEY)
- Ensure no extra spaces around the = sign
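Some of the checks above can be automated. The following is a minimal sketch of a `.env` linter that flags lines with spaces around `=`, missing `=`, unusual key names, or empty values. The `lint_env_text` function and its messages are illustrative and not part of the benchmark; it also cannot tell whether a key is valid or expired.

```python
import re

# Conventional shape of an environment variable name.
KEY_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def lint_env_text(text):
    """Return (line_number, message) pairs for suspicious lines in .env content."""
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are fine
        if "=" not in line:
            problems.append((lineno, "missing '='"))
            continue
        key, value = line.split("=", 1)
        if key != key.rstrip() or value != value.lstrip():
            problems.append((lineno, "spaces around '='"))
        if not KEY_PATTERN.match(key.strip()):
            problems.append((lineno, "invalid key name"))
        if not value.strip():
            problems.append((lineno, "empty value"))
    return problems


if __name__ == "__main__":
    sample = "OPENROUTER_API_KEY = sk-example\n"
    for lineno, message in lint_env_text(sample):
        print(f"line {lineno}: {message}")
```

To check a real file, pass its contents, for example `lint_env_text(open(".env").read())`.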
"ModuleNotFoundError" error
Run pip install -r requirements.txt to install all dependencies. If using a virtual environment, make sure it's activated before installing.
"Rate limit exceeded" error
- Free tier limits: OpenRouter free models have usage limits
- Solution 1: Wait a few minutes and try again
- Solution 2: Switch to a different provider
- Solution 3: Upgrade to a paid tier for higher limits
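"Wait and try again" can also be done in code. The sketch below retries a callable with exponential backoff when an error looks like a rate limit. It is a generic pattern and not the benchmark's own retry logic: `call_with_backoff` is a hypothetical helper, and detecting rate limits by matching the text "rate limit" in the exception message is an assumption that may need adjusting for each provider's exception types.

```python
import random
import time


def _looks_like_rate_limit(exc):
    # Assumption: the provider's error message mentions "rate limit".
    return "rate limit" in str(exc).lower()


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      is_rate_limit=_looks_like_rate_limit, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff on rate-limit errors."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            # Re-raise unrelated errors, and give up after the last attempt.
            if not is_rate_limit(exc) or attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% jitter so parallel runs do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

Wrapping a single model call, such as `call_with_backoff(lambda: make_request())` with your own `make_request`, keeps one rate-limited response from ending a whole game.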
Game runs very slowly
This is normal! LLM API calls take 2-10 seconds each. A complete game makes 20-50+ API calls.
Tips:
- Use verbose=True to see progress
- Try faster models (Haiku, Flash Lite, Mini variants)
- Be patient - strategic thinking takes time!
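The figures above give a rough range for how long a game can take. As a back-of-the-envelope sketch (the function name is illustrative, and the upper end is not a hard limit because games can exceed 50 calls):

```python
def estimate_game_seconds(min_calls=20, max_calls=50,
                          min_latency=2.0, max_latency=10.0):
    """Return (best_case, worst_case) game duration in seconds."""
    return (min_calls * min_latency, max_calls * max_latency)


low, high = estimate_game_seconds()
print(f"Roughly {low:.0f}-{high:.0f} seconds per game")
```

With the defaults this works out to 40 seconds at the fast end and 500 seconds at the slow end, so faster models mostly help by lowering the per-call latency.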
BAML generation errors
If you see errors about BAML client code, regenerate the client from the baml_src/ definitions.
Next Steps
Detailed Installation
Complete setup guide with verification steps
Run Benchmarks
Evaluate models across multiple games
Configure Games
Customize board size, rules, and parameters
Edit Prompts
Modify AI agent behavior with BAML