Installation
This guide provides detailed installation instructions for setting up the Codenames AI Benchmark, including environment setup, dependency installation, API key configuration, and verification.
System Requirements
Python Version
- Required: Python 3.8 or higher
- Recommended: Python 3.10+
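To confirm the interpreter meets these requirements before installing anything, a quick check like the following can help. It is a minimal sketch that encodes only the 3.8 minimum and 3.10 recommendation stated above; the function name `check_python` is illustrative.

```python
import sys

REQUIRED = (3, 8)      # minimum supported version
RECOMMENDED = (3, 10)  # recommended version


def check_python(version_info=sys.version_info):
    """Return "ok", "upgrade recommended", or "unsupported" for a version tuple."""
    current = tuple(version_info[:2])
    if current < REQUIRED:
        return "unsupported"
    if current < RECOMMENDED:
        return "upgrade recommended"
    return "ok"


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}: {check_python()}")
```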
Operating Systems
- Linux (Ubuntu, Debian, Fedora, etc.)
- macOS (10.15+)
- Windows (with WSL2 or native Python)
Step-by-Step Installation
Create a virtual environment (recommended)
Using a virtual environment isolates dependencies and prevents conflicts:
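The usual commands are `python -m venv venv`, then `source venv/bin/activate` on Linux/macOS or `venv\Scripts\activate` on Windows. The same environment can be created from Python with the standard-library `venv` module. The sketch below does that and also shows a common way to detect whether the running interpreter is inside a virtual environment (`sys.prefix` differs from `sys.base_prefix`). The directory name `venv` is an assumption matching the `(venv)` prompt prefix, the helper names are illustrative, and the detection heuristic does not cover every tool (conda environments, for example, behave differently).

```python
import sys
import venv
from pathlib import Path


def in_virtualenv():
    """True inside a venv: sys.prefix points at the venv, sys.base_prefix at the base Python."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def create_env(path="venv"):
    """Create a virtual environment with pip, unless the directory already exists."""
    target = Path(path)
    if target.exists():
        return False  # never overwrite an existing directory
    venv.create(target, with_pip=True)
    return True


if __name__ == "__main__":
    print("inside a virtual environment:", in_virtualenv())
```

Activation still has to happen in your shell; creating the environment from Python does not change the current terminal session.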
When activated, your terminal prompt will show (venv) at the beginning.
Install dependencies
Install all required packages from requirements.txt:
This installs the following packages:
| Package | Version | Purpose |
|---|---|---|
| baml-py | 0.211.2 | Structured LLM outputs and prompt management |
| openai | ≥1.0.0 | OpenAI, xAI Grok, and DeepSeek clients |
| anthropic | ≥0.18.0 | Claude models |
| google-generativeai | ≥0.3.0 | Gemini models |
| python-dotenv | ≥1.0.0 | Load environment variables from .env |
| pandas | ≥2.0.0 | Benchmark data analysis |
| numpy | ≥1.24.0 | Numerical computations |
| matplotlib | ≥3.7.0 | Visualization generation |
| seaborn | ≥0.12.0 | Statistical visualizations |
| scipy | ≥1.10.0 | Statistical analysis |
Verify installation:
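A small script can confirm that every package in the table is installed at an acceptable version. This sketch uses only the standard library (`importlib.metadata`, available since Python 3.8) and a deliberately simplified version parser that ignores pre-release suffixes; for exact PEP 440 semantics the `packaging` library is the better choice. The `REQUIREMENTS` mapping is transcribed from the table, and the function names are illustrative.

```python
import re
from importlib import metadata

# Constraints transcribed from the table: baml-py is pinned, the rest are minimums.
REQUIREMENTS = {
    "baml-py": ("==", "0.211.2"),
    "openai": (">=", "1.0.0"),
    "anthropic": (">=", "0.18.0"),
    "google-generativeai": (">=", "0.3.0"),
    "python-dotenv": (">=", "1.0.0"),
    "pandas": (">=", "2.0.0"),
    "numpy": (">=", "1.24.0"),
    "matplotlib": (">=", "3.7.0"),
    "seaborn": (">=", "0.12.0"),
    "scipy": (">=", "1.10.0"),
}


def parse_version(text):
    """Turn the leading numeric part of a version into a tuple, e.g. "2.1" -> (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    parts = [int(p) for p in match.group(0).split(".")] if match else []
    parts += [0] * (3 - len(parts))  # pad so "2.0" compares equal to "2.0.0"
    return tuple(parts)


def satisfies(installed, op, required):
    """Compare numerically; plain string comparison would rank "1.9" above "1.10"."""
    have, want = parse_version(installed), parse_version(required)
    return have == want if op == "==" else have >= want


def check_all(requirements=REQUIREMENTS):
    """Map each package to (ok, installed version or None if missing)."""
    results = {}
    for name, (op, required) in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            results[name] = (False, None)
            continue
        results[name] = (satisfies(installed, op, required), installed)
    return results


if __name__ == "__main__":
    for name, (ok, installed) in check_all().items():
        print(f"{'✓' if ok else '✗'} {name}: {installed or 'not installed'}")
```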
Set up environment variables
Copy the example environment file:
The .env.example file contains placeholders for all supported providers:
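On Linux/macOS the copy is typically `cp .env.example .env`. A platform-independent sketch in Python follows; it refuses to overwrite an existing `.env` so keys you have already entered are not lost. The function name and the no-overwrite behavior are choices made for this example.

```python
import shutil
from pathlib import Path


def copy_env_template(template=".env.example", target=".env"):
    """Copy template to target; return False (changing nothing) if target already exists."""
    src, dst = Path(template), Path(target)
    if not src.is_file():
        raise FileNotFoundError(f"{src} not found; run this from the project root")
    if dst.exists():
        return False
    shutil.copyfile(src, dst)
    return True
```

Call `copy_env_template()` from the project root, then fill in the keys for the providers you plan to use.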
Obtain API keys
Choose at least one provider and get your API key:
OpenRouter (Recommended for Testing)
Best for: Free experimentation with multiple models
- Visit openrouter.ai/keys
- Sign up or log in
- Click “Create Key”
- Copy the key and add it to .env:
Free models available:
- Devstral (Mistral)
- MIMO V2 Flash
- Nemotron Nano 12B
- DeepSeek R1T Chimera
- GLM 4.5 Air
- Llama 3.3 70B
- OLMo 3.1 32B
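Every provider in this section ends with the same step: adding a `NAME=value` line to `.env`. Editing the file by hand is simplest, but if you script your setup, a helper like the following adds the variable or replaces an existing line for it. The helper `set_env_var` is hypothetical and not part of the benchmark. Apart from `OPENAI_API_KEY`, this guide does not confirm the variable names, so copy the exact names from `.env.example`.

```python
from pathlib import Path


def set_env_var(name, value, path=".env"):
    """Add NAME=value to the .env file, replacing any existing line that sets NAME."""
    env_file = Path(path)
    lines = env_file.read_text().splitlines() if env_file.exists() else []
    new_line = f"{name}={value}"  # no spaces around "=", matching the troubleshooting advice
    replaced = False
    for i, line in enumerate(lines):
        if line.split("=", 1)[0].strip() == name:  # commented lines start with "#" and never match
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    env_file.write_text("\n".join(lines) + "\n")
```

For example, `set_env_var("OPENAI_API_KEY", "your-key-here")` updates the OpenAI entry in place.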
OpenAI
Best for: GPT-5, GPT-4.1, o-series reasoning models
Pricing: GPT-5.2: $14/1M output tokens (~$0.02-0.10 per game)
- Visit platform.openai.com/api-keys
- Log in to your OpenAI account
- Click “Create new secret key”
- Copy the key (you won’t see it again!)
- Add it to .env:
OpenAI offers $5 free credits for new accounts.
Anthropic
Best for: Claude Sonnet 4.5, Claude Haiku 4.5, Claude Opus 4.1
Pricing: Claude Sonnet 4.5: $15/1M output tokens (~$0.05 per game)
- Visit console.anthropic.com/settings/keys
- Log in to your Anthropic account
- Click “Create Key”
- Copy the key
- Add it to .env:
Google Gemini
Best for: Gemini 2.5 Pro, Gemini 2.5 Flash (fast and cost-effective)
Pricing: Gemini 2.5 Flash: Very low cost (~$0.003 per game)
- Visit aistudio.google.com/app/apikey
- Sign in with your Google account
- Click “Create API Key”
- Copy the key
- Add it to .env:
Gemini has a generous free tier with high rate limits.
xAI Grok
Best for: Grok 4, Grok 3 models
- Visit console.x.ai
- Sign in with your account
- Navigate to API keys section
- Create a new API key
- Add it to .env:
DeepSeek
Best for: Cost-effective reasoning (DeepSeek V3.2)
Pricing: Extremely cost-effective (~$0.002 per game)
- Visit platform.deepseek.com/api_keys
- Sign up or log in
- Click “Create API Key”
- Copy the key
- Add it to .env:
Verify installation
Run the test suite to verify everything is working:
Verify API keys are loaded:
If you see “✓” checkmarks, you’re ready to go!
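The exact verification command depends on the project's own scripts. As an illustration, a check along these lines loads `.env` with python-dotenv's `load_dotenv()` (when the package is installed) and prints a ✓ or ✗ per provider without revealing any key values. Apart from `OPENAI_API_KEY`, the variable names in `PROVIDER_VARS` are assumptions; match them to `.env.example`.

```python
import os

try:
    from dotenv import load_dotenv  # provided by python-dotenv (listed in requirements.txt)
except ImportError:
    load_dotenv = None  # fall back to variables already exported in the shell

# Assumed variable names; only OPENAI_API_KEY is confirmed by this guide.
PROVIDER_VARS = {
    "OpenRouter": "OPENROUTER_API_KEY",
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "Google Gemini": "GOOGLE_API_KEY",
    "xAI Grok": "XAI_API_KEY",
    "DeepSeek": "DEEPSEEK_API_KEY",
}


def key_status(env):
    """Map each provider to True if its variable is set to a non-blank value."""
    return {provider: bool(env.get(var, "").strip()) for provider, var in PROVIDER_VARS.items()}


if __name__ == "__main__":
    if load_dotenv is not None:
        load_dotenv()  # searches upward for a .env file and loads it into os.environ
    status = key_status(os.environ)
    for provider, present in status.items():
        print(f"{'✓' if present else '✗'} {provider}")
    if not any(status.values()):
        print("No API keys found: add at least one key to .env")
```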
Configuration Files
BAML Configuration
The BAML configuration defines LLM clients and prompts:
Game Configuration
Customize game parameters in config.py:
Model Configuration
Model-specific settings in model_config.py:
- Temperature configurations per model
- Benchmark model selection
- Display name mappings
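The actual contents of `model_config.py` are not reproduced in this guide. Purely to illustrate the three kinds of settings listed, a per-model temperature table with a default, a benchmark selection list, and display-name mappings might look like the sketch below. Every model identifier, value, and helper name in it is hypothetical.

```python
# Hypothetical illustration; the real model_config.py may be structured differently.
DEFAULT_TEMPERATURE = 0.7

MODEL_TEMPERATURES = {
    "example-reasoning-model": 1.0,
    "example-fast-model": 0.2,
}

BENCHMARK_MODELS = ["example-reasoning-model", "example-fast-model"]

DISPLAY_NAMES = {
    "example-reasoning-model": "Example Reasoning",
}


def temperature_for(model):
    """Per-model temperature, falling back to the default for unlisted models."""
    return MODEL_TEMPERATURES.get(model, DEFAULT_TEMPERATURE)


def display_name(model):
    """Human-readable name for reports; unlisted models keep their identifier."""
    return DISPLAY_NAMES.get(model, model)
```

Falling back to a default keeps newly added models usable before anyone tunes their settings.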
Directory Structure
After installation, your directory should look like:
Security Best Practices
Protect Your API Keys
- Add .env to .gitignore:
- Use environment-specific files:
  - .env for local development
  - .env.production for production
  - Never version control these files
- Set spending limits:
- Rotate keys regularly:
  - Generate new keys periodically
  - Revoke old keys after migration
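To confirm that `.env` (and variants such as `.env.production`) will actually be ignored, `git check-ignore -v .env` gives the authoritative answer. For a quick script-level check, the sketch below applies a simplified subset of `.gitignore` rules: glob patterns, a leading `/`, and `!` negation, where the last matching line wins. It does not implement the full gitignore specification, and the function name is illustrative.

```python
from fnmatch import fnmatch


def is_ignored(filename, gitignore_text):
    """Simplified .gitignore matching for a file in the repository root."""
    ignored = False
    for raw in gitignore_text.splitlines():
        pattern = raw.strip()
        if not pattern or pattern.startswith("#"):
            continue  # skip blank lines and comments
        negate = pattern.startswith("!")
        if negate:
            pattern = pattern[1:]
        pattern = pattern.lstrip("/")  # "/.env" anchors to the root; we only check root files
        if fnmatch(filename, pattern):
            ignored = not negate  # the last matching pattern decides
    return ignored
```

For example, a `.gitignore` containing `.env*` followed by `!.env.example` ignores every local environment file while keeping the template under version control.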
Troubleshooting
Import Errors
Problem: ModuleNotFoundError: No module named 'baml_client'
Solution:
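`baml_client` is code that BAML generates from `baml_src/` rather than something pip installs, so this error usually means the client has not been generated yet or the script is being run from a different directory. BAML's `baml-cli generate` command is the usual way to regenerate it; check the project's README for the exact invocation. To narrow down which case applies, a diagnostic along these lines can help. It is a sketch with an illustrative function name, and it assumes `baml_src/` and `baml_client/` sit in the project root.

```python
import importlib.util
from pathlib import Path


def diagnose_baml_client(project_root="."):
    """Explain why `import baml_client` fails, if it does."""
    if importlib.util.find_spec("baml_client") is not None:
        return "baml_client is importable"
    root = Path(project_root)
    if not (root / "baml_src").is_dir():
        return "no baml_src/ here: run from the project root"
    if not (root / "baml_client").is_dir():
        return "baml_src/ found but baml_client/ missing: regenerate the BAML client"
    return "baml_client/ exists but is not on sys.path: run scripts from the project root"


if __name__ == "__main__":
    print(diagnose_baml_client())
```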
API Key Issues
Problem: “No API keys found” or authentication errors
Solutions:
- Verify that the .env file exists and contains keys
- Check for typos in variable names (e.g., OPENAI_API_KEY, not OPENAI_KEY)
- Ensure there are no spaces around = in the .env file
- Verify that the API key is valid (test it in the provider console)
- Check that you’re using the correct key for the model
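Several of these checks can be automated. The linter below is a sketch that flags lines without `=`, spaces around `=`, empty values, and known misspelled names. `KNOWN_TYPOS` contains only the example from the list above and is meant to be extended; the function name is illustrative.

```python
# Misspellings mapped to the expected name; only the example from this guide is included.
KNOWN_TYPOS = {"OPENAI_KEY": "OPENAI_API_KEY"}


def lint_env(text):
    """Return human-readable warnings for common .env mistakes."""
    warnings = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are fine
        if "=" not in stripped:
            warnings.append(f"line {number}: no '=' found")
            continue
        name, value = stripped.split("=", 1)
        if name != name.rstrip() or value != value.lstrip():
            warnings.append(f"line {number}: remove spaces around '='")
        name = name.strip()
        if name in KNOWN_TYPOS:
            warnings.append(f"line {number}: {name} should probably be {KNOWN_TYPOS[name]}")
        if not value.strip():
            warnings.append(f"line {number}: {name} has no value")
    return warnings
```

Empty values are expected for providers you do not use, so treat that warning as informational.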
Rate Limiting
Problem: “Rate limit exceeded” errors
Solutions:
- Free tier limits: Wait and retry, or upgrade your account
- Use a different provider: Switch to models with higher limits
- Reduce concurrency: Run fewer games simultaneously
- Add delays: Implement retry logic with exponential backoff
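Exponential backoff retries a failed call after waiting `base_delay * 2**attempt` seconds (capped), with random jitter so that parallel games do not all retry at the same instant. The sketch below is generic and not taken from the benchmark's code. In practice, `retry_on` should be narrowed to the SDK's rate-limit exception (the openai and anthropic Python SDKs each expose a `RateLimitError`) rather than every `Exception`. The `sleep` parameter exists so the delays can be observed in tests.

```python
import random
import time


def retry_with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call `call()` until it succeeds, backing off exponentially between failures."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))  # jitter spreads out simultaneous retries
```

With the defaults, the waits are roughly 0.5-1 s, 1-2 s, 2-4 s, and 4-8 s before the final attempt.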
Slow Performance
Problem: Games take a long time to complete
Normal behavior: LLM API calls take 2-10 seconds each, and a game makes 20-50+ calls.
Optimization tips:
- Use faster models (Haiku, Flash Lite, Mini, Nano variants)
- Enable verbose mode to see progress: verbose=True
- Reduce board size for testing: Config.custom_game(board_size=9)
- Use verbose=False for production benchmarks
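A back-of-the-envelope estimate using the figures above (20-50 calls at 2-10 seconds each) shows why a single game can take anywhere from under a minute to over eight minutes, and how quickly a benchmark adds up when games run one after another. Because some games make more than 50 calls, treat the upper bound as typical rather than a hard limit. The function name is illustrative.

```python
def estimate_duration(calls=(20, 50), seconds_per_call=(2, 10), games=1):
    """Return (fastest, slowest) total seconds for `games` games run sequentially."""
    fastest = calls[0] * seconds_per_call[0] * games
    slowest = calls[1] * seconds_per_call[1] * games
    return fastest, slowest


low, high = estimate_duration()
print(f"one game: {low / 60:.1f}-{high / 60:.1f} minutes")  # prints "one game: 0.7-8.3 minutes"
```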
BAML Generation Errors
Problem: Errors when running scripts related to the BAML client
Solution: Regenerate the BAML client from your baml_src/ definitions.
Updating the Benchmark
To update to the latest version:
Next Steps
- Quick Start: Run your first game in 5 minutes
- Model Selection: Explore all 50+ available models
- Run Benchmarks: Evaluate models across multiple games
- Configure Games: Customize board size and game rules
Getting Help
If you encounter issues:
- Check documentation: Review the relevant guides in this documentation
- Search issues: Look for similar problems in GitHub Issues
- Ask the community: Join our Discord/Slack for support
- Report bugs: Create a new issue with detailed reproduction steps, including:
  - Operating system and Python version
  - Full error message and stack trace
  - Steps to reproduce
  - Contents of your .env (without actual keys!)
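The details requested for bug reports can be gathered with a short script. This sketch, whose names are illustrative, prints the OS and Python version along with a copy of `.env` in which every value is replaced by `<set>` or `<empty>`. Commented-out entries are redacted entirely, since they may still contain old keys. This lets maintainers check variable names without seeing keys, but review the output before posting anyway.

```python
import platform
import sys
from pathlib import Path


def redact_env(text):
    """Keep variable names but hide every value so .env contents can be shared."""
    lines = []
    for line in text.splitlines():
        stripped = line.strip()
        if "=" not in stripped:
            lines.append(line)  # blank lines and plain comments pass through
            continue
        name, _, value = stripped.partition("=")
        if name.startswith("#"):
            lines.append("# <commented-out entry redacted>")  # may hide an old key
        else:
            lines.append(f"{name.strip()}={'<set>' if value.strip() else '<empty>'}")
    return "\n".join(lines)


def bug_report_header(env_path=".env"):
    """OS, Python version, and the redacted .env, ready to paste into an issue."""
    env_file = Path(env_path)
    env_text = redact_env(env_file.read_text()) if env_file.exists() else "(no .env file)"
    return "\n".join([
        f"OS: {platform.platform()}",
        f"Python: {sys.version.split()[0]}",
        ".env (redacted):",
        env_text,
    ])


if __name__ == "__main__":
    print(bug_report_header())
```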