
Installation

This guide provides detailed installation instructions for setting up the Codenames AI Benchmark, including environment setup, dependency installation, API key configuration, and verification.

System Requirements

Python Version

  • Required: Python 3.8 or higher
  • Recommended: Python 3.10+
Check your Python version:
python3 --version
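If you prefer to check from within Python, a one-line guard makes the 3.8 minimum explicit:

```python
# Fail fast if the interpreter is older than the required 3.8
import sys

if sys.version_info < (3, 8):
    raise SystemExit(f"Python 3.8+ required, found {sys.version.split()[0]}")
print("Python version OK:", sys.version.split()[0])
```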

Operating Systems

  • Linux (Ubuntu, Debian, Fedora, etc.)
  • macOS (10.15+)
  • Windows (with WSL2 or native Python)

Step-by-Step Installation

Step 1: Clone the repository

git clone https://github.com/your-org/code-names-benchmark.git
cd code-names-benchmark
Verify you’re in the correct directory:
ls -la
# Should see: baml_src/, game/, agents/, demo_simple_game.py, etc.
Step 2: Create a virtual environment (recommended)

Using a virtual environment isolates dependencies and prevents conflicts:
# Create virtual environment
python3 -m venv venv

# Activate it (Linux/macOS)
source venv/bin/activate

# Activate it (Windows)
venv\Scripts\activate
When activated, your terminal prompt will show (venv) at the beginning.
Step 3: Install dependencies

Install all required packages from requirements.txt:
pip install -r requirements.txt
This installs the following packages:
Package               Version    Purpose
baml-py               0.211.2    Structured LLM outputs and prompt management
openai                ≥1.0.0     OpenAI, xAI Grok, and DeepSeek clients
anthropic             ≥0.18.0    Claude models
google-generativeai   ≥0.3.0     Gemini models
python-dotenv         ≥1.0.0     Load environment variables from .env
pandas                ≥2.0.0     Benchmark data analysis
numpy                 ≥1.24.0    Numerical computations
matplotlib            ≥3.7.0     Visualization generation
seaborn               ≥0.12.0    Statistical visualizations
scipy                 ≥1.10.0    Statistical analysis
Verify installation:
pip list | grep baml
# Should show: baml-py 0.211.2
Step 4: Set up environment variables

Copy the example environment file:
cp .env.example .env
The .env.example file contains placeholders for all supported providers:
.env.example
# OpenAI API Key (for GPT-4o, GPT-4o-mini, etc.)
# Get your key at: https://platform.openai.com/api-keys
OPENAI_API_KEY=your_openai_api_key_here

# Anthropic API Key (for Claude models)
# Get your key at: https://console.anthropic.com/settings/keys
ANTHROPIC_API_KEY=your_anthropic_api_key_here

# Google Gemini API Key (for Gemini models)
# Get your key at: https://aistudio.google.com/app/apikey
GOOGLE_API_KEY=your_google_api_key_here

# xAI Grok API Key (for Grok models)
# Get your key at: https://console.x.ai/
XAI_API_KEY=your_xai_api_key_here

# DeepSeek API Key (for DeepSeek models)
# Get your key at: https://platform.deepseek.com/api_keys
DEEPSEEK_API_KEY=your_deepseek_api_key_here

# OpenRouter API Key (for accessing multiple models)
# Get your key at: https://openrouter.ai/keys
OPENROUTER_API_KEY=your_openrouter_api_key_here
You need only ONE API key to get started. The demo defaults to OpenRouter free models.
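The steps above can be sanity-checked with a short script. This is a sketch that reports which provider keys are configured, using the variable names from .env.example; it treats untouched placeholder values (the `your_..._here` strings) as missing, and falls back to shell-exported variables if python-dotenv is not installed.

```python
# Sketch: report which provider keys from .env.example are configured.
import os

try:
    from dotenv import load_dotenv
    load_dotenv()  # reads .env from the current directory
except ImportError:
    pass  # fall back to variables already exported in the shell

PROVIDERS = {
    "OPENAI_API_KEY": "OpenAI",
    "ANTHROPIC_API_KEY": "Anthropic",
    "GOOGLE_API_KEY": "Google Gemini",
    "XAI_API_KEY": "xAI",
    "DEEPSEEK_API_KEY": "DeepSeek",
    "OPENROUTER_API_KEY": "OpenRouter",
}

for var, name in PROVIDERS.items():
    value = os.getenv(var, "")
    # A placeholder copied from .env.example does not count as configured.
    configured = bool(value) and not value.startswith("your_")
    print(f"{'[OK]     ' if configured else '[MISSING]'} {name} ({var})")
```

Any provider reported as [OK] here should also pass the key check in the demo script.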
Step 5: Obtain API keys

Choose at least one provider and get your API key:
OpenAI
Best for: GPT-5, GPT-4.1, o-series reasoning models
  1. Visit platform.openai.com/api-keys
  2. Log in to your OpenAI account
  3. Click “Create new secret key”
  4. Copy the key (you won’t see it again!)
  5. Add to .env:
OPENAI_API_KEY=sk-...
Pricing: GPT-5.2: $1.75/1M input, $14/1M output (~$0.02-0.10 per game)
OpenAI offers $5 free credits for new accounts.
Anthropic
Best for: Claude Sonnet 4.5, Claude Haiku 4.5, Claude Opus 4.1
  1. Visit console.anthropic.com/settings/keys
  2. Log in to your Anthropic account
  3. Click “Create Key”
  4. Copy the key
  5. Add to .env:
ANTHROPIC_API_KEY=sk-ant-...
Pricing: Claude Sonnet 4.5: $3/1M input, $15/1M output (~$0.05 per game)
Google Gemini
Best for: Gemini 2.5 Pro, Gemini 2.5 Flash (fast and cost-effective)
  1. Visit aistudio.google.com/app/apikey
  2. Sign in with your Google account
  3. Click “Create API Key”
  4. Copy the key
  5. Add to .env:
GOOGLE_API_KEY=AIza...
Pricing: Gemini 2.5 Flash: Very low cost (~$0.003 per game)
Gemini has a generous free tier with high rate limits.
xAI
Best for: Grok 4, Grok 3 models
  1. Visit console.x.ai
  2. Sign in with your account
  3. Navigate to API keys section
  4. Create a new API key
  5. Add to .env:
XAI_API_KEY=xai-...
DeepSeek
Best for: Cost-effective reasoning (DeepSeek V3.2)
  1. Visit platform.deepseek.com/api_keys
  2. Sign up or log in
  3. Click “Create API Key”
  4. Copy the key
  5. Add to .env:
DEEPSEEK_API_KEY=sk-...
Pricing: Extremely cost-effective (~$0.002 per game)
Step 6: Verify installation

Run a few quick checks to verify everything is working:
# Check Python can import all dependencies
python3 -c "from baml_client import b; from agents.llm import BAMLModel; print('✓ All imports successful')"
Verify API keys are loaded:
python3 -c "from dotenv import load_dotenv; import os; load_dotenv(); print('✓ API keys loaded:', len([k for k in os.environ if 'API_KEY' in k]), 'found')"
If you see "✓" checkmarks, you're ready to go!
Step 7: Run a verification game

Test your setup with a quick demo:
python3 demo_simple_game.py
Expected output:
======================================================================
  CODENAMES: MULTI-MODEL DEMO
======================================================================

--- Checking API Keys ---
  [OK] Devstral - API key found
  [OK] MIMO V2 Flash - API key found
  ...

--- Generating Game Board ---
[OK] Generated 25 random words for the board
...
If you see [MISSING] errors, double-check your .env file has valid API keys.

Configuration Files

BAML Configuration

The BAML configuration defines LLM clients and prompts:
baml_src/
├── main.baml       # Agent prompts and schemas
└── clients.baml    # LLM provider configurations
If you modify BAML files, regenerate the client:
baml generate

Game Configuration

Customize game parameters in config.py:
config.py
from config import Config

# Standard 25-word game (9 blue, 8 red, 7 neutral, 1 bomb)
config = Config.default()

# Custom board size
large_game = Config.custom_game(board_size=49)
mini_game = Config.custom_game(board_size=9)

Model Configuration

Model-specific settings in model_config.py:
  • Temperature configurations per model
  • Benchmark model selection
  • Display name mappings
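The bullets above can be pictured with a hypothetical sketch of the kind of mappings model_config.py describes. The dictionary names, model IDs, and values below are illustrative only; the actual contents of the repository's model_config.py may differ.

```python
# Hypothetical illustration of per-model settings (names and values are
# placeholders, not the repo's actual configuration).
DEFAULT_TEMPERATURE = 0.7

MODEL_TEMPERATURES = {
    "gpt-4o-mini": 0.7,        # illustrative values only
    "claude-haiku-4.5": 0.5,
}

DISPLAY_NAMES = {
    "gpt-4o-mini": "GPT-4o Mini",
}

def temperature_for(model_id: str) -> float:
    """Fall back to a default when a model has no explicit setting."""
    return MODEL_TEMPERATURES.get(model_id, DEFAULT_TEMPERATURE)

print(temperature_for("gpt-4o-mini"))    # 0.7
print(temperature_for("unknown-model"))  # falls back to 0.7
```

Centralizing settings like this keeps temperature and naming decisions out of the agent code, so benchmark runs stay reproducible across models.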

Directory Structure

After installation, your directory should look like:
code-names-benchmark/
├── .env                    # Your API keys (DO NOT COMMIT)
├── .env.example            # Template for API keys
├── requirements.txt        # Python dependencies
├── config.py              # Game configuration
├── model_config.py        # Model-specific settings
├── demo_simple_game.py    # Quick demo script
├── benchmark.py           # Benchmark runner
├── baml_src/              # BAML prompt definitions
│   ├── main.baml
│   └── clients.baml
├── baml_client/           # Generated BAML client (auto-generated)
├── game/                  # Core game engine
│   ├── board.py
│   └── state.py
├── agents/                # Agent implementations
│   ├── base.py
│   ├── random_agents.py
│   └── llm/
│       └── baml_agents.py
├── orchestrator/          # Game coordination
│   └── game_runner.py
├── analysis/              # Benchmark analysis
├── utils/                 # Utilities
│   ├── generate_words.py
│   └── words.csv
└── benchmark_results/     # Generated benchmark data

Security Best Practices

Never commit your .env file or expose API keys publicly!

Protect Your API Keys

  1. Add .env to .gitignore:
    .gitignore
    .env
    .env.local
    *.env
    
  2. Use environment-specific files:
    • .env for local development
    • .env.production for production
    • Never version control these files
  3. Set spending limits:
    • Configure usage caps in each provider's billing console
  4. Rotate keys regularly:
    • Generate new keys periodically
    • Revoke old keys after migration

Troubleshooting

Import Errors

Problem: ModuleNotFoundError: No module named 'baml_client'

Solution: make sure dependencies are installed, then regenerate the BAML client:
pip install -r requirements.txt
baml generate

API Key Issues

Problem: "No API keys found" or authentication errors

Solutions:
  1. Verify .env file exists and contains keys
  2. Check for typos in variable names (e.g., OPENAI_API_KEY not OPENAI_KEY)
  3. Ensure no spaces around = in .env file
  4. Verify the API key is valid (test in provider console)
  5. Check you’re using the correct key for the model

Rate Limiting

Problem: "Rate limit exceeded" errors

Solutions:
  1. Free tier limits: Wait and retry, or upgrade account
  2. Use different provider: Switch to models with higher limits
  3. Reduce concurrency: Run fewer games simultaneously
  4. Add delays: Implement retry logic with exponential backoff
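Tip 4 above can be sketched as a small retry wrapper with exponential backoff and jitter. This is a minimal generic sketch, not code from the repository: `call_model` stands in for whatever provider call your agent makes, and a real version would catch the provider's specific rate-limit exception rather than bare Exception.

```python
# Sketch: retry a flaky API call with exponential backoff and jitter.
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=30.0):
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as exc:  # narrow to the provider's RateLimitError in real code
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            delay *= 0.5 + random.random()  # jitter avoids synchronized retries
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)

# Usage (call_model is a placeholder for your provider client):
# result = with_backoff(lambda: call_model(prompt))
```

Doubling the delay on each attempt keeps pressure off the provider, and the random jitter prevents many concurrent games from retrying in lockstep.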

Slow Performance

Problem: Games take very long to complete

Normal behavior: LLM API calls take 2-10 seconds each. A game makes 20-50+ calls.

Optimization tips:
  1. Use faster models (Haiku, Flash Lite, Mini, Nano variants)
  2. Enable verbose mode to see progress: verbose=True
  3. Reduce board size for testing: Config.custom_game(board_size=9)
  4. Use verbose=False for production benchmarks
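To see where a slow game actually spends its time, a tiny timing wrapper helps. This is a generic sketch (not repository code); wrap any individual call you suspect is slow.

```python
# Sketch: time an individual call to find the slow step.
import time

def timed(fn, label):
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return result

# Usage (call_model is a placeholder for your provider client):
# clue = timed(lambda: call_model(prompt), "clue generation")
```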

BAML Generation Errors

Problem: Errors when running scripts related to the BAML client

Solution:
baml generate
This regenerates the client code from baml_src/ definitions.

Updating the Benchmark

To update to the latest version:
# Pull latest changes
git pull origin main

# Update dependencies
pip install -r requirements.txt --upgrade

# Regenerate BAML client
baml generate

# Verify installation
python3 demo_simple_game.py

Next Steps

Quick Start

Run your first game in 5 minutes

Model Selection

Explore all 50+ available models

Run Benchmarks

Evaluate models across multiple games

Configure Games

Customize board size and game rules

Getting Help

If you encounter issues:
  1. Check documentation: Review relevant guides in this documentation
  2. Search issues: Look for similar problems in GitHub Issues
  3. Ask community: Join our Discord/Slack for support
  4. Report bugs: Create a new issue with detailed reproduction steps
Include the following in bug reports:
  • Operating system and Python version
  • Full error message and stack trace
  • Steps to reproduce
  • Contents of your .env (without actual keys!)
