
Installation

This guide provides detailed installation instructions for setting up the Codenames AI Benchmark, including environment setup, dependency installation, API key configuration, and verification.

System Requirements

Python Version

  • Required: Python 3.8 or higher
  • Recommended: Python 3.10+
Check your Python version:
python3 --version
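If you prefer to check from within Python, a one-line guard makes the 3.8 minimum explicit:

```python
# Fail fast if the interpreter is older than the required 3.8
import sys

if sys.version_info < (3, 8):
    raise SystemExit(f"Python 3.8+ required, found {sys.version.split()[0]}")
print("Python version OK:", sys.version.split()[0])
```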

Operating Systems

  • Linux (Ubuntu, Debian, Fedora, etc.)
  • macOS (10.15+)
  • Windows (with WSL2 or native Python)

Step-by-Step Installation

Step 1: Clone the repository

git clone https://github.com/your-org/code-names-benchmark.git
cd code-names-benchmark
Verify you’re in the correct directory:
ls -la
# Should see: baml_src/, game/, agents/, demo_simple_game.py, etc.
Step 2: Create a virtual environment (recommended)

Using a virtual environment isolates dependencies and prevents conflicts:
# Create virtual environment
python3 -m venv venv

# Activate it (Linux/macOS)
source venv/bin/activate

# Activate it (Windows)
venv\Scripts\activate
When activated, your terminal prompt will show (venv) at the beginning.
Step 3: Install dependencies

Install all required packages from requirements.txt:
pip install -r requirements.txt
This installs the following packages:
Package               Version    Purpose
baml-py               0.211.2    Structured LLM outputs and prompt management
openai                ≥1.0.0     OpenAI, xAI Grok, and DeepSeek clients
anthropic             ≥0.18.0    Claude models
google-generativeai   ≥0.3.0     Gemini models
python-dotenv         ≥1.0.0     Load environment variables from .env
pandas                ≥2.0.0     Benchmark data analysis
numpy                 ≥1.24.0    Numerical computations
matplotlib            ≥3.7.0     Visualization generation
seaborn               ≥0.12.0    Statistical visualizations
scipy                 ≥1.10.0    Statistical analysis
Verify installation:
pip list | grep baml
# Should show: baml-py 0.211.2
Step 4: Set up environment variables

Copy the example environment file:
cp .env.example .env
The .env.example file contains placeholders for all supported providers:
.env.example
# OpenAI API Key (for GPT-4o, GPT-4o-mini, etc.)
# Get your key at: https://platform.openai.com/api-keys
OPENAI_API_KEY=your_openai_api_key_here

# Anthropic API Key (for Claude models)
# Get your key at: https://console.anthropic.com/settings/keys
ANTHROPIC_API_KEY=your_anthropic_api_key_here

# Google Gemini API Key (for Gemini models)
# Get your key at: https://aistudio.google.com/app/apikey
GOOGLE_API_KEY=your_google_api_key_here

# xAI Grok API Key (for Grok models)
# Get your key at: https://console.x.ai/
XAI_API_KEY=your_xai_api_key_here

# DeepSeek API Key (for DeepSeek models)
# Get your key at: https://platform.deepseek.com/api_keys
DEEPSEEK_API_KEY=your_deepseek_api_key_here

# OpenRouter API Key (for accessing multiple models)
# Get your key at: https://openrouter.ai/keys
OPENROUTER_API_KEY=your_openrouter_api_key_here
You need only ONE API key to get started. The demo defaults to OpenRouter free models.
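The steps above can be sanity-checked with a short script. This is a sketch that reports which provider keys are configured, using the variable names from .env.example; it treats untouched placeholder values (the `your_..._here` strings) as missing, and falls back to shell-exported variables if python-dotenv is not installed.

```python
# Sketch: report which provider keys from .env.example are configured.
import os

try:
    from dotenv import load_dotenv
    load_dotenv()  # reads .env from the current directory
except ImportError:
    pass  # fall back to variables already exported in the shell

PROVIDERS = {
    "OPENAI_API_KEY": "OpenAI",
    "ANTHROPIC_API_KEY": "Anthropic",
    "GOOGLE_API_KEY": "Google Gemini",
    "XAI_API_KEY": "xAI",
    "DEEPSEEK_API_KEY": "DeepSeek",
    "OPENROUTER_API_KEY": "OpenRouter",
}

for var, name in PROVIDERS.items():
    value = os.getenv(var, "")
    # A placeholder copied from .env.example does not count as configured.
    configured = bool(value) and not value.startswith("your_")
    print(f"{'[OK]     ' if configured else '[MISSING]'} {name} ({var})")
```

Any provider reported as [OK] here should also pass the key check in the demo script.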
Step 5: Obtain API keys

Choose at least one provider and get your API key:
OpenAI
Best for: GPT-5, GPT-4.1, o-series reasoning models
  1. Visit platform.openai.com/api-keys
  2. Log in to your OpenAI account
  3. Click “Create new secret key”
  4. Copy the key (you won’t see it again!)
  5. Add to .env:
OPENAI_API_KEY=sk-...
Pricing: GPT-5.2: $1.75/1M input, $14/1M output (~$0.02-0.10 per game)
OpenAI offers $5 free credits for new accounts.
Anthropic
Best for: Claude Sonnet 4.5, Claude Haiku 4.5, Claude Opus 4.1
  1. Visit console.anthropic.com/settings/keys
  2. Log in to your Anthropic account
  3. Click “Create Key”
  4. Copy the key
  5. Add to .env:
ANTHROPIC_API_KEY=sk-ant-...
Pricing: Claude Sonnet 4.5: $3/1M input, $15/1M output (~$0.05 per game)
Google Gemini
Best for: Gemini 2.5 Pro, Gemini 2.5 Flash (fast and cost-effective)
  1. Visit aistudio.google.com/app/apikey
  2. Sign in with your Google account
  3. Click “Create API Key”
  4. Copy the key
  5. Add to .env:
GOOGLE_API_KEY=AIza...
Pricing: Gemini 2.5 Flash: Very low cost (~$0.003 per game)
Gemini has a generous free tier with high rate limits.
xAI
Best for: Grok 4, Grok 3 models
  1. Visit console.x.ai
  2. Sign in with your account
  3. Navigate to API keys section
  4. Create a new API key
  5. Add to .env:
XAI_API_KEY=xai-...
DeepSeek
Best for: Cost-effective reasoning (DeepSeek V3.2)
  1. Visit platform.deepseek.com/api_keys
  2. Sign up or log in
  3. Click “Create API Key”
  4. Copy the key
  5. Add to .env:
DEEPSEEK_API_KEY=sk-...
Pricing: Extremely cost-effective (~$0.002 per game)
Step 6: Verify installation

Run a few quick checks to verify everything is working:
# Check Python can import all dependencies
python3 -c "from baml_client import b; from agents.llm import BAMLModel; print('✓ All imports successful')"
Verify API keys are loaded:
python3 -c "from dotenv import load_dotenv; import os; load_dotenv(); print('✓ API keys loaded:', len([k for k in os.environ if 'API_KEY' in k]), 'found')"
If you see "✓" checkmarks, you're ready to go!
Step 7: Run a verification game

Test your setup with a quick demo:
python3 demo_simple_game.py
Expected output:
======================================================================
  CODENAMES: MULTI-MODEL DEMO
======================================================================

--- Checking API Keys ---
  [OK] Devstral - API key found
  [OK] MIMO V2 Flash - API key found
  ...

--- Generating Game Board ---
[OK] Generated 25 random words for the board
...
If you see [MISSING] errors, double-check your .env file has valid API keys.

Configuration Files

BAML Configuration

The BAML configuration defines LLM clients and prompts:
baml_src/
├── main.baml       # Agent prompts and schemas
└── clients.baml    # LLM provider configurations
If you modify BAML files, regenerate the client:
baml generate

Game Configuration

Customize game parameters in config.py:
config.py
from config import Config

# Standard 25-word game (9 blue, 8 red, 7 neutral, 1 bomb)
config = Config.default()

# Custom board size
large_game = Config.custom_game(board_size=49)
mini_game = Config.custom_game(board_size=9)

Model Configuration

Model-specific settings in model_config.py:
  • Temperature configurations per model
  • Benchmark model selection
  • Display name mappings
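The bullets above can be pictured with a hypothetical sketch of the kind of mappings model_config.py describes. The dictionary names, model IDs, and values below are illustrative only; the actual contents of the repository's model_config.py may differ.

```python
# Hypothetical illustration of per-model settings (names and values are
# placeholders, not the repo's actual configuration).
DEFAULT_TEMPERATURE = 0.7

MODEL_TEMPERATURES = {
    "gpt-4o-mini": 0.7,        # illustrative values only
    "claude-haiku-4.5": 0.5,
}

DISPLAY_NAMES = {
    "gpt-4o-mini": "GPT-4o Mini",
}

def temperature_for(model_id: str) -> float:
    """Fall back to a default when a model has no explicit setting."""
    return MODEL_TEMPERATURES.get(model_id, DEFAULT_TEMPERATURE)

print(temperature_for("gpt-4o-mini"))    # 0.7
print(temperature_for("unknown-model"))  # falls back to 0.7
```

Centralizing settings like this keeps temperature and naming decisions out of the agent code, so benchmark runs stay reproducible across models.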

Directory Structure

After installation, your directory should look like:
code-names-benchmark/
├── .env                    # Your API keys (DO NOT COMMIT)
├── .env.example            # Template for API keys
├── requirements.txt        # Python dependencies
├── config.py              # Game configuration
├── model_config.py        # Model-specific settings
├── demo_simple_game.py    # Quick demo script
├── benchmark.py           # Benchmark runner
├── baml_src/              # BAML prompt definitions
│   ├── main.baml
│   └── clients.baml
├── baml_client/           # Generated BAML client (auto-generated)
├── game/                  # Core game engine
│   ├── board.py
│   └── state.py
├── agents/                # Agent implementations
│   ├── base.py
│   ├── random_agents.py
│   └── llm/
│       └── baml_agents.py
├── orchestrator/          # Game coordination
│   └── game_runner.py
├── analysis/              # Benchmark analysis
├── utils/                 # Utilities
│   ├── generate_words.py
│   └── words.csv
└── benchmark_results/     # Generated benchmark data

Security Best Practices

Never commit your .env file or expose API keys publicly!

Protect Your API Keys

  1. Add .env to .gitignore:
    .gitignore
    .env
    .env.local
    *.env
    
  2. Use environment-specific files:
    • .env for local development
    • .env.production for production
    • Never version control these files
  3. Set spending limits:
    • Configure usage caps in each provider's billing console
  4. Rotate keys regularly:
    • Generate new keys periodically
    • Revoke old keys after migration

Troubleshooting

Import Errors

Problem: ModuleNotFoundError: No module named 'baml_client'

Solution: make sure dependencies are installed, then regenerate the BAML client:
pip install -r requirements.txt
baml generate

API Key Issues

Problem: "No API keys found" or authentication errors

Solutions:
  1. Verify .env file exists and contains keys
  2. Check for typos in variable names (e.g., OPENAI_API_KEY not OPENAI_KEY)
  3. Ensure no spaces around = in .env file
  4. Verify the API key is valid (test in provider console)
  5. Check you’re using the correct key for the model

Rate Limiting

Problem: "Rate limit exceeded" errors

Solutions:
  1. Free tier limits: Wait and retry, or upgrade account
  2. Use different provider: Switch to models with higher limits
  3. Reduce concurrency: Run fewer games simultaneously
  4. Add delays: Implement retry logic with exponential backoff
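Tip 4 above can be sketched as a small retry wrapper with exponential backoff and jitter. This is a minimal generic sketch, not code from the repository: `call_model` stands in for whatever provider call your agent makes, and a real version would catch the provider's specific rate-limit exception rather than bare Exception.

```python
# Sketch: retry a flaky API call with exponential backoff and jitter.
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=30.0):
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as exc:  # narrow to the provider's RateLimitError in real code
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            delay *= 0.5 + random.random()  # jitter avoids synchronized retries
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)

# Usage (call_model is a placeholder for your provider client):
# result = with_backoff(lambda: call_model(prompt))
```

Doubling the delay on each attempt keeps pressure off the provider, and the random jitter prevents many concurrent games from retrying in lockstep.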

Slow Performance

Problem: Games take very long to complete

Normal behavior: LLM API calls take 2-10 seconds each. A game makes 20-50+ calls.

Optimization tips:
  1. Use faster models (Haiku, Flash Lite, Mini, Nano variants)
  2. Enable verbose mode to see progress: verbose=True
  3. Reduce board size for testing: Config.custom_game(board_size=9)
  4. Use verbose=False for production benchmarks
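To see where a slow game actually spends its time, a tiny timing wrapper helps. This is a generic sketch (not repository code); wrap any individual call you suspect is slow.

```python
# Sketch: time an individual call to find the slow step.
import time

def timed(fn, label):
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return result

# Usage (call_model is a placeholder for your provider client):
# clue = timed(lambda: call_model(prompt), "clue generation")
```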

BAML Generation Errors

Problem: Errors when running scripts related to the BAML client

Solution:
baml generate
This regenerates the client code from baml_src/ definitions.

Updating the Benchmark

To update to the latest version:
# Pull latest changes
git pull origin main

# Update dependencies
pip install -r requirements.txt --upgrade

# Regenerate BAML client
baml generate

# Verify installation
python3 demo_simple_game.py

Next Steps

Quick Start

Run your first game in 5 minutes

Model Selection

Explore all 50+ available models

Run Benchmarks

Evaluate models across multiple games

Configure Games

Customize board size and game rules

Getting Help

If you encounter issues:
  1. Check documentation: Review relevant guides in this documentation
  2. Search issues: Look for similar problems in GitHub Issues
  3. Ask community: Join our Discord/Slack for support
  4. Report bugs: Create a new issue with detailed reproduction steps
Include the following in bug reports:
  • Operating system and Python version
  • Full error message and stack trace
  • Steps to reproduce
  • Contents of your .env (without actual keys!)
