Overview
The analysis pipeline provides a unified interface for running all benchmark metrics and analysis functions. It processes game results and generates comprehensive statistics including win rates, Elo ratings, efficiency metrics, error patterns, and more.
Classes
AnalysisResult
Complete analysis results containing all computed metrics and statistics.
Identifier for the benchmark being analyzed
Team combination statistics with win rates and game counts
Model performance aggregated by role (hint giver/guesser)
Individual model performance metrics
Elo ratings for each model by role
Wilson confidence intervals for win rates
First mover advantage statistics
Game momentum and competitiveness metrics
Aggregated momentum statistics
Best performing hint givers
Best performing guessers
Dominant team combinations
Model synergy analysis (best pairings)
Hint giver efficiency metrics
Detailed guesser performance metrics
Model versatility across roles
Error counts by model and type
Detailed error pattern analysis
Hint word patterns and statistics
Game efficiency by team combination
Efficiency aggregated by model
Head-to-head matchup matrix for hint givers (win rates, game counts)
Head-to-head matchup matrix for guessers (win rates, game counts)
Functions
run_pipeline()
Run the complete analysis pipeline on benchmark results.
Parameter: path to the benchmark results directory containing game snapshots.
Returns: an AnalysisResult containing all computed metrics and statistics.
Usage Example
Pipeline Stages
The pipeline executes the following analysis stages:
1. Data Loading
Loads and parses benchmark results from disk.
2. Win Rate Analysis
- Team combination statistics
- Role-based performance
- First mover advantage
- Best hint givers and guessers
- Dominant combinations
- Model synergies
3. Elo Rating Computation
Calculates Elo ratings for each model in hint giver and guesser roles.
4. Confidence Intervals
Computes Wilson score confidence intervals for win rates.
5. Momentum Analysis
- Game momentum tracking
- Lead changes
- Comeback statistics
- Competitiveness metrics
6. Role-Specific Metrics
- Hint efficiency (correct guesses per hint)
- Guesser performance (accuracy, bomb rate)
- Role versatility scores
- Head-to-head matchup matrices
7. Error Analysis
- Error counts by model and type
- Bomb hit contexts
- Invalid guess patterns
- Wrong guess color distribution
8. Hint Pattern Analysis
- Hint word frequency
- Hint count distribution
- Success rates by hint count
- Hint creativity ratio
- Average efficiency
9. Efficiency Metrics
- Average turns per game
- Wins per turn
- Efficiency by model
- Turn-to-win ratios
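Stages 3 and 4 rest on two standard formulas, the Elo update and the Wilson score interval. The sketch below shows the textbook versions of both. The function names, the K-factor of 32, and the z-value of 1.96 (a 95% interval) are assumptions made for illustration; the pipeline's actual parameters are not stated here.

```python
import math
from typing import Tuple


def elo_expected(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(
    rating_a: float, rating_b: float, score_a: float, k: float = 32.0
) -> Tuple[float, float]:
    """Return updated ratings after one game.

    score_a is 1.0 for a win by A, 0.0 for a loss, 0.5 for a draw.
    k = 32 is an assumed K-factor, not necessarily the pipeline's value.
    """
    delta = k * (score_a - elo_expected(rating_a, rating_b))
    # The update is zero-sum: what A gains, B loses.
    return rating_a + delta, rating_b - delta


def wilson_interval(wins: int, games: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a win rate; z = 1.96 gives roughly 95%."""
    if games == 0:
        # No data: the interval covers the whole range.
        return 0.0, 1.0
    p = wins / games
    denom = 1.0 + z**2 / games
    center = (p + z**2 / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z**2 / (4 * games**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Two equally rated models; the first one wins.
print(elo_update(1500.0, 1500.0, score_a=1.0))
# 50 wins in 100 games.
print(wilson_interval(50, 100))
```

Unlike the normal approximation, the Wilson interval stays inside [0, 1] and behaves sensibly for small game counts, which matters when a team combination has played only a few games.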
Performance Considerations
The pipeline processes all games in memory. For very large benchmarks (10,000+ games), consider:
- Processing in batches
- Using chunked analysis
- Increasing available memory
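As a minimal sketch of the batching idea, the code below accumulates win counts one batch at a time so that only a single batch is held in memory. The `batched` and `count_wins_in_batches` helpers and the `"winner"` key are hypothetical; they are not part of the pipeline's API, and the real snapshot schema may differ.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List


def batched(items: Iterable, size: int) -> Iterator[List]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def count_wins_in_batches(games: Iterable[dict], batch_size: int = 1000) -> Counter:
    """Accumulate wins per team without materializing all games at once."""
    totals: Counter = Counter()
    for batch in batched(games, batch_size):
        # Partial counts from each batch merge into the running totals.
        totals.update(game["winner"] for game in batch)
    return totals


# Hypothetical game records produced lazily by a generator.
games = ({"winner": "blue" if i % 3 else "red"} for i in range(10))
print(count_wins_in_batches(games, batch_size=4))
```

This pattern only works directly for metrics that can be merged from partial results, such as counts and sums. Order-dependent metrics such as Elo ratings need the batches to be processed in game order with the ratings carried forward between batches.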
Output Format
All DataFrames use consistent column naming:
- model: Model identifier
- role: “hint_giver” or “guesser”
- team: “blue” or “red”
- games_played: Number of games
- wins: Number of wins
- win_rate: Win rate as a fraction between 0 and 1
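To make the column conventions concrete, the sketch below builds one row with these six columns and checks the allowed values. Plain dictionaries stand in for DataFrame rows here, and `make_row` and the model name `"model-a"` are hypothetical, introduced only for illustration.

```python
from typing import Dict, Union

Row = Dict[str, Union[str, int, float]]

VALID_ROLES = {"hint_giver", "guesser"}
VALID_TEAMS = {"blue", "red"}


def make_row(model: str, role: str, team: str, games_played: int, wins: int) -> Row:
    """Build one row using the shared column names, validating each field."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    if team not in VALID_TEAMS:
        raise ValueError(f"unknown team: {team!r}")
    if not 0 <= wins <= games_played:
        raise ValueError("wins must be between 0 and games_played")
    # win_rate is a fraction in [0, 1]; define it as 0.0 when no games were played.
    win_rate = wins / games_played if games_played else 0.0
    return {
        "model": model,
        "role": role,
        "team": team,
        "games_played": games_played,
        "wins": wins,
        "win_rate": win_rate,
    }


row = make_row("model-a", "hint_giver", "blue", games_played=20, wins=13)
print(row)
```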
Example Output Structure
Visualization Integration
The AnalysisResult is designed to work directly with the visualization module.
Related
- Metrics Overview - Individual metric functions
- Visualization - Generate charts and plots
- BenchmarkData - Data models and loading