Overview
The metrics module provides specialized functions for computing performance metrics from benchmark data. Metrics are organized into focused modules: win rates, Elo ratings, confidence intervals, efficiency, errors, hints, momentum, and role-specific analysis.
Win Rate Metrics
team_combo_stats()
Return normalized team combination statistics.
Parameters: loaded benchmark data.
Returned columns: blue_hint_giver, blue_guesser, red_hint_giver, red_guesser, games_played, blue_wins, red_wins, blue_win_rate, avg_turns
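The sketch below shows one way per-combination rows like these could be aggregated from raw game records. The `Game` record and the `team_combo_rows` helper are hypothetical and not part of the module. Here `blue_win_rate` is a fraction of games played, and `red_wins` assumes every game ends with a winner.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Game:
    # Hypothetical game record; field names are illustrative only.
    blue_hint_giver: str
    blue_guesser: str
    red_hint_giver: str
    red_guesser: str
    winner: str  # "blue" or "red"
    turns: int


def team_combo_rows(games):
    """Aggregate games into one row per (blue HG, blue G, red HG, red G) combination."""
    groups = defaultdict(list)
    for g in games:
        key = (g.blue_hint_giver, g.blue_guesser, g.red_hint_giver, g.red_guesser)
        groups[key].append(g)

    rows = []
    for (bhg, bg, rhg, rg), gs in groups.items():
        n = len(gs)
        blue_wins = sum(1 for g in gs if g.winner == "blue")
        rows.append({
            "blue_hint_giver": bhg,
            "blue_guesser": bg,
            "red_hint_giver": rhg,
            "red_guesser": rg,
            "games_played": n,
            "blue_wins": blue_wins,
            "red_wins": n - blue_wins,  # assumes no draws
            "blue_win_rate": blue_wins / n,
            "avg_turns": sum(g.turns for g in gs) / n,
        })
    return rows
```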
best_hint_givers()
Identify the top-performing hint givers.
best_guessers()
Identify the top-performing guessers.
first_mover_advantage()
Calculate first-mover advantage statistics.
Returned fields:
- overall_blue_win_rate: Blue team win percentage
- overall_red_win_rate: Red team win percentage
- blue_advantage: Difference from 50%
- mirror_match_blue_rate: Blue win rate in mirror matches
- mirror_match_count: Number of mirror matches
- total_games: Total games analyzed
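The following is a minimal sketch of how these fields relate to one another. It rests on three assumptions: blue moves first, a mirror match means both teams field the same hint giver and guesser models, and rates are fractions rather than percentages. `GameResult` and `first_mover_stats` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class GameResult:
    # Hypothetical minimal record; assumes blue always moves first.
    blue_hint_giver: str
    blue_guesser: str
    red_hint_giver: str
    red_guesser: str
    winner: str  # "blue" or "red"


def first_mover_stats(games):
    """Summarize blue (first-mover) win rates; expects a non-empty list of games."""
    total = len(games)
    blue_rate = sum(g.winner == "blue" for g in games) / total
    # Assumption: a mirror match uses identical models on both teams.
    mirror = [
        g for g in games
        if g.blue_hint_giver == g.red_hint_giver and g.blue_guesser == g.red_guesser
    ]
    mirror_blue = sum(g.winner == "blue" for g in mirror)
    return {
        "overall_blue_win_rate": blue_rate,
        "overall_red_win_rate": 1.0 - blue_rate,
        "blue_advantage": blue_rate - 0.5,  # fraction form of "difference from 50%"
        "mirror_match_blue_rate": mirror_blue / len(mirror) if mirror else None,
        "mirror_match_count": len(mirror),
        "total_games": total,
    }
```

Mirror matches give the cleaner signal. With identical models on both sides, any deviation of mirror_match_blue_rate from 0.5 reflects turn order rather than model strength.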
Elo Rating Metrics
compute_elo()
Calculate Elo ratings for models in each role.
Parameters:
- Loaded benchmark data
- Elo K-factor (higher = more volatile ratings)
- Starting Elo rating for all models
Returned columns: model, elo_hint_giver, elo_guesser, elo_combined, elo_best_role
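The standard Elo update, which the K-factor and starting rating parameterize, is sketched below. How the module pairs models within a role is an assumption here, as are the K = 32 and 1500 defaults. Those two values are common conventions, not the module's documented defaults. Ratings produced this way depend on the order in which games are processed.

```python
from collections import defaultdict


def expected_score(r_a, r_b):
    """Expected score of A against B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_ratings(matches, k=32.0, initial=1500.0):
    """Compute Elo ratings from (winner, loser) pairs, processed in order.

    Assumption: each game yields one pairing per role, e.g. the winning
    team's hint giver against the losing team's hint giver.
    """
    ratings = defaultdict(lambda: initial)
    for winner, loser in matches:
        gain = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
        # Zero-sum update: the winner gains exactly what the loser gives up.
        ratings[winner] += gain
        ratings[loser] -= gain
    return dict(ratings)
```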
Confidence Interval Metrics
wilson_ci()
Calculate Wilson score confidence intervals for win rates.
Parameters:
- Model performance DataFrame
- Confidence level (0.95 = 95% CI)
Returned columns: model, role, team, win_rate, ci_lower, ci_upper, ci_width, sample_size, wins
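The Wilson score interval behind ci_lower and ci_upper can be computed with the standard library alone. This is a sketch of the textbook formula, not the module's own code. `wilson_interval` is a hypothetical helper, and its behavior for zero games is an assumption. Under this formula, ci_width is simply the upper bound minus the lower bound.

```python
import math
from statistics import NormalDist


def wilson_interval(wins, n, confidence=0.95):
    """Wilson score interval for the proportion wins / n."""
    if n == 0:
        return (0.0, 1.0)  # assumption: no data gives a fully uninformative interval
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # two-sided critical value, ~1.96 at 95%
    p = wins / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n))
    return (center - half, center + half)
```

The normal-approximation interval p ± z·sqrt(p(1-p)/n) can spill outside [0, 1]. The Wilson interval always stays within that range and behaves sensibly for models with few games or win rates near 0 or 1.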
pairwise_significance()
Test whether the difference between two models is statistically significant, using a chi-squared test.
Efficiency Metrics
game_efficiency()
Calculate efficiency metrics for team combinations.
Returned columns: blue_hint_giver, blue_guesser, red_hint_giver, red_guesser, games_played, avg_turns, blue_wins, red_wins, blue_win_rate, blue_efficiency, red_efficiency, total_turns
efficiency_by_model()
Aggregate efficiency metrics by model.
Error Metrics
error_patterns()
Analyze error patterns across games.
Returned fields:
- bomb_hits_by_model: Bomb hits per model
- bomb_contexts: Detailed bomb hit contexts
- invalid_by_type: Invalid guesses by type (offboard, revealed, other)
- wrong_guess_colors: Wrong guesses by color
- total_errors_by_model: Total errors per model
error_summary()
Create a summary DataFrame of errors.
Returned columns: model, bomb_hits, invalid_offboard, invalid_revealed, invalid_other, total_errors
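As an illustration of how per-model rows like these could be tallied, the sketch below counts (model, kind) error events. The event format and `error_summary_rows` are hypothetical. Treating total_errors as the sum of the four per-kind columns is an assumption, not something this reference states.

```python
from collections import Counter, defaultdict

# Error kinds matching the error_summary columns.
KINDS = ("bomb_hits", "invalid_offboard", "invalid_revealed", "invalid_other")


def error_summary_rows(events):
    """Build one row per model from (model, kind) error events.

    Assumption: total_errors is the sum of the four per-kind counts.
    """
    counts = defaultdict(Counter)
    for model, kind in events:
        if kind not in KINDS:
            raise ValueError(f"unknown error kind: {kind}")
        counts[model][kind] += 1

    rows = []
    for model, c in sorted(counts.items()):
        row = {"model": model, **{k: c[k] for k in KINDS}}  # missing kinds count as 0
        row["total_errors"] = sum(c[k] for k in KINDS)
        rows.append(row)
    return rows
```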
Hint Pattern Metrics
hint_patterns()
Analyze hint word patterns and statistics.
Returned fields:
- total_hints: Total hints given
- unique_hints: Number of unique hint words
- creativity_ratio: Ratio of unique to total hints
- avg_hint_length: Average hint word length
- avg_hint_count: Average number attached to each hint
- overall_success_rate: Percentage of hints with ≥1 correct guess
- perfect_hint_rate: Percentage of hints achieving the promised count
- avg_efficiency: Average correct guesses per promised guess
- hint_count_distribution: Histogram of hint counts
- most_common_hints: Top 15 most used hint words
- success_by_count: Success rates grouped by hint count
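A sketch of how several of these fields follow from a list of hints is shown below. Rates are expressed as fractions, whereas the module may report percentages. The `Hint` record and `hint_stats` are hypothetical. Excluding zero-count hints from the efficiency average is an assumption, and success_by_count is omitted.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Hint:
    # Hypothetical hint record: clue word, promised count, correct guesses that followed.
    word: str
    count: int
    correct: int


def hint_stats(hints):
    """Compute several hint_patterns-style fields as fractions; expects a non-empty list."""
    total = len(hints)
    words = [h.word for h in hints]
    # Assumption: hints that promise 0 are excluded from the efficiency average.
    effs = [h.correct / h.count for h in hints if h.count > 0]
    return {
        "total_hints": total,
        "unique_hints": len(set(words)),
        "creativity_ratio": len(set(words)) / total,
        "avg_hint_length": sum(len(w) for w in words) / total,
        "avg_hint_count": sum(h.count for h in hints) / total,
        "overall_success_rate": sum(h.correct >= 1 for h in hints) / total,
        "perfect_hint_rate": sum(h.correct >= h.count for h in hints) / total,
        "avg_efficiency": sum(effs) / len(effs) if effs else 0.0,
        "hint_count_distribution": dict(Counter(h.count for h in hints)),
        "most_common_hints": Counter(words).most_common(15),
    }
```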
Momentum Metrics
game_momentum()
Analyze game momentum and competitiveness.
Returned columns: game_id, winner, total_turns, lead_changes, was_comeback, deficit_overcome, max_blue_lead, max_red_lead, competitiveness
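The reference does not define how the lead or a comeback is measured, so the following is only a sketch under stated assumptions. The input is a hypothetical per-turn timeline of (blue_score, red_score), with the lead taken as blue minus red. A lead change is a switch between a blue lead and a red lead, with ties skipped. A comeback means the eventual winner trailed at some point. The competitiveness score is omitted because its formula is not documented.

```python
def momentum(score_timeline, winner):
    """Derive momentum-style fields from (blue_score, red_score) pairs after each turn."""
    leads = [b - r for b, r in score_timeline]  # positive = blue ahead

    lead_changes = 0
    last_leader = 0  # +1 blue, -1 red, 0 = nobody has led yet
    for d in leads:
        leader = (d > 0) - (d < 0)
        if leader != 0:  # ties do not reset or change the leader
            if last_leader != 0 and leader != last_leader:
                lead_changes += 1
            last_leader = leader

    max_blue_lead = max([0] + leads)
    max_red_lead = max([0] + [-d for d in leads])
    # Largest deficit the eventual winner faced.
    deficit = max_red_lead if winner == "blue" else max_blue_lead
    return {
        "winner": winner,
        "total_turns": len(score_timeline),
        "lead_changes": lead_changes,
        "was_comeback": deficit > 0,
        "deficit_overcome": deficit,
        "max_blue_lead": max_blue_lead,
        "max_red_lead": max_red_lead,
    }
```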
momentum_summary()
Aggregate momentum statistics.
Role-Specific Metrics
hint_efficiency()
Calculate hint giver efficiency metrics.
Returned columns: model, team, hints_given, avg_hint_count, guess_yield, efficiency, hint_success_rate, risk_profile, overcommit_rate, ambiguity_rate, win_rate
Risk profiles:
- aggressive: avg_hint_count > 2.5
- balanced: 1.5 ≤ avg_hint_count ≤ 2.5
- conservative: avg_hint_count < 1.5
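The thresholds above translate directly into a small classifier. `risk_profile` is a hypothetical helper name, not necessarily the module's own. Under these rules the boundary values 1.5 and 2.5 both fall in "balanced".

```python
def risk_profile(avg_hint_count: float) -> str:
    """Classify a hint giver by average hint count using the documented thresholds."""
    if avg_hint_count > 2.5:
        return "aggressive"
    if avg_hint_count >= 1.5:
        return "balanced"
    return "conservative"
```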
guesser_performance()
Calculate detailed guesser performance metrics.
Returned columns: model, team, games_played, first_guess_accuracy, overall_accuracy, bomb_rate, bomb_hits, invalid_rate, invalid_breakdown, guesses_per_turn, empty_turn_rate, risk_adjusted_accuracy, win_rate
role_versatility()
Calculate model versatility across roles.
Returned columns: model, hint_giver_win_rate, hint_giver_games, guesser_win_rate, guesser_games, versatility_score, best_role, role_gap, combined_win_rate, total_games
matchup_matrix()
Create a head-to-head matchup matrix for a role.
Parameters: role to analyze, either "hint_giver" or "guesser"
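One plausible shape for such a matrix is a nested mapping in which matrix[a][b] is the fraction of games model a won against model b. Both the orientation and the (model_a, model_b, a_won) input format are assumptions in this sketch, and `head_to_head` is a hypothetical helper. Pairs that never met are simply absent.

```python
from collections import defaultdict


def head_to_head(results):
    """Return matrix[a][b] = fraction of games a won against b, for one role.

    results: iterable of (model_a, model_b, a_won) tuples (hypothetical format).
    """
    wins = defaultdict(lambda: defaultdict(int))
    games = defaultdict(lambda: defaultdict(int))
    for a, b, a_won in results:
        games[a][b] += 1
        games[b][a] += 1
        if a_won:
            wins[a][b] += 1
        else:
            wins[b][a] += 1
    return {a: {b: wins[a][b] / n for b, n in row.items()} for a, row in games.items()}
```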
Usage Examples
Computing Multiple Metrics
Analyzing Hint Efficiency
Error Analysis
Related
- Analysis Pipeline - Orchestrates all metrics
- Visualization - Generate charts from metrics
- BenchmarkData - Data models