Overview
The Make Index scripts create and maintain the master player index, collecting basic scoring statistics and calculating True Shooting Percentage (TS%). Two versions exist:
- make_index.py - Legacy version with manual scraping
- make_index2.py - Modern refactored version with improved error handling
Data Sources
- Basketball Reference: Player statistics (totals and per-possession)
- NBA API: Player IDs and current roster data
- URLs: basketball-reference.com/leagues/NBA_{year}_totals.html and basketball-reference.com/leagues/NBA_{year}_per_poss.html
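The URL pattern above can be expressed as a small helper. This is a minimal sketch: the name `bref_url`, the `totals` parameter, and the `https://www.` prefix are assumptions chosen to mirror the totals/per-possession switch described for pull_bref_data(). The scripts may build URLs differently, for example when fetching playoff pages.

```python
# Hypothetical helper: the scripts' own URL-building code is not shown here.
BASE_URL = "https://www.basketball-reference.com/leagues"


def bref_url(year: int, totals: bool = True) -> str:
    """Return the Basketball Reference season page for `year`.

    `year` is the season's ending year (2025 -> the 2024-25 season).
    totals=True selects the totals page; totals=False selects per-possession.
    """
    page = "totals" if totals else "per_poss"
    return f"{BASE_URL}/NBA_{year}_{page}.html"


print(bref_url(2025))         # totals page
print(bref_url(2025, False))  # per-possession page
```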
Core Functions
pull_bref_data()
Pulls player statistics from Basketball Reference.
Parameters: a boolean flag. If True, scrapes totals data; if False, scrapes per-possession data.
Returns: pd.DataFrame with columns:
player, url, team, year, G, MP, FGA, FG, 3PA, 3P, FTA, FT, PTS
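Producing the column list above depends on locating each statistic in the page's table. The dynamic header mapping listed under Key Features can be illustrated with the standard library: read the header row, map each header name to its position, then pull only the needed columns. This is a minimal sketch assuming a simple `<thead>`/`<tbody>` table. `TableParser`, `extract`, and the sample HTML (with invented values) are hypothetical and not taken from the scripts, and real Basketball Reference pages contain extra rows and attributes that this sketch ignores.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the header row from <thead> and data rows from <tbody>."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self.rows = []
        self._section = None  # "thead", "tbody", or None
        self._row = None      # cells of the row being read
        self._cell = None     # text of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag in ("thead", "tbody"):
            self._section = tag
        elif tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = ""

    def handle_data(self, data):
        if self._cell is not None:
            self._cell += data

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append(self._cell.strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._section == "thead":
                self.headers = self._row
            elif self._section == "tbody" and self._row:
                self.rows.append(self._row)
            self._row = None
        elif tag in ("thead", "tbody"):
            self._section = None


def extract(html, wanted):
    """Return one dict per data row, keeping only the `wanted` columns."""
    parser = TableParser()
    parser.feed(html)
    # Map header text to column position instead of hardcoding indices.
    position = {name: i for i, name in enumerate(parser.headers)}
    missing = [name for name in wanted if name not in position]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return [{name: row[position[name]] for name in wanted} for row in parser.rows]


# Invented sample values, shaped loosely like a season totals table.
SAMPLE = """
<table>
<thead><tr><th>Rk</th><th>Player</th><th>Team</th><th>G</th><th>FGA</th><th>FTA</th><th>PTS</th></tr></thead>
<tbody>
<tr><th>1</th><td>Player A</td><td>AAA</td><td>70</td><td>1500</td><td>400</td><td>1900</td></tr>
</tbody>
</table>
"""

print(extract(SAMPLE, ["Player", "G", "PTS"]))
```

Because columns are looked up by header text, a reordered or extended table still yields the same fields, and a renamed header fails loudly with a KeyError rather than silently reading the wrong column.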
process_player_ids()
Matches Basketball Reference IDs to NBA API IDs.
Parameters: a DataFrame containing player data with URLs, and the master index DataFrame with existing ID mappings.
Returns: a DataFrame with bref_id, nba_id, and team_id columns.
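The reconciliation step can be sketched as a lookup chain: reuse an existing mapping from the master index, fall back to a name match against the NBA API player list, and finally consult a manual dictionary (the hardcoded fallbacks listed under Key Features). This is a sketch under several assumptions: plain dicts and lists stand in for DataFrames, bref_id is assumed to be derived from URLs of the form /players/j/jamesle01.html, the NBA player list is assumed to be dicts with "id" and "full_name" keys, and team_id handling is left out. `bref_id_from_url`, `match_ids`, and `FALLBACK_IDS` are hypothetical names.

```python
import re

# Hypothetical manual fallbacks (bref_id -> nba_id); the scripts keep their own dictionary.
FALLBACK_IDS = {"examppl01": 999999}


def bref_id_from_url(url: str) -> str:
    """Assumes player URLs end like /players/j/jamesle01.html."""
    match = re.search(r"/players/[a-z]/([a-z0-9]+)\.html$", url)
    if match is None:
        raise ValueError(f"unrecognized player URL: {url}")
    return match.group(1)


def match_ids(players, master, nba_players):
    """Attach bref_id and nba_id to each player row.

    players:     list of dicts with "player" and "url" keys
    master:      dict of existing mappings, bref_id -> nba_id
    nba_players: list of dicts with "id" and "full_name" keys (assumed shape)
    """
    by_name = {p["full_name"].lower(): p["id"] for p in nba_players}
    matched = []
    for row in players:
        bref_id = bref_id_from_url(row["url"])
        nba_id = master.get(bref_id)                     # 1. existing mapping
        if nba_id is None:
            nba_id = by_name.get(row["player"].lower())  # 2. exact name match
        if nba_id is None:
            nba_id = FALLBACK_IDS.get(bref_id)           # 3. manual fallback
        matched.append({**row, "bref_id": bref_id, "nba_id": nba_id})
    return matched


players = [
    {"player": "Player A", "url": "/players/a/playera01.html"},
    {"player": "Player B", "url": "/players/b/playerb01.html"},
    {"player": "Example Player", "url": "/players/e/examppl01.html"},
]
result = match_ids(players, {"playera01": 1}, [{"id": 2, "full_name": "Player B"}])
print([r["nba_id"] for r in result])  # [1, 2, 999999]
```

Checking the master index first keeps IDs stable across runs; exact name matching is the fragile step (accents and suffixes differ between sources), which is why a manual fallback table is useful. Players that survive all three steps unmatched get nba_id None.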
calculate_true_shooting()
Calculates True Shooting Percentage using the formula:
TS% = PTS / (2 * (FGA + 0.44 * FTA)) * 100
Parameters: a DataFrame with PTS, FGA, and FTA columns.
Returns: a TS% column.
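A worked example of the formula above, written for a single player's totals rather than a DataFrame. `true_shooting` is a hypothetical name; the arithmetic is exactly the formula given for calculate_true_shooting().

```python
def true_shooting(pts: float, fga: float, fta: float) -> float:
    """TS% = PTS / (2 * (FGA + 0.44 * FTA)) * 100."""
    # 0.44 * FTA approximates the shooting possessions used by free throws.
    return pts / (2 * (fga + 0.44 * fta)) * 100


# Worked example: 30 points on 20 FGA and 10 FTA.
# Denominator: 2 * (20 + 0.44 * 10) = 2 * 24.4 = 48.8
# TS% = 30 / 48.8 * 100, about 61.48
print(round(true_shooting(30, 20, 10), 2))  # 61.48
```

Applied column-wise to a DataFrame, the same expression vectorizes directly; a player with zero FGA and zero FTA then produces inf or NaN instead of raising, which is what the data validation step has to handle.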
Configuration
Set at the top of make_index2.py:
- psflag: Toggles between playoff and regular season data
- Year to scrape (represents the 2024-25 season)
- Season string in the format expected by the NBA API
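The year and season settings describe the same season in two formats. A minimal sketch of keeping them in sync, assuming the NBA API season string looks like "2024-25"; only `psflag` is named in this document, and `year`, `season`, and `nba_api_season` are hypothetical names, as is the True/False meaning given for psflag.

```python
# Hypothetical configuration sketch: only `psflag` is named in this document;
# `year`, `season`, and `nba_api_season` are illustrative names.
psflag = False  # assumed meaning: True = playoffs, False = regular season
year = 2025     # Basketball Reference labels a season by its ending year


def nba_api_season(end_year: int) -> str:
    """Convert an ending year to a 'YYYY-YY' season string (2025 -> '2024-25')."""
    return f"{end_year - 1}-{str(end_year)[-2:]}"


season = nba_api_season(year)
print(season)  # 2024-25
```

Deriving the season string from the year, rather than setting both by hand, avoids the two settings drifting apart when the script is pointed at a new season.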
Output Files
- Master player index with ID mappings. Columns: player, url, year, team, bref_id, nba_id, team_id
- Per-possession scoring statistics. Columns: Player, TS%, PTS, MP, Tm, G, year, nba_id
- Total scoring statistics with shooting attempts. Columns: Player, TS%, PTS, MP, Tm, G, FTA, FGA, year, nba_id
- Games played data exported to other modules. Columns: nba_id, Player, year, G
Usage Example
Key Features
- Dynamic header mapping: Automatically detects column positions from Basketball Reference HTML
- ID reconciliation: Matches players across Basketball Reference and NBA API
- Playoff/regular season toggle: A single psflag setting controls the data source
- Hardcoded ID fallbacks: Manual dictionary for players missing from APIs
- TS% calculation: Industry-standard true shooting percentage formula
- Data validation: Removes extreme TS% values (>150%) and handles inf/NaN
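The data validation rule can be sketched as a row filter. The >150% threshold comes from the list above; dropping inf/NaN rows (rather than, say, filling them) is an assumption, as is keeping a value of exactly 150%. `drop_bad_ts` is a hypothetical name, and plain dicts stand in for DataFrame rows.

```python
import math


def drop_bad_ts(rows, max_ts=150.0):
    """Keep rows whose TS% is a finite number no greater than max_ts."""
    kept = []
    for row in rows:
        ts = row.get("TS%")
        # None, NaN, and +/-inf (e.g. from 0 FGA and 0 FTA) are dropped (assumed handling).
        if ts is None or not math.isfinite(ts):
            continue
        if ts > max_ts:  # threshold from the Data validation rule
            continue
        kept.append(row)
    return kept


rows = [
    {"Player": "A", "TS%": 61.5},
    {"Player": "B", "TS%": 200.0},
    {"Player": "C", "TS%": float("nan")},
    {"Player": "D", "TS%": float("inf")},
]
print([r["Player"] for r in drop_bad_ts(rows)])  # ['A']
```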