Make Index

Overview

The Make Index scripts create and maintain the master player index, collecting basic scoring statistics and calculating True Shooting Percentage (TS%). Two versions exist:

make_index.py - Legacy version with manual scraping
make_index2.py - Modern refactored version with improved error handling

Data Sources

Basketball Reference: Player statistics (totals and per-possession)
NBA API: Player IDs and current roster data
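URLs: basketball-reference.com/leagues/NBA_{year}_totals.html and basketball-reference.com/leagues/NBA_{year}_per_poss.html (in playoffs mode, playoffs/ replaces leagues/ in the path)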

Core Functions

pull_bref_data()

Pulls player statistics from Basketball Reference.

totals (boolean, default: False): If True, scrapes totals data. If False, scrapes per-possession data.

Returns: pd.DataFrame with columns:

player, url, team, year, G, MP, FGA, FG, 3PA, 3P, FTA, FT, PTS

# From make_index2.py:105
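def pull_bref_data(totals=False):
    # Playoff tables live under /playoffs/, regular-season tables under /leagues/
    leagues = "playoffs" if config.PLAYOFFS_MODE else "leagues"
    # The doubled braces keep a literal {year} placeholder in the pattern
    if totals:
        url_pattern = f"https://www.basketball-reference.com/{leagues}/NBA_{{year}}_totals.html"
    else:
        url_pattern = f"https://www.basketball-reference.com/{leagues}/NBA_{{year}}_per_poss.html"
    # ... (excerpt; the remainder of the function is not shown)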
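
The URL selection above can be illustrated in isolation. The following is a minimal sketch, not the script's own code: build_bref_url and its playoffs parameter are hypothetical names standing in for the config.PLAYOFFS_MODE lookup.

```python
def build_bref_url(year: int, totals: bool = False, playoffs: bool = False) -> str:
    """Build a Basketball Reference stats URL (hypothetical helper)."""
    # Playoff pages use the /playoffs/ path, regular season uses /leagues/
    section = "playoffs" if playoffs else "leagues"
    # Totals and per-possession tables differ only in the file suffix
    table = "totals" if totals else "per_poss"
    return f"https://www.basketball-reference.com/{section}/NBA_{year}_{table}.html"


print(build_bref_url(2025, totals=True))
print(build_bref_url(2025, playoffs=True))
```

Passing the mode as an argument, rather than reading a global config, makes the URL logic easy to check without running the scraper.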

process_player_ids()

Matches Basketball Reference IDs to NBA API IDs.

df (pd.DataFrame, required): DataFrame containing player data with URLs

master_df (pd.DataFrame, required): Master index DataFrame with existing ID mappings

Returns: DataFrame with added bref_id, nba_id, and team_id columns

# From make_index2.py:204
def process_player_ids(df, master_df):
    # Extract Basketball Reference IDs
    df['bref_id'] = df['url'].str.split('/', expand=True)[5].str.split('.', expand=True)[0]
    
    # Map IDs to dataframe
    match_dict = dict(zip(master_df['bref_id'], master_df['nba_id']))
    df['nba_id'] = df['bref_id'].map(match_dict)
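
To make the ID extraction and mapping concrete, here is a small sketch on toy data. The rows, the second player, and the assumption that each URL is an absolute player-page URL (so the file name is the sixth slash-separated part) are illustrative and not taken from the script.

```python
import pandas as pd

# Toy input; values are illustrative only
df = pd.DataFrame({
    "player": ["LeBron James", "Example Rookie"],
    "url": [
        "https://www.basketball-reference.com/players/j/jamesle01.html",
        "https://www.basketball-reference.com/players/r/rookiex01.html",
    ],
})
master_df = pd.DataFrame({"bref_id": ["jamesle01"], "nba_id": [2544]})

# The sixth slash-separated part is the file name; drop the ".html" suffix
df["bref_id"] = df["url"].str.split("/", expand=True)[5].str.split(".", expand=True)[0]

# Look up known NBA IDs; players absent from the master index get NaN
match_dict = dict(zip(master_df["bref_id"], master_df["nba_id"]))
df["nba_id"] = df["bref_id"].map(match_dict)

# Rows still missing an ID would need another lookup (for example the NBA API)
missing = df[df["nba_id"].isna()]
print(missing["player"].tolist())
```

Because unmatched players map to NaN, the nba_id column becomes floating point until the gaps are filled.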

calculate_true_shooting()

Calculates True Shooting Percentage using the formula: TS% = PTS / (2 * (FGA + 0.44 * FTA)) * 100

df (pd.DataFrame, required): DataFrame with PTS, FGA, and FTA columns

Returns: DataFrame with added TS% column

# From make_index2.py:269
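# Points per scoring attempt, where a free throw counts as 0.44 of an attempt
df['TS%'] = (df['PTS'] / (2 * (df['FGA'] + 0.44 * df['FTA']))) * 100
# Dividing by zero attempts yields inf; reset those values to 0
df.replace([np.inf, -np.inf], 0, inplace=True)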
df.loc[df['TS%'] > 150, 'TS%'] = 0  # Clean extreme values
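
A worked scalar example may help in checking the formula by hand. This is a sketch only: true_shooting_pct is a hypothetical helper, and returning 0 when there are no attempts is an assumption that mirrors the clean-up steps above rather than the script's exact behaviour.

```python
def true_shooting_pct(pts: float, fga: float, fta: float) -> float:
    """TS% = PTS / (2 * (FGA + 0.44 * FTA)) * 100 (hypothetical helper)."""
    denom = 2 * (fga + 0.44 * fta)
    if denom == 0:
        # Assumption: no attempts is reported as 0 instead of inf or NaN
        return 0.0
    ts = pts / denom * 100
    # Values above 150 are treated as noise and reset to 0
    return 0.0 if ts > 150 else ts


# 2000 points on 1500 FGA and 500 FTA: 2000 / (2 * 1720) * 100
print(round(true_shooting_pct(2000, 1500, 500), 1))  # 58.1
```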

Configuration

Set at the top of make_index2.py:

PLAYOFFS_MODE (boolean, default: True): Toggle between playoffs and regular season data

CURRENT_YEAR (integer, default: 2025): Year to scrape (represents the 2024-25 season)

CURRENT_SEASON (string, default: "2024-25"): Season format for the NBA API

# From make_index2.py:15-20
class Config:
    PLAYOFFS_MODE = True
    CURRENT_YEAR = 2025
    CURRENT_SEASON = "2024-25"
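
Since CURRENT_YEAR and CURRENT_SEASON describe the same season, one could derive the second from the first to keep them in sync. The sketch below assumes that approach; season_label is a hypothetical helper and is not part of make_index2.py.

```python
def season_label(end_year: int) -> str:
    """Turn a season's ending year into the 'YYYY-YY' form (hypothetical helper)."""
    # 2025 -> "2024-25"; the two-digit suffix is zero-padded, so 2000 -> "1999-00"
    return f"{end_year - 1}-{end_year % 100:02d}"


print(season_label(2025))  # 2024-25
```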

Output Files

index_master.csv / index_master_ps.csv (CSV): Master player index with ID mappings. Columns: player, url, year, team, bref_id, nba_id, team_id

scoring.csv / scoring_ps.csv (CSV): Per-possession scoring statistics. Columns: Player, TS%, PTS, MP, Tm, G, year, nba_id

totals.csv / totals_ps.csv (CSV): Total scoring statistics with shooting attempts. Columns: Player, TS%, PTS, MP, Tm, G, FTA, FGA, year, nba_id

games.csv / ps_games.csv (CSV): Games played data exported to other modules. Columns: nba_id, Player, year, G
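
The file pairs above can be captured in a small lookup table. This sketch assumes the second name in each pair is the playoffs variant; OUTPUT_FILES and output_path are hypothetical names used only for illustration.

```python
# (regular season, playoffs) file names, as listed above.
# Note the games file uses a "ps_" prefix rather than a "_ps" suffix.
OUTPUT_FILES = {
    "index": ("index_master.csv", "index_master_ps.csv"),
    "scoring": ("scoring.csv", "scoring_ps.csv"),
    "totals": ("totals.csv", "totals_ps.csv"),
    "games": ("games.csv", "ps_games.csv"),
}


def output_path(kind: str, playoffs: bool) -> str:
    """Pick the output file name for the current mode (hypothetical helper)."""
    regular, postseason = OUTPUT_FILES[kind]
    return postseason if playoffs else regular


print(output_path("games", playoffs=True))
```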

Usage Example

# Set configuration
class Config:
    PLAYOFFS_MODE = False  # Regular season
    CURRENT_YEAR = 2025
    CURRENT_SEASON = "2024-25"

config = Config()

# Run the main pipeline
if __name__ == "__main__":
    main()

Output:

Running in REGULAR SEASON mode
Fetching data from: https://www.basketball-reference.com/leagues/NBA_2025_totals.html
Successfully processed 612 players for 2025 (totals)
Found 15 players without NBA IDs
Fetching player data from NBA API...
Found 12 additional IDs from the NBA API

Key Features

Dynamic header mapping: Automatically detects column positions from Basketball Reference HTML
ID reconciliation: Matches players across Basketball Reference and NBA API
Playoff/regular season toggle: A single flag (PLAYOFFS_MODE in make_index2.py) controls the data source
Hardcoded ID fallbacks: Manual dictionary for players missing from APIs
TS% calculation: Industry-standard true shooting percentage formula
Data validation: Resets extreme TS% values (>150%) to 0 and handles inf/NaN
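
The dynamic header mapping idea can be sketched as follows. The script's actual HTML parsing is not shown in this document, so the header labels, the sample row, and the helper names map_headers and pick are all assumptions made for illustration.

```python
def map_headers(header_cells: list[str]) -> dict[str, int]:
    """Map each column label to its position in the table."""
    return {name: idx for idx, name in enumerate(header_cells)}


def pick(row_cells: list[str], positions: dict[str, int], wanted: list[str]) -> dict[str, str]:
    """Read the wanted columns by name, so a reordered table still parses."""
    return {name: row_cells[positions[name]] for name in wanted}


# Illustrative header and data row, not real scraped values
headers = ["Rk", "Player", "Team", "G", "MP", "FGA", "FTA", "PTS"]
row = ["1", "Example Player", "XYZ", "70", "2400", "1500", "500", "2000"]

positions = map_headers(headers)
stats = pick(row, positions, ["Player", "PTS", "FGA", "FTA"])
print(stats)
```

Looking columns up by label instead of by fixed index means an added or moved column on the source page changes the positions dictionary but not the extraction code.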
