Quick Start

Installation

Clone the Repository

Clone the project to your local machine:

git clone <repository-url>
cd workspace/source

Install Dependencies

Install required Python packages using pip:

pip install -r requirements.txt

Key dependencies:

pandas==1.5.3 - Data manipulation
requests==2.32.3 - HTTP requests
beautifulsoup4==4.12.3 - Web scraping
nba_api==1.6.1 - NBA.com API wrapper
plotly==5.23.0 - Visualization (optional)

Verify Installation

Test that all packages are installed correctly:

import pandas as pd
import requests
from bs4 import BeautifulSoup
from nba_api.stats.endpoints import commonallplayers

print("All dependencies installed successfully!")
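The import check above confirms the packages load, but not that their versions match the pins in the dependency list. A minimal sketch of a version check follows, assuming the pins listed under "Key dependencies" are the ones in requirements.txt; the `PINNED` dictionary and `check_versions` helper are illustrative names, not part of the project.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the "Key dependencies" list (assumed to mirror requirements.txt).
PINNED = {
    "pandas": "1.5.3",
    "requests": "2.32.3",
    "beautifulsoup4": "4.12.3",
    "nba_api": "1.6.1",
    "plotly": "5.23.0",  # optional
}


def check_versions(pinned):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every pinned package is present at the expected version.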

Your First Data Collection

Let’s collect hustle statistics for the 2024-25 season.

Running the Hustle Stats Script

import pandas as pd
import requests

def get_hustle(year, ps=False):
    """Fetch hustle statistics from NBA.com Stats API.
    
    Args:
        year: Season ending year (e.g., 2025 for 2024-25 season)
        ps: Boolean, True for playoffs, False for regular season
    """
    stype = "Playoffs" if ps else "Regular%20Season"
    season = str(year-1) + '-' + str(year)[-2:]
    
    # NBA.com Stats API endpoint
    url = (
        'https://stats.nba.com/stats/leaguehustlestatsplayer'
        '?College=&Conference=&Country=&DateFrom=&DateTo=&Division='
        '&DraftPick=&DraftYear=&GameScope=&Height=&ISTRound='
        '&LastNGames=0&LeagueID=00&Location=&Month=0'
        '&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N'
        f'&PerMode=Totals&PlayerExperience=&PlayerPosition='
        f'&PlusMinus=N&Rank=N&Season={season}&SeasonSegment='
        f'&SeasonType={stype}&TeamID=0&VsConference=&VsDivision=&Weight='
    )
    
    # Required headers for NBA.com API
    headers = {
        "Host": "stats.nba.com",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0",
        "Accept": "application/json, text/plain, */*",
        "Accept-Language": "en-US,en;q=0.5",
        "Accept-Encoding": "gzip, deflate, br",
        "Connection": "keep-alive",
        "Referer": "https://stats.nba.com/"
    }
    
    # Make the request; fail fast on timeouts and HTTP errors
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    json_data = response.json()
    
    # Extract data from JSON response
    data = json_data["resultSets"][0]["rowSet"]
    columns = json_data["resultSets"][0]["headers"]
    
    # Create DataFrame
    df = pd.DataFrame(data, columns=columns)
    df['year'] = year
    
    return df

# Collect 2024-25 regular season hustle stats
df = get_hustle(2025, ps=False)
print(df.head())
print(f"\nCollected {len(df)} players")
print(f"\nColumns: {df.columns.tolist()}")

# Save to CSV
df.to_csv('hustle_2025.csv', index=False)

You’ve successfully collected your first dataset! The CSV file contains hustle metrics like deflections, charges drawn, screen assists, and loose balls recovered.
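Two steps inside `get_hustle` can be tried without touching the network: building the season string and turning the `resultSets` payload into a DataFrame. The sketch below isolates both. The helper names `season_label` and `result_set_to_frame` are illustrative, and the sample payload is made up to mirror the shape the function reads; its values are placeholders, not real statistics.

```python
import pandas as pd


def season_label(year):
    """Convert a season-ending year to 'YYYY-YY' form, e.g. 2025 -> '2024-25'."""
    return f"{year - 1}-{str(year)[-2:]}"


def result_set_to_frame(payload, year):
    """Build a DataFrame from the first entry of payload['resultSets']."""
    result = payload["resultSets"][0]
    df = pd.DataFrame(result["rowSet"], columns=result["headers"])
    df["year"] = year
    return df


# Made-up payload with the same shape get_hustle reads (placeholder values).
sample = {
    "resultSets": [
        {
            "headers": ["PLAYER_ID", "PLAYER_NAME", "GP", "DEFLECTIONS"],
            "rowSet": [[1, "Player A", 70, 210], [2, "Player B", 15, 30]],
        }
    ]
}

frame = result_set_to_frame(sample, 2025)
print(season_label(2025))
print(frame)
```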

Understanding the Data

Data Structure

Each script outputs CSV files with a consistent structure:

| Field | Description | Type |
|---|---|---|
| `PLAYER_ID` | Unique NBA player identifier | Integer |
| `PLAYER_NAME` | Player’s full name | String |
| `TEAM_ABBREVIATION` | Team acronym (e.g., LAL, BOS) | String |
| `GP` | Games played | Integer |
| `year` | Season ending year | Integer |
| [stat columns] | Various statistics | Float/Integer |
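A quick way to confirm that a loaded file follows this structure is to check for the shared columns before analysis. This is a minimal sketch; `REQUIRED_COLUMNS` and `missing_columns` are illustrative names, and the sample frame holds placeholder values.

```python
import pandas as pd

# Shared columns taken from the field list above.
REQUIRED_COLUMNS = ("PLAYER_ID", "PLAYER_NAME", "TEAM_ABBREVIATION", "GP", "year")


def missing_columns(df, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from df."""
    return [name for name in required if name not in df.columns]


# Placeholder frame that lacks the 'year' column.
sample = pd.DataFrame(
    {
        "PLAYER_ID": [1],
        "PLAYER_NAME": ["Player A"],
        "TEAM_ABBREVIATION": ["LAL"],
        "GP": [70],
    }
)

print(missing_columns(sample))
```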

Querying the Data

Once you have CSV files, you can query them with pandas:

import pandas as pd

# Load the data
df = pd.read_csv('hustle_2025.csv')

# Find players with most deflections (min 20 games)
top_deflections = df[df.GP >= 20].nlargest(10, 'DEFLECTIONS')
print(top_deflections[['PLAYER_NAME', 'TEAM_ABBREVIATION', 'GP', 'DEFLECTIONS']])
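Because the hustle request uses `PerMode=Totals`, `DEFLECTIONS` is a season total, so players with more games tend to rank higher. One possible refinement is to rank by a per-game rate instead. The sketch below assumes the `GP` and `DEFLECTIONS` columns from the file; the `top_per_game` helper and the sample rows are illustrative placeholders.

```python
import pandas as pd


def top_per_game(df, stat, min_games=20, n=10):
    """Rank players by stat / GP among those with at least min_games played."""
    eligible = df[df["GP"] >= min_games].copy()
    rate_column = f"{stat}_PER_GAME"
    eligible[rate_column] = eligible[stat] / eligible["GP"]
    return eligible.nlargest(n, rate_column)


# Placeholder rows, not real statistics.
sample = pd.DataFrame(
    {
        "PLAYER_NAME": ["Player A", "Player B", "Player C"],
        "GP": [70, 15, 40],
        "DEFLECTIONS": [210, 60, 160],
    }
)

ranked = top_per_game(sample, "DEFLECTIONS")
print(ranked[["PLAYER_NAME", "GP", "DEFLECTIONS", "DEFLECTIONS_PER_GAME"]])
```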

Running Other Data Collection Scripts

The platform includes scripts for various data types. Here are common examples:

Defense Statistics

Collect opponent shooting data and rim protection metrics:

python defense.py

This generates:

dfg.csv - Overall opponent field goal percentage
rimdfg.csv - Rim defense (shots within 6 feet)
rim_acc.csv - At-rim accuracy allowed
rimfreq.csv - At-rim shot frequency faced

Player Shooting by Defender Distance

Collect shooting splits based on closest defender:

python player_shooting.py

Outputs four CSV files:

very_tight.csv - 0-2 feet defender distance
tight.csv - 2-4 feet
open.csv - 4-6 feet
wide_open.csv - 6+ feet
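The four files correspond to distance ranges, which can be expressed as a small lookup. This sketch assumes half-open ranges (a defender at exactly 2 feet counts as "tight"); the source does not state how boundary values are assigned, and `defender_bucket` is an illustrative name.

```python
def defender_bucket(distance_ft):
    """Map closest-defender distance in feet to the matching file stem.

    Boundary handling (lower bound inclusive) is an assumption.
    """
    if distance_ft < 0:
        raise ValueError("distance must be non-negative")
    if distance_ft < 2:
        return "very_tight"
    if distance_ft < 4:
        return "tight"
    if distance_ft < 6:
        return "open"
    return "wide_open"


for distance in (1.0, 3.5, 5.0, 8.2):
    print(distance, "->", defender_bucket(distance) + ".csv")
```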

Passing & Playmaking

Collect comprehensive passing statistics:

python passing.py

The passing script merges data from the pbpstats.com API and NBA.com tracking endpoints to provide complete playmaking metrics.
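To illustrate what merging two sources can look like in pandas, here is a rough sketch. Everything in it is an assumption for illustration: the join keys (`PLAYER_ID`, `year`), which column comes from which source, and the placeholder values. It is not the logic used by `passing.py`.

```python
import pandas as pd

# Hypothetical per-source frames; column assignment and values are placeholders.
tracking = pd.DataFrame(
    {
        "PLAYER_ID": [1, 2],
        "year": [2025, 2025],
        "Potential Assists": [500, 320],
    }
)
pbp = pd.DataFrame(
    {
        "PLAYER_ID": [1, 3],
        "year": [2025, 2025],
        "High Value Assist %": [0.61, 0.48],
    }
)

# An outer join keeps players found in only one source;
# indicator=True adds a '_merge' column recording where each row came from.
merged = tracking.merge(pbp, on=["PLAYER_ID", "year"], how="outer", indicator=True)
unmatched = merged[merged["_merge"] != "both"]

print(merged)
print(f"{len(unmatched)} rows came from only one source")
```

Inspecting the unmatched rows is a cheap way to spot ID or season mismatches between sources before trusting the combined file.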

Dribble-Based Shooting

Analyze shooting by number of dribbles before the shot:

python dribble.py

Creates dribble shooting splits (0, 1, 2, 3-6, 7+ dribbles) for both:

Overall shooting (dribbleshot.csv)
Catch & shoot vs. pull-ups (jumpdribble.csv)
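The dribble splits listed above can be written as a simple bucketing function, which is handy when labeling or checking rows. This is a minimal sketch; `dribble_bucket` is an illustrative name and the label strings are assumptions rather than the exact values stored in the CSV files.

```python
def dribble_bucket(dribbles):
    """Map a dribble count to one of the splits: 0, 1, 2, 3-6, 7+."""
    if dribbles < 0:
        raise ValueError("dribble count must be non-negative")
    if dribbles <= 2:
        return str(dribbles)
    if dribbles <= 6:
        return "3-6"
    return "7+"


for count in (0, 2, 3, 6, 7, 11):
    print(count, "->", dribble_bucket(count))
```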

Working with Master Files

Master CSV files contain all seasons combined for easy multi-year analysis:

import pandas as pd

# Load master file with all seasons
all_passing = pd.read_csv('passing.csv')

# Filter to specific player across all years
luka_passing = all_passing[
    all_passing['Name'].str.contains('Luka Doncic', case=False)
]

print(luka_passing[['Name', 'year', 'Assists', 'Potential Assists', 
                     'High Value Assist %', 'on-ball-time%']])
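If names in a master file are stored with diacritics (for example "Dončić"), a plain-ASCII search string will not match. The sketch below shows one way to compare names with accents stripped, assuming the `Name` column from the example above; `strip_accents` and `find_player` are illustrative helpers and the sample rows are placeholders.

```python
import unicodedata

import pandas as pd


def strip_accents(text):
    """Remove combining marks so 'Dončić' compares equal to 'Doncic'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def find_player(df, name, column="Name"):
    """Case- and accent-insensitive substring match on a name column."""
    normalized = df[column].astype(str).map(strip_accents).str.lower()
    target = strip_accents(name).lower()
    return df[normalized.str.contains(target, regex=False)]


# Placeholder rows; only the name format matters here.
sample = pd.DataFrame({"Name": ["Luka Dončić", "Player B"], "year": [2025, 2025]})

print(find_player(sample, "luka doncic"))
```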

Directory Structure

Understand where data is stored:

workspace/source/
├── requirements.txt           # Python dependencies
├── defense.py                # Defense data collection
├── hustle.py                 # Hustle stats collection  
├── passing.py                # Passing data collection
├── player_shooting.py        # Shooting by defender distance
├── dribble.py                # Dribble-based shooting
├── make_index.py             # Player/team ID mapping
│
├── 2024/                     # Year-specific directories
│   ├── defense/
│   │   ├── dfg.csv
│   │   ├── rimdfg.csv
│   │   └── ...
│   ├── player_shooting/
│   │   ├── very_tight.csv
│   │   ├── tight.csv
│   │   └── ...
│   └── playoffs/
│       └── ...
│
├── hustle.csv                # Master files (all years)
├── passing.csv
├── defense_master.csv
└── index_master.csv          # Player ID mappings
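Given this layout, the path to a year-specific file can be built from its parts. The sketch below covers only the `year/category/file` pattern shown in the tree; the `year_file` helper is an illustrative name, and the playoffs subdirectory is left out because its contents are not spelled out above.

```python
from pathlib import Path


def year_file(root, year, category, filename):
    """Build the path <root>/<year>/<category>/<filename>."""
    return Path(root) / str(year) / category / filename


path = year_file("workspace/source", 2024, "defense", "dfg.csv")
print(path.as_posix())
```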

Rate Limiting & Best Practices

Always respect API rate limits to avoid being blocked:

NBA.com Stats API: Add 1-3 second delays between requests
Basketball Reference: Use 2-3 second delays and proper User-Agent headers
pbpstats.com: Implement 3-second delays for team-level loops

Example rate limiting:

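import os
import time

for year in range(2014, 2026):
    df = get_hustle(year)

    # Create the year directory if it does not exist yet
    os.makedirs(str(year), exist_ok=True)
    df.to_csv(f'{year}/hustle.csv', index=False)

    # Delay between requests
    time.sleep(3)
    print(f'Completed {year}')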
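A fixed delay does not help when a single request fails partway through a long loop. One common complement is to retry with a growing delay. The following is a sketch under stated assumptions: `fetch_with_retry` is a hypothetical helper, the retry count and delays are arbitrary, and it catches every exception for brevity where real code would likely narrow this to network errors such as `requests.RequestException`.

```python
import time


def fetch_with_retry(fetch, *args, retries=3, base_delay=3.0, sleep=time.sleep):
    """Call fetch(*args); after a failure wait base_delay * 2**attempt, then retry.

    The last failure is re-raised once all attempts are used up.
    """
    for attempt in range(retries):
        try:
            return fetch(*args)
        except Exception:
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Usage with the collection loop (get_hustle is defined earlier on this page):
# df = fetch_with_retry(get_hustle, 2025)
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting.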

Next Steps

Explore Data Sources

Learn about the three data sources and what each provides

Data Schema Reference

Browse complete field definitions for all datasets

Player Statistics

Explore player-level data collections

API Scripts

Full documentation of all collection scripts
