Collection Overview
The NBA Statistics Data Platform employs an automated data collection architecture organized by season year. Scripts are scheduled to run daily, fetching data from multiple sources and organizing it into year-based directories.
Directory Structure
Data is organized hierarchically by season:
- defense/: Defensive metrics (rim protection, DFG%, frequency stats)
- player_shooting/: Shot quality data by defender distance
- tracking/: NBA tracking stats (drives, touches, passes, catch & shoot)
- playoffs/: Separate playoff versions of all datasets
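As a rough illustration of the year-based layout, the sketch below resolves the directory for one category in one season. The category names come from the list above; the helper name `dataset_dir`, the `root` argument, and the exact nesting (year first, with `playoffs/` as a subdirectory of the year) are assumptions made for the example, not the platform's confirmed layout.

```python
from pathlib import Path

# Category directory names taken from the list above.
CATEGORIES = ("defense", "player_shooting", "tracking")


def dataset_dir(root: Path, year: int, category: str, playoffs: bool = False) -> Path:
    """Return the directory holding one category for one season.

    Assumed (hypothetical) layout:
      <root>/<year>/<category>/            regular season
      <root>/<year>/playoffs/<category>/   playoffs
    """
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    season_dir = root / str(year)
    if playoffs:
        season_dir = season_dir / "playoffs"
    return season_dir / category


if __name__ == "__main__":
    print(dataset_dir(Path("data"), 2024, "defense"))
    print(dataset_dir(Path("data"), 2024, "tracking", playoffs=True))
```

Centralizing path construction in one helper keeps the regular-season and playoff variants from drifting apart across scripts.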
Automation Workflow
The entire collection process is orchestrated through script.sh, which runs sequentially:
Daily Execution Script
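The three stages of the daily run (notebook conversion, collection, index generation) can be sketched as a small sequential runner. This is a Python approximation, not the contents of script.sh. The script names are taken from the Script Organization table, but which script covers which collection step, and the helper names `build_commands` and `run_daily`, are assumptions. The `jupyter nbconvert --to script` invocation is the standard nbconvert command for exporting a notebook as a script.

```python
import subprocess
import sys
from pathlib import Path

# Assumed mapping of the collection steps to script names from the
# Script Organization table; the real script.sh may call different files.
COLLECTION_SCRIPTS = [
    "player_shooting.py",
    "team_shooting.py",
    "new_tracking.py",
    "hustle.py",
    "defense.py",
    "salary_scrape.py",
]


def build_commands(workdir: Path) -> list[list[str]]:
    """Build the commands for one daily run, in execution order."""
    commands = []
    # 1. Script conversion: export each notebook as a .py file.
    for notebook in sorted(workdir.glob("*.ipynb")):
        commands.append(["jupyter", "nbconvert", "--to", "script", notebook.name])
    # 2. Data collection: one script after another.
    for script in COLLECTION_SCRIPTS:
        commands.append([sys.executable, script])
    # 3. Index generation runs last, after all collection has finished.
    commands.append([sys.executable, "make_index2.py"])
    return commands


def run_daily(workdir: Path, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it in workdir."""
    for cmd in build_commands(workdir):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the run at the first failing step.
            subprocess.run(cmd, check=True, cwd=workdir)


if __name__ == "__main__":
    run_daily(Path("."), dry_run=True)
```

With `dry_run=True` the runner only prints the planned commands, which is a cheap way to confirm the ordering before scheduling it.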
Script Conversion
Converts any Jupyter notebooks (.ipynb) to executable Python scripts using nbconvert.
Data Collection
Executes individual collection scripts in sequence:
- Player shooting metrics by defender distance
- Team shooting statistics
- Tracking data (drives, touches, passes)
- Hustle stats (deflections, loose balls, charges)
- Defense metrics (rim protection, DFG%)
- Salary and contract information
Index Generation
Runs make_index2.py to create a master player index with ID mappings (Basketball Reference → NBA.com).
Data Distribution
After collection, processed CSV files are distributed to consumer applications:
Distribution Targets
Web Application
Main web interface consuming all datasets for interactive visualizations
Discord Bot
Subset of datasets for quick stat lookups via Discord commands
Player Sheets
Specialized lineups and advanced metrics module
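A distribution step of this kind might look like the sketch below, which copies a season's CSV files into one directory per consumer. The target directory names, the choice of files in the Discord bot subset, and the `distribute` helper are all hypothetical. The Player Sheets target is omitted because the text does not say which files it consumes.

```python
import shutil
from pathlib import Path

# Hypothetical consumer -> dataset mapping. None means "every CSV file";
# a set restricts the consumer to the named files.
TARGETS = {
    "web_app": None,
    "discord_bot": {"tracking.csv", "hustle.csv"},
}


def distribute(source_dir: Path, dest_root: Path, targets=TARGETS) -> dict[str, list[str]]:
    """Copy CSV files from source_dir into dest_root/<target>/ for each target.

    Returns the file names copied to each target, in sorted order.
    """
    csv_files = sorted(source_dir.glob("*.csv"))
    copied = {}
    for name, wanted in targets.items():
        dest = dest_root / name
        dest.mkdir(parents=True, exist_ok=True)
        copied[name] = []
        for csv_path in csv_files:
            if wanted is None or csv_path.name in wanted:
                # copy2 also preserves modification times where possible.
                shutil.copy2(csv_path, dest / csv_path.name)
                copied[name].append(csv_path.name)
    return copied
```

Returning the list of copied files per target makes it easy to log what each consumer received on a given day.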
Script Organization
Scripts are categorized by data type:

| Category | Scripts | Output Files |
|---|---|---|
| Player Shooting | player_shooting.py, scrape_shooting.py, dribble.py | player_shooting.csv, dribbleshot.csv, shotzone.csv |
| Defense | defense.py, organize_defense.py | dfg.csv, rimdfg.csv, rim_acc.csv, rimfreq.csv |
| Tracking | new_tracking.py, passing.py, hustle.py | tracking.csv, passing.csv, hustle.csv |
| Team Stats | team_shooting.py, team_average_scrape.py | team_shooting.csv, team_avg.csv |
| Salary Data | salary_scrape.py, salary2.py | nba_salaries.csv, salary_spread.csv |
| Player Index | make_index.py, make_index2.py | index_master.csv |
Collection Frequency
- Regular Season: Daily updates (October - April)
- Playoffs: Daily updates with the ps=True parameter (April - June)
- Offseason: Weekly updates for contract/roster changes (July - September)
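The calendar above can be expressed as a small lookup that returns the update frequency and the ps flag for a given date. The function name `collection_schedule` is hypothetical. April appears in both the regular-season and playoff ranges, so the sketch assumes April is still treated as regular season; a real scheduler would need the actual playoff start date.

```python
from datetime import date


def collection_schedule(day: date) -> tuple[str, bool]:
    """Return (frequency, ps) for a calendar date.

    Assumption: April counts as regular season, since the month is listed
    in both the regular-season and playoff ranges.
    """
    if day.month in (7, 8, 9):
        return ("weekly", False)  # offseason: contract/roster changes
    if day.month in (5, 6):
        return ("daily", True)    # playoffs: ps=True
    return ("daily", False)       # October through April: regular season


if __name__ == "__main__":
    print(collection_schedule(date(2024, 1, 15)))
    print(collection_schedule(date(2024, 5, 20)))
    print(collection_schedule(date(2024, 8, 1)))
```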
Season Configuration
Most scripts use a year-based loop configuration. The ps (playoffs) parameter toggles between regular season and playoff data.
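A minimal sketch of this pattern is shown below: a generator loops over a range of season years and attaches a season type chosen by ps. The names `season_label` and `collection_jobs`, the "2023-24" label format, the convention that a year refers to the season's ending year, and the season type strings are all assumptions for illustration.

```python
def season_label(end_year: int) -> str:
    """Format the season ending in end_year, e.g. 2024 -> '2023-24' (assumed convention)."""
    return f"{end_year - 1}-{str(end_year)[-2:]}"


def collection_jobs(start_year: int, end_year: int, ps: bool = False):
    """Yield one job description per season year, inclusive of both ends.

    ps=True selects playoff data; ps=False selects the regular season.
    """
    season_type = "Playoffs" if ps else "Regular Season"
    for year in range(start_year, end_year + 1):
        yield {
            "year": year,
            "season": season_label(year),
            "season_type": season_type,
        }


if __name__ == "__main__":
    for job in collection_jobs(2023, 2024, ps=True):
        print(job)
```

Keeping ps as a single argument means the same loop body can serve both the regular-season and playoff runs.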
Error Handling
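The retry and validation behavior covered in this section could take a form like the sketch below. It is an illustration under stated assumptions, not the scripts' actual code: the helper names `with_retries` and `validate_rows`, the linearly increasing delay, and the representation of rows as dictionaries are all hypothetical.

```python
import time


def with_retries(fetch, attempts=3, delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or attempts run out.

    Waits delay * attempt_number seconds between tries. The sleep function
    is injectable so the wait can be skipped in tests.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
            if attempt < attempts:
                sleep(delay * attempt)
    raise RuntimeError(f"fetch failed after {attempts} attempts") from last_error


def validate_rows(rows, required_columns):
    """Raise ValueError if rows is empty or any row lacks a required column."""
    if not rows:
        raise ValueError("no rows returned")
    for i, row in enumerate(rows):
        missing = [col for col in required_columns if col not in row]
        if missing:
            raise ValueError(f"row {i} is missing columns: {missing}")
    return rows
```

Validating before writing a CSV avoids overwriting a good file from a previous day with an empty or malformed response.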
Scripts implement basic retry logic and data validation.
Next Steps
Scraping Pipeline
Deep dive into web scraping mechanics and API integrations
Data Processing
Learn how raw data is transformed into analysis-ready datasets