Overview

Defensive metrics datasets track individual player defense including opponent shooting when guarded by the player, rim protection, defensive field goal percentage (DFG%), and defensive impact statistics. This data enables analysis of defensive effectiveness and shot contestation.

Data Files

Defense Master

Comprehensive defensive statistics
  • defense_master.csv (Regular Season)
  • defense_master_ps.csv (Playoffs)

Opponent FG%

Overall opponent shooting statistics
  • dfg.csv / dfg_p.csv

Rim Defense

Rim protection metrics (< 6 feet)
  • rimdfg.csv / rimdfg_p.csv

Rim Accuracy

Opponent accuracy and shot frequency at the rim while the player is on court
  • rim_acc.csv / rim_acc_p.csv
  • rimfreq.csv / rimfreq_p.csv

Schema: Defense Master

File: defense_master.csv / defense_master_ps.csv
Generated by: defense.py + organize_defense.py
Records: 20,000+ player-season records
Source: Combined NBA.com and pbpstats.com data

Core Fields

  • nba_id (integer, required): NBA.com player ID
  • team_id (integer): NBA.com team ID
  • PLAYER (string, required): Player name
  • Team (string): Team abbreviation
  • year (integer, required): Season ending year (2014-2025)

Sample Data

nba_id,team_id,PLAYER,Team,overall_dfg%,all_dfga,all_dfgm,dif%,year,rim_dfg%,rim_dfga,rim_dfgm,rim_dif%
201599,1610612746,DeAndre Jordan,LAC,45.5,1375,626,-2.4,2014,56.4,621,350,-3.7
201577,1610612757,Robin Lopez,POR,43.4,1315,571,-4.4,2014,46.7,642,300,-12.7

Schema: Opponent Field Goal %

File: dfg.csv / dfg_p.csv
Generated by: defense.py
Source: NBA.com leaguedashptdefend endpoint
  • PLAYER_ID (integer): NBA.com player ID
  • PLAYER (string): Player name
  • TEAM_ID (integer): Team ID
  • TEAM (string): Team abbreviation
  • POSITION (string): Player position
  • AGE (integer): Player age
  • GP (integer): Games played
  • G (integer): Games started
  • FREQ% (float): Frequency of defensive matchups (0-100 scale)
  • DFGM (integer): Defensive field goals made (opponent)
  • DFGA (integer): Defensive field goal attempts (opponent)
  • DFG% (float): Defensive field goal percentage (opponent FG% when guarded)
  • FG% (float): Expected opponent FG% (league average for those shot types)
  • DIFF% (float): Difference between DFG% and expected FG% (negative = good defense)
  • year (integer): Season year

Schema: Rim Defense

File: rimdfg.csv / rimdfg_p.csv
Purpose: Tracks opponent shooting at the rim (< 6 feet) when contested by player
Fields identical to dfg.csv but filtered for rim attempts only.

Schema: Rim Accuracy & Frequency

Files:
  • rim_acc.csv / rim_acc_p.csv - Opponent accuracy at rim
  • rimfreq.csv / rimfreq_p.csv - Frequency of rim attempts allowed
Source: pbpstats.com API (on-off splits)
  • Name (string): Player name (lowercase)
  • Team (string): Team name (full)
  • Year (integer): Season ending year
  • Season (string): Season format: “2023-24”
  • AtRimAccuracyOpponent (float): Opponent shooting accuracy at rim when player on court
  • AtRimFrequencyOpponent (float): Frequency of opponent rim attempts when player on court
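
Name in these files is lowercase, while PLAYER in defense_master.csv is mixed case (see the sample data), so joining the two needs a normalized key. The following is a minimal sketch of one way to do that. The helper add_join_key, the name_key column, and the placeholder rows are hypothetical and exist only for illustration. Real names may also differ in accents or suffixes, which this sketch does not handle.

```python
import pandas as pd

def add_join_key(df: pd.DataFrame, name_col: str) -> pd.DataFrame:
    """Return a copy with a lowercase, whitespace-trimmed name_key column."""
    out = df.copy()
    out["name_key"] = out[name_col].str.strip().str.lower()
    return out

# Placeholder rows: hypothetical players and values, not real data.
master = pd.DataFrame({
    "PLAYER": ["Example Center", "Example Guard"],
    "year": [2024, 2024],
    "rim_dfg%": [55.0, 62.0],
})
rim_acc = pd.DataFrame({
    "Name": ["example center"],
    "Year": [2024],
    "AtRimAccuracyOpponent": [0.61],  # placeholder; the real scale is not documented here
})

left = add_join_key(master, "PLAYER")
right = add_join_key(rim_acc, "Name").rename(columns={"Year": "year"})

# Left join keeps master rows that have no on-court rim data (they get NaN).
joined = left.merge(
    right[["name_key", "year", "AtRimAccuracyOpponent"]],
    on=["name_key", "year"],
    how="left",
)
print(joined[["PLAYER", "year", "rim_dfg%", "AtRimAccuracyOpponent"]])
```

Joining on a name is fragile. If an ID column is available in your copy of the pbpstats data, prefer it over the name key.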

Understanding Defensive Metrics

Negative DIFF% is Good Defense
  • DIFF% = -5.0 means opponents shoot 5 percentage points worse than expected (good defense)
  • DIFF% = +3.0 means opponents shoot 3 percentage points better than expected (poor defense)

Key Concepts

Defensive field goal percentage (DFG%): The field goal percentage on shots taken when a specific player is the closest defender. This measures how well a player contests shots.
Formula: DFG% = DFGM / DFGA * 100
Lower DFG% = Better defense

Differential (DIFF%): Compares opponents’ actual shooting percentage to their expected FG% based on shot type and location.
Formula: DIFF% = DFG% - Expected_FG%
Negative differential = Player forces worse shooting than expected
Positive differential = Player allows better shooting than expected

Rim protection: Defensive metrics specifically for shots taken within 6 feet of the basket. Elite rim protectors have:
  • Low rim_dfg% (< 50%)
  • High rim_dfga (high volume of rim contests)
  • Negative rim_dif% (force misses)
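
The two formulas above can be checked against the sample rows from defense_master.csv. This sketch recomputes DFG% from the raw counts and rearranges the DIFF% formula to recover the implied expected FG%. The column names dfg_check, rim_dfg_check, and expected_fg are introduced here for illustration and are not part of the dataset.

```python
import io
import pandas as pd

# Two rows copied from the defense_master.csv sample data.
sample_csv = """nba_id,team_id,PLAYER,Team,overall_dfg%,all_dfga,all_dfgm,dif%,year,rim_dfg%,rim_dfga,rim_dfgm,rim_dif%
201599,1610612746,DeAndre Jordan,LAC,45.5,1375,626,-2.4,2014,56.4,621,350,-3.7
201577,1610612757,Robin Lopez,POR,43.4,1315,571,-4.4,2014,46.7,642,300,-12.7
"""
df = pd.read_csv(io.StringIO(sample_csv))

# DFG% = DFGM / DFGA * 100, recomputed from the raw counts.
df["dfg_check"] = (df["all_dfgm"] / df["all_dfga"] * 100).round(1)
df["rim_dfg_check"] = (df["rim_dfgm"] / df["rim_dfga"] * 100).round(1)

# DIFF% = DFG% - Expected_FG%, so Expected_FG% = DFG% - DIFF%.
df["expected_fg"] = (df["overall_dfg%"] - df["dif%"]).round(1)

print(df[["PLAYER", "overall_dfg%", "dfg_check",
          "rim_dfg%", "rim_dfg_check", "expected_fg"]])
```

For both sample rows the recomputed percentages match the stored overall_dfg% and rim_dfg% values to one decimal place.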

Usage Examples

Find Elite Rim Protectors

import pandas as pd

df = pd.read_csv('defense_master.csv')

# 2024 season, minimum 200 rim contests
elite_rim = df[
    (df['year'] == 2024) & 
    (df['rim_dfga'] >= 200)
].sort_values('rim_dif%')  # Most negative = best

print("Top 20 Rim Protectors (2024):")
print(elite_rim[['PLAYER', 'Team', 'rim_dfga', 'rim_dfg%', 'rim_dif%']].head(20))

Overall Defensive Impact

import pandas as pd

df = pd.read_csv('dfg.csv')

# 2024 season, high volume defenders
impact = df[
    (df['year'] == 2024) & 
    (df['DFGA'] >= 500)  # Min 500 contests
].sort_values('DIFF%')

print("Best Defensive Impact (2024):")
print(impact[['PLAYER', 'TEAM', 'DFGA', 'DFG%', 'FG%', 'DIFF%']].head(20))

Rim Protection vs Perimeter Defense

import pandas as pd

df = pd.read_csv('defense_master.csv')

# Compare rim defense to overall defense
df_2024 = df[(df['year'] == 2024) & (df['rim_dfga'] >= 150)].copy()

# Calculate perimeter defense (non-rim)
df_2024['perimeter_dfga'] = df_2024['all_dfga'] - df_2024['rim_dfga']
df_2024['perimeter_dfgm'] = df_2024['all_dfgm'] - df_2024['rim_dfgm']
df_2024['perimeter_dfg%'] = (df_2024['perimeter_dfgm'] / df_2024['perimeter_dfga'] * 100).round(1)

print("Rim vs Perimeter Defense:")
print(df_2024[['PLAYER', 'rim_dfg%', 'perimeter_dfg%', 'overall_dfg%']].head(20))

Track a Player's Defense Across Seasons

import pandas as pd

df = pd.read_csv('dfg.csv')

# Look at specific player's defensive evolution
player_name = 'Anthony Davis'
player_defense = df[df['PLAYER'] == player_name].sort_values('year')

print(f"Defensive Evolution: {player_name}")
for _, row in player_defense.iterrows():
    print(f"{row['year']}: {row['DFGA']} contests, {row['DFG%']:.1f}% DFG, {row['DIFF%']:+.1f} DIFF")

On-Court Rim Defense (Team Context)

import pandas as pd

rim_acc = pd.read_csv('rim_acc.csv')
rim_freq = pd.read_csv('rimfreq.csv')

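# Merge rim accuracy and frequency.
# Team is part of the join key so that a player with more than one row in a
# season (if any, for example after a trade) is not cross-joined.
merged = rim_acc.merge(
    rim_freq[['Name', 'Team', 'Year', 'AtRimFrequencyOpponent']],
    on=['Name', 'Team', 'Year']
)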

# 2024 data
merged_2024 = merged[merged['Year'] == 2024]

print("Rim Defense Impact (2024):")
print(merged_2024[['Name', 'Team', 'AtRimAccuracyOpponent', 'AtRimFrequencyOpponent']].head(20))

Defensive Position Analysis

import pandas as pd

df = pd.read_csv('dfg.csv')

# Compare defensive metrics by position
by_position = df[df['year'] == 2024].groupby('POSITION').agg({
    'DFG%': 'mean',
    'DIFF%': 'mean',
    'DFGA': 'mean'
}).round(1)

print("Defensive Metrics by Position (2024):")
print(by_position)
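
The averages above weight every player equally, so a defender with 40 contests counts as much as one with 1,000. A possible alternative, sketched below, pools DFGM and DFGA per position and recomputes DFG% from the totals. The function name weighted_dfg_by_position and the toy rows are hypothetical. The toy numbers are made up and do not describe real players.

```python
import pandas as pd

def weighted_dfg_by_position(df: pd.DataFrame, year: int) -> pd.DataFrame:
    """Pool DFGM and DFGA per position, then recompute DFG% from the totals."""
    season = df[df["year"] == year]
    totals = season.groupby("POSITION")[["DFGM", "DFGA"]].sum()
    totals["DFG%"] = (totals["DFGM"] / totals["DFGA"] * 100).round(1)
    return totals

# Made-up rows for illustration only (not real players or real values).
toy = pd.DataFrame({
    "POSITION": ["G", "G", "C"],
    "DFGM": [10, 200, 300],
    "DFGA": [40, 400, 600],
    "year": [2024, 2024, 2024],
})

weighted = weighted_dfg_by_position(toy, 2024)
print(weighted)
```

In the toy data the two guards have individual DFG% values of 25.0 and 50.0, so a plain mean gives 37.5, while the pooled figure (210 / 440) is pulled toward the high-volume defender.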

Data Collection Scripts

defense.py

Collects DFG% data from NBA.com and pbpstats.com

organize_defense.py

Combines defensive datasets into defense_master.csv
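
The script itself is not shown here. As a rough sketch of what such a combination step could look like, the code below renames dfg.csv-style and rimdfg.csv-style columns to the defense_master.csv names and joins them on player and season. The column mapping, the join keys, and the build_master function are assumptions inferred from the schemas and sample data, not the actual organize_defense.py logic.

```python
import pandas as pd

# Assumed renames from the dfg.csv / rimdfg.csv columns to master-file names.
OVERALL_RENAME = {
    "PLAYER_ID": "nba_id", "TEAM_ID": "team_id", "TEAM": "Team",
    "DFG%": "overall_dfg%", "DFGA": "all_dfga", "DFGM": "all_dfgm",
    "DIFF%": "dif%",
}
RIM_RENAME = {
    "PLAYER_ID": "nba_id", "DFG%": "rim_dfg%", "DFGA": "rim_dfga",
    "DFGM": "rim_dfgm", "DIFF%": "rim_dif%",
}
OVERALL_COLS = ["nba_id", "team_id", "PLAYER", "Team", "overall_dfg%",
                "all_dfga", "all_dfgm", "dif%", "year"]
RIM_COLS = ["nba_id", "year", "rim_dfg%", "rim_dfga", "rim_dfgm", "rim_dif%"]

def build_master(dfg: pd.DataFrame, rimdfg: pd.DataFrame) -> pd.DataFrame:
    """Join overall and rim defense on (nba_id, year); hypothetical helper."""
    overall = dfg.rename(columns=OVERALL_RENAME)[OVERALL_COLS]
    rim = rimdfg.rename(columns=RIM_RENAME)[RIM_COLS]
    # Left join keeps players who have no rim-defense row.
    return overall.merge(rim, on=["nba_id", "year"], how="left")

# Values rearranged from the DeAndre Jordan row of the master sample data.
dfg = pd.DataFrame([{
    "PLAYER_ID": 201599, "PLAYER": "DeAndre Jordan", "TEAM_ID": 1610612746,
    "TEAM": "LAC", "DFGM": 626, "DFGA": 1375, "DFG%": 45.5, "DIFF%": -2.4,
    "year": 2014,
}])
rimdfg = pd.DataFrame([{
    "PLAYER_ID": 201599, "DFGM": 350, "DFGA": 621, "DFG%": 56.4,
    "DIFF%": -3.7, "year": 2014,
}])

master = build_master(dfg, rimdfg)
print(master.to_csv(index=False))
```

With these inputs the output columns come out in the same order as the defense_master.csv header shown in the sample data.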

Important Notes

  • DIFF% is the key metric: Measures defensive impact relative to expected performance
  • Volume matters: High DFGA indicates a player who defends frequently (usually starters)
  • Context: Raw DFG% depends on shot mix. Big men contest more shots at the rim and guards contest more perimeter shots, so compare DIFF% rather than raw DFG% across positions
Defensive metrics don’t capture help defense, rotations, or team schemes. They only measure the closest defender at the time of the shot.
Combine defensive metrics with tracking data (deflections, steals, charges) from hustle.csv for comprehensive defensive analysis.
