Database Initialization
The database is automatically initialized when you create a MatchStatistics instance.
Default Configuration
- Database file: premier_league.db
- Storage location: ./data/ (created in your current working directory)
- Database type: SQLite
- Initial leagues: Premier League, La Liga, Serie A, Bundesliga, Ligue 1, EFL Championship
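Taken together, the defaults above place a SQLite file at ./data/premier_league.db under the directory you run your code from. The sketch below only illustrates how that path resolves; the helper name `default_db_path` is hypothetical and not part of the library, which handles this internally.

```python
from pathlib import Path
from typing import Optional

DB_DIRECTORY = "data"              # default storage directory listed above
DB_FILENAME = "premier_league.db"  # default database file name listed above


def default_db_path(cwd: Optional[Path] = None) -> Path:
    """Return the absolute path of the default database file.

    Hypothetical helper for illustration only.
    """
    base = Path.cwd() if cwd is None else Path(cwd)
    return (base / DB_DIRECTORY / DB_FILENAME).resolve()


# Shows where the file would live for the current working directory.
print(default_db_path())
```

Because the location depends on the working directory, running the same script from two different directories would point at two different database files.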
Database Location
The database is stored in a directory relative to your current working directory.
Custom Database Location
You can specify a custom location for your database.
Database Schema
The database consists of four main tables:
League Table
Stores information about football leagues:
- id: Primary key
- name: League name (e.g., “Premier League”)
- up_to_date_season: Latest season with data (e.g., “2023-2024”)
- up_to_date_match_week: Latest match week scraped
Team Table
Stores team information:
- id: Unique team identifier (from FBRef)
- name: Team name
- league_id: Foreign key to league table
Game Table
Stores match results and metadata:
- id: Unique game identifier
- home_team_id, away_team_id: Foreign keys to team table
- league_id: Foreign key to league table
- home_goals, away_goals: Match score
- home_team_points, away_team_points: Points before the match
- date: Match date and time
- match_week: Week number in the season
- season: Season (e.g., “2023-2024”)
GameStats Table
Stores detailed statistics for each team in a game (80+ metrics):
- id: Primary key
- game_id: Foreign key to game table
- team_id: Foreign key to team table
- Expected Goals: xG, xA, xAG
- Shooting: shots_total_FW/MF/DF, shots_on_target_FW/MF/DF
- Passing: passes_completed_FW/MF/DF, pass_completion_percentage_FW/MF/DF, key_passes
- Defense: tackles_won_FW/MF/DF, blocks_FW/MF/DF, interceptions_FW/MF/DF
- Possession: possession_rate, touches_FW/MF/DF, carries_FW/MF/DF
- Goalkeeping: save_percentage, saves, PSxG
- Discipline: yellow_card, red_card, fouls_committed_FW/MF/DF
Statistics are split by position (FW=Forwards, MF=Midfielders, DF=Defenders, GK=Goalkeepers) to capture tactical nuances.
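To make the relationships between the four tables concrete, here is a rough sketch of the schema as SQLite DDL. The table names (`league`, `team`, `game`, `game_stats`), the column types, and the reduced set of `game_stats` columns are all assumptions made for illustration; the library defines the real schema itself.

```python
import sqlite3

# Assumed table names and column types; only a few of the 80+ metrics are shown.
SCHEMA = """
CREATE TABLE league (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    up_to_date_season TEXT,
    up_to_date_match_week INTEGER
);
CREATE TABLE team (
    id TEXT PRIMARY KEY,               -- FBRef identifier (TEXT is an assumption)
    name TEXT NOT NULL,
    league_id INTEGER REFERENCES league(id)
);
CREATE TABLE game (
    id TEXT PRIMARY KEY,
    home_team_id TEXT REFERENCES team(id),
    away_team_id TEXT REFERENCES team(id),
    league_id INTEGER REFERENCES league(id),
    home_goals INTEGER,
    away_goals INTEGER,
    home_team_points INTEGER,          -- points before the match
    away_team_points INTEGER,
    date TEXT,
    match_week INTEGER,
    season TEXT
);
CREATE TABLE game_stats (
    id INTEGER PRIMARY KEY,
    game_id TEXT REFERENCES game(id),
    team_id TEXT REFERENCES team(id),
    xG REAL,
    xA REAL,
    xAG REAL,
    shots_total_FW INTEGER,
    shots_total_MF INTEGER,
    shots_total_DF INTEGER,
    possession_rate REAL,
    save_percentage REAL,
    yellow_card INTEGER,
    red_card INTEGER
);
"""


def create_sketch_schema(conn: sqlite3.Connection) -> None:
    """Create the illustrative tables on the given connection."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_sketch_schema(conn)
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```

Each game has two `game_stats` rows, one per team, which is why `game_stats` carries both a `game_id` and a `team_id`.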
Updating the Database
Run the update method
Call update_data_set() to fetch new match data. This method:
- Determines the current season automatically
- Fetches all matches since the last update
- Scrapes detailed statistics for each new game
- Updates league tracking information
How Updates Work
- Season Detection: Automatically determines the current season based on the current date
  - If current month >= August: Current season = {year}-{year+1}
  - Otherwise: Current season = {year-1}-{year}
- Gap Identification: Compares up_to_date_season for each league with the current season
- URL Generation: Creates URLs for all missing seasons and match weeks
- Duplicate Prevention: Filters out games already in the database by checking game IDs
- Data Scraping: Fetches match details including:
  - Team statistics by position
  - Expected goals (xG)
  - Passing, defensive, and possession metrics
  - Goalkeeper statistics
- League Update: Updates each league’s up_to_date_season and up_to_date_match_week
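The season rule, the gap check, and the duplicate filter described above can be sketched in a few lines. The function names are hypothetical, and the choice to re-include the league's last recorded season in the gap (because it may still have unscraped match weeks) is an assumption, not a statement about the library's internals.

```python
from datetime import date
from typing import Iterable, List


def current_season(today: date) -> str:
    """Apply the August cutoff described above."""
    if today.month >= 8:
        return f"{today.year}-{today.year + 1}"
    return f"{today.year - 1}-{today.year}"


def missing_seasons(up_to_date_season: str, target_season: str) -> List[str]:
    """List seasons from the last recorded one through the target, inclusive.

    Assumption: the last recorded season is revisited in case it is incomplete.
    """
    start = int(up_to_date_season.split("-")[0])
    end = int(target_season.split("-")[0])
    return [f"{year}-{year + 1}" for year in range(start, end + 1)]


def filter_new_games(scraped_ids: Iterable[str], existing_ids: Iterable[str]) -> List[str]:
    """Drop game IDs that are already stored, preserving scrape order."""
    existing = set(existing_ids)
    return [game_id for game_id in scraped_ids if game_id not in existing]


print(current_season(date.today()))
```

For example, a league last updated in “2022-2023” and checked in September 2024 would yield three seasons to examine, ending with “2024-2025”.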
Example: Regular Updates
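One way to run updates regularly is to wrap the call in a small helper that logs the outcome and never raises, so a scheduled job keeps going after a failed scrape. This is a sketch: `run_update` and `SupportsUpdate` are hypothetical names, and the only library call used is `update_data_set()`. In practice you would pass in your MatchStatistics instance.

```python
import logging
from typing import Protocol


class SupportsUpdate(Protocol):
    """Anything exposing update_data_set(), e.g. a MatchStatistics instance."""

    def update_data_set(self) -> None: ...


def run_update(stats: SupportsUpdate) -> bool:
    """Run one update and report success instead of raising."""
    try:
        stats.update_data_set()
    except Exception:
        # Network or scraping failures are logged with a traceback.
        logging.exception("Database update failed")
        return False
    logging.info("Database update finished")
    return True
```

Returning a boolean rather than raising makes it easy for a scheduler or cron wrapper to decide whether to retry.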
Querying the Database
The library provides convenient methods to query your data.
Get Total Game Count
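The library's own query methods are the intended interface. Purely as an illustration, the same number can be read straight from the SQLite file. This sketch assumes the game table is named `game`; the helper `total_game_count` is hypothetical.

```python
import sqlite3


def total_game_count(conn: sqlite3.Connection) -> int:
    """Count all rows in the (assumed) game table."""
    (count,) = conn.execute("SELECT COUNT(*) FROM game").fetchone()
    return count
```

You would open the connection with `sqlite3.connect("data/premier_league.db")` when using the default location.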
Get Games by Season
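As a raw-SQL sketch of a season lookup, the function below filters games by season and, optionally, by match week. The table name `game` and the helper name `games_by_season` are assumptions for illustration; the column names come from the schema description above.

```python
import sqlite3
from typing import List, Optional, Tuple


def games_by_season(
    conn: sqlite3.Connection, season: str, match_week: Optional[int] = None
) -> List[Tuple]:
    """Return games for a season such as "2023-2024", oldest first."""
    sql = (
        "SELECT id, date, match_week, home_goals, away_goals "
        "FROM game WHERE season = ?"
    )
    params: list = [season]
    if match_week is not None:
        # Narrow the result to a single match week when requested.
        sql += " AND match_week = ?"
        params.append(match_week)
    sql += " ORDER BY date"
    return conn.execute(sql, params).fetchall()
```

The season string has to match the stored “YYYY-YYYY” format exactly, since the comparison is a plain string equality.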
Get Team Games
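Finding every game a team played means matching it on either the home or the away side. The sketch below does that with two joins against the team table; the table names `game` and `team` and the helper name `team_games` are assumptions for illustration.

```python
import sqlite3
from typing import List, Tuple


def team_games(conn: sqlite3.Connection, team_name: str) -> List[Tuple]:
    """Return (game id, date, home name, away name, home goals, away goals)."""
    sql = """
        SELECT g.id, g.date, h.name, a.name, g.home_goals, g.away_goals
        FROM game AS g
        JOIN team AS h ON h.id = g.home_team_id
        JOIN team AS a ON a.id = g.away_team_id
        WHERE h.name = :team OR a.name = :team
        ORDER BY g.date
    """
    return conn.execute(sql, {"team": team_name}).fetchall()
```

The name comparison is exact, so “Arsenal” and “Arsenal FC” would be treated as different teams.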
Get Historical Data
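Historical lookups usually mean "the most recent games before a given date", for example to build features without leaking future results. The following raw-SQL sketch assumes a `game` table whose `date` column holds ISO-style text (`YYYY-MM-DD HH:MM:SS`), which sorts correctly as a string; `games_before` is a hypothetical helper.

```python
import sqlite3
from datetime import datetime
from typing import List, Tuple


def games_before(
    conn: sqlite3.Connection, cutoff: datetime, limit: int = 10
) -> List[Tuple]:
    """Return up to `limit` games played strictly before `cutoff`, newest first."""
    sql = (
        "SELECT id, date, season, match_week FROM game "
        "WHERE date < ? ORDER BY date DESC LIMIT ?"
    )
    # ISO text compares chronologically, so a string comparison is enough here.
    return conn.execute(sql, (cutoff.isoformat(sep=" "), limit)).fetchall()
```

If your dates are stored in another format, the string comparison no longer holds and the query would need an explicit conversion.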
Database Maintenance
Checking Database Status
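A quick status check is to read each league's tracking columns, which tell you how far the last update got. This sketch assumes the league table is named `league`; `league_status` is a hypothetical helper, while the column names come from the schema described earlier.

```python
import sqlite3
from typing import Dict, Optional, Tuple


def league_status(
    conn: sqlite3.Connection,
) -> Dict[str, Tuple[Optional[str], Optional[int]]]:
    """Map league name to (up_to_date_season, up_to_date_match_week)."""
    rows = conn.execute(
        "SELECT name, up_to_date_season, up_to_date_match_week "
        "FROM league ORDER BY name"
    ).fetchall()
    return {name: (season, week) for name, season, week in rows}
```

A league whose season lags behind the current one is a sign that an update is due or that a previous update was interrupted.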
Backing Up Your Database
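Because the database is a single SQLite file, a backup can be taken with Python's standard `sqlite3` module. The sketch below uses `Connection.backup`, which copies a consistent snapshot even if another connection is open; the helper name `backup_database` and the timestamped file naming are illustrative choices, not library features.

```python
import sqlite3
from datetime import datetime
from pathlib import Path


def backup_database(db_path, backup_dir) -> Path:
    """Copy the database into backup_dir under a timestamped name."""
    db_path = Path(db_path)
    if not db_path.is_file():
        # Avoid silently creating an empty database at a mistyped path.
        raise FileNotFoundError(db_path)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target = backup_dir / f"{db_path.stem}_{stamp}{db_path.suffix}"

    source = sqlite3.connect(db_path)
    destination = sqlite3.connect(target)
    try:
        source.backup(destination)  # SQLite online backup API
    finally:
        destination.close()
        source.close()
    return target
```

With the default configuration this could be called as `backup_database("data/premier_league.db", "backups")`.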
Resetting the Database
To start fresh, simply delete the database file.
Best Practices
Use consistent database locations
Keep your database in a dedicated directory and use environment variables for production:
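A minimal sketch of reading the location from the environment, with the documented defaults as a fallback. The variable names `PL_DB_DIR` and `PL_DB_FILENAME` and the helper `resolve_db_path` are hypothetical; use whatever names fit your deployment and pass the result to the library's custom-location option.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_db_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Build the database path from environment variables.

    PL_DB_DIR and PL_DB_FILENAME are hypothetical variable names.
    """
    env = os.environ if env is None else env
    directory = Path(env.get("PL_DB_DIR", "data"))
    filename = env.get("PL_DB_FILENAME", "premier_league.db")
    return directory / filename


print(resolve_db_path())
```

Keeping the lookup in one function means development and production differ only in their environment, not in code.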
Schedule regular updates
Don’t wait until you need data. Schedule weekly or daily updates:
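Whatever scheduler you use (cron, a task queue, a long-running loop), the core decision is whether enough time has passed since the last successful update. A small sketch of that check follows; `is_update_due` is a hypothetical helper and the weekly default is only an example.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_update_due(
    last_run: Optional[datetime],
    now: datetime,
    interval: timedelta = timedelta(days=7),
) -> bool:
    """Return True if no update has run yet or the interval has elapsed."""
    return last_run is None or now - last_run >= interval
```

A scheduled script could store the timestamp of its last successful run and call `update_data_set()` only when this returns True.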
Monitor database size
The database grows as you add more data. Monitor its size:
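Since the database is one file, its size can be read with the standard library. The helper name `database_size_mb` is illustrative.

```python
from pathlib import Path


def database_size_mb(db_path) -> float:
    """Return the size of the database file in mebibytes."""
    return Path(db_path).stat().st_size / (1024 * 1024)
```

With the default location you would call `database_size_mb("data/premier_league.db")`; logging the value after each update gives a simple growth history.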
Use connection pooling for concurrent access
If accessing the database from multiple processes, consider using SQLAlchemy’s connection pooling:
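SQLAlchemy pooling is configured on the engine and is not shown here. As a dependency-free complement for scripts that read the file alongside the library, the sketch below uses a different technique: a busy timeout plus SQLite's write-ahead log (WAL), so readers are not blocked by a writer and a waiting writer retries instead of failing at once. The helper name `open_shared_connection` is hypothetical. Note that switching to WAL persists in the database file, so treat this as an assumption to validate against your setup.

```python
import sqlite3


def open_shared_connection(db_path, timeout_seconds: float = 30.0) -> sqlite3.Connection:
    """Open a connection that waits on locks and uses WAL journaling."""
    # timeout: how long to wait for a lock before raising "database is locked".
    conn = sqlite3.connect(db_path, timeout=timeout_seconds)
    # WAL lets readers proceed while a single writer is active.
    conn.execute("PRAGMA journal_mode=WAL")
    return conn
```

SQLite still allows only one writer at a time, so this reduces lock errors rather than removing the underlying limit.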
Troubleshooting
“All Data is up to Date!” Message
If you see this message when running update_data_set(), it means:
- Your database contains all available matches
- No new matches have been played since your last update
- The current season hasn’t started yet (if checking in summer)
Database Locked Errors
SQLite databases can only handle one write operation at a time.
Missing Data After Update
If games are missing after an update:
- Check if the season format is correct (“YYYY-YYYY” with regular hyphen)
- Verify the league name matches exactly (use get_all_leagues())
- Ensure your internet connection is stable during the update
- Check the console for error messages during scraping
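The season-format check in the list above is easy to automate. This sketch accepts only two four-digit years joined by a regular ASCII hyphen, and additionally requires the years to be consecutive; that second rule is an assumption based on the “2023-2024” examples. The helper name `is_valid_season` is hypothetical.

```python
import re

# Two four-digit years separated by an ASCII hyphen, nothing else.
SEASON_PATTERN = re.compile(r"([0-9]{4})-([0-9]{4})")


def is_valid_season(season: str) -> bool:
    """Return True for strings like "2023-2024"."""
    match = SEASON_PATTERN.fullmatch(season)
    if match is None:
        return False
    start, end = int(match.group(1)), int(match.group(2))
    # Assumption: a season always spans two consecutive years.
    return end == start + 1
```

A common failure is a season string pasted with an en dash (“2023–2024”), which looks identical on screen but does not match the stored value.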
Advanced: Direct Database Access
For advanced queries, access the SQLAlchemy session directly.
You now know how to initialize, manage, update, and query your Premier League database!