Overview
The MatchStatistics class provides access to match data from top European football leagues. It scrapes game statistics and team performance metrics, and builds datasets for machine learning.
The MatchStatistics module stores data in a local SQLite database and can export datasets with customizable lag windows and weighting strategies for time-series analysis.
Installation
Initialization
Parameters
Name of the SQLite database file to store match data
Directory path where the database will be stored
Core Methods
update_data_set()
Updates the database with the latest match data from all supported leagues. This method automatically detects new matches and scrapes their detailed statistics.
- Determines the current season based on the system date
- Constructs URLs for matches needing updates
- Scrapes new match data and player statistics
- Updates league information with latest season and match week
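The first step above, deriving the season from the system date, can be sketched as follows. This is an illustration only: the helper name `season_for` and the August cutoff are assumptions, not the module's documented behavior.

```python
from datetime import date

# Assumption: a new season is taken to start in August.
# The cutoff actually used by MatchStatistics is not documented here.
SEASON_START_MONTH = 8


def season_for(day: date) -> str:
    """Return the season label in "YYYY-YYYY" form for a given date."""
    # Dates before the cutoff month belong to the season that started last year.
    start_year = day.year if day.month >= SEASON_START_MONTH else day.year - 1
    return f"{start_year}-{start_year + 1}"


if __name__ == "__main__":
    print(season_for(date.today()))
```

The label uses the same "YYYY-YYYY" form that get_games_by_season() expects.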
create_dataset()
Creates a CSV file containing game statistics optimized for machine learning training. Supports lag-based feature engineering and custom weighting strategies.
Parameters
File path where the CSV dataset will be saved
Maximum number of rows to include. If specified, returns the last n rows after sorting by date
lag: Number of previous games used to calculate team statistics. For example, lag=10 means the current row uses the average of the team’s past 10 games
weights: Weighting strategy for historical games:
- "lin": Linear weights (more recent games weighted higher)
- "exp": Exponential weights (requires params argument)
- None: Equal weights for all games
params: Parameter for the exponential weighting strategy. Required when weights="exp"
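To make the three weighting strategies concrete, here is a minimal sketch of a weighted lag average. The function names and the exact formulas (in particular the decay form used for "exp") are assumptions chosen for illustration; the module may compute its weights differently.

```python
import math
from typing import List, Optional, Sequence


def lag_weights(lag: int, weights: Optional[str] = None,
                params: Optional[float] = None) -> List[float]:
    """Return normalized weights, ordered from oldest game to newest."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    if weights is None:
        raw = [1.0] * lag                            # equal weights
    elif weights == "lin":
        raw = [float(i) for i in range(1, lag + 1)]  # newest game weighs most
    elif weights == "exp":
        if params is None:
            raise ValueError('params is required when weights="exp"')
        # Assumed decay form: weight shrinks by exp(-params) per game of age.
        raw = [math.exp(-params * (lag - 1 - i)) for i in range(lag)]
    else:
        raise ValueError(f"unknown weighting strategy: {weights!r}")
    total = sum(raw)
    return [w / total for w in raw]


def weighted_lag_average(values: Sequence[float], lag: int,
                         weights: Optional[str] = None,
                         params: Optional[float] = None) -> float:
    """Average the last `lag` values (oldest first) under the chosen weights."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    window = list(values)[-lag:]
    if not window:
        raise ValueError("no previous games available")
    w = lag_weights(len(window), weights, params)
    return sum(wi * vi for wi, vi in zip(w, window))
```

With equal weights the result is the plain mean of the window; "lin" and "exp" both pull the average toward the most recent games.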
Examples
- Basic Dataset
- Exponential Weighting
- Linear Weighting
The dataset includes home and away team statistics with features like xG, shots, passes, tackles, possession, and 50+ other metrics grouped by player position (FW, MF, DF, GK).
get_team_games()
Retrieve all games for a specific team across all seasons.
Parameters
Name of the team to retrieve games for
Returns
List[dict]: List of game dictionaries with full relationship data (home_team, away_team, game_stats)
Raises
ValueError: If no team is found with the specified name
get_games_by_season()
Retrieve all games for a specific season and match week.
Parameters
Season in format “YYYY-YYYY” (e.g., “2023-2024”)
Match week number to filter games
Returns
List[dict]: List of games with full relationship data
Raises
ValueError: If season format is invalid (must be “YYYY-YYYY”)
ValueError: If no games found for the specified season and match week
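The season format check can be sketched as below. The helper `validate_season` is hypothetical, and the rule that the two years must be consecutive is an added assumption; the source only states that the format must be "YYYY-YYYY".

```python
import re

# Two four-digit years separated by a hyphen, e.g. "2023-2024".
SEASON_PATTERN = re.compile(r"(\d{4})-(\d{4})")


def validate_season(season: str) -> str:
    """Return the season unchanged if it is well formed, else raise ValueError."""
    match = SEASON_PATTERN.fullmatch(season)
    if match is None:
        raise ValueError(f'season must be in "YYYY-YYYY" format, got {season!r}')
    start, end = int(match.group(1)), int(match.group(2))
    # Assumption: a season spans two consecutive calendar years.
    if end != start + 1:
        raise ValueError(f"season years must be consecutive, got {season!r}")
    return season
```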
get_games_before_date()
Retrieve games before a specific date, optionally filtered by team.
Parameters
Reference date to search before
Maximum number of games to return
Team name to filter games. If None, returns games from all teams
Returns
List[dict]: List of games ordered by date descending
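The selection logic described here (cutoff date, optional team filter, newest first, capped length) can be sketched over plain dictionaries. The function name, its parameter names, and the flat dictionary shape with string team names are all assumptions for illustration; the real method returns richer relationship data from the database.

```python
from datetime import datetime
from typing import Dict, List, Optional


def games_before(games: List[Dict], before: datetime, limit: int,
                 team: Optional[str] = None) -> List[Dict]:
    """Return up to `limit` games played before `before`, newest first."""
    selected = [
        g for g in games
        if g["date"] < before
        and (team is None or team in (g["home_team"], g["away_team"]))
    ]
    # Most recent game first, matching the documented ordering.
    selected.sort(key=lambda g: g["date"], reverse=True)
    return selected[:limit]


if __name__ == "__main__":
    # Made-up sample rows, used only to exercise the function.
    sample = [
        {"date": datetime(2024, 1, 1), "home_team": "A", "away_team": "B"},
        {"date": datetime(2024, 2, 1), "home_team": "C", "away_team": "A"},
        {"date": datetime(2024, 3, 1), "home_team": "B", "away_team": "C"},
    ]
    for game in games_before(sample, datetime(2024, 2, 15), limit=5, team="A"):
        print(game["date"].date(), game["home_team"], game["away_team"])
```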
get_future_match()
Retrieve the next upcoming match for a specific league or team.
Parameters
League name (e.g., “Premier League”, “La Liga”, “Serie A”)
Optional team name to filter for that team’s next match
Returns
Dict or str: Dictionary with home_team and away_team objects, or a message string if season is finished
get_all_leagues()
Get all available leagues in the database.
Returns
List[str]: List of all league names
get_all_teams()
Get all teams across all leagues in the database.
Returns
List[str]: List of all team names
get_total_game_count()
Get the total number of games stored in the database.
Returns
int: Total number of games
Supported Leagues
The module supports the following leagues:
Premier League
English top-flight football league
- Available from: 2018-2019 season onwards
- Teams: 20
- Match weeks: 38
La Liga
Spanish top-flight football league
- Available from: 2018-2019 season onwards
- Teams: 20
- Match weeks: 38
Serie A
Italian top-flight football league
- Available from: 2018-2019 season onwards
- Teams: 20
- Match weeks: 38
Bundesliga
German top-flight football league (Fußball-Bundesliga)
- Available from: 2018-2019 season onwards
- Teams: 18
- Match weeks: 34
Ligue 1
French top-flight football league
- Available from: 2018-2019 season onwards
- Teams: 18
- Match weeks: 34
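The team and match-week figures above are consistent with a double round-robin schedule, in which every team hosts every other team once. The sketch below restates the listed figures as a plain dictionary (a hypothetical layout, not the module's internal configuration) and derives the number of matches per season from them.

```python
# Figures copied from the league list above; the dictionary layout is illustrative.
LEAGUES = {
    "Premier League": {"teams": 20, "match_weeks": 38},
    "La Liga": {"teams": 20, "match_weeks": 38},
    "Serie A": {"teams": 20, "match_weeks": 38},
    "Bundesliga": {"teams": 18, "match_weeks": 34},
    "Ligue 1": {"teams": 18, "match_weeks": 34},
}


def match_weeks(teams: int) -> int:
    """Each team plays every other team twice, one game per match week."""
    return 2 * (teams - 1)


def matches_per_season(teams: int) -> int:
    """Every team hosts every other team exactly once."""
    return teams * (teams - 1)


if __name__ == "__main__":
    for name, info in LEAGUES.items():
        # Sanity check: listed match weeks agree with the round-robin formula.
        assert match_weeks(info["teams"]) == info["match_weeks"]
        print(name, matches_per_season(info["teams"]))
```

A full season therefore contributes 380 games for a 20-team league and 306 for an 18-team league, which gives a rough expectation for get_total_game_count() once complete seasons are stored.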
Complete Example
Data Structure
The CSV dataset includes the following columns:
- Game Information
- Team Statistics
- Target Variables
- game_id: Unique game identifier
- date: Match date and time
- season: Season (e.g., “2023-2024”)
- match_week: Week number
- home_team, away_team: Team names
- home_team_id, away_team_id: Team IDs