Overview
The Data Collector module uses the FastF1 API to fetch historical Formula 1 race data, including race results, driver information, and grid positions. It processes data from multiple seasons and exports it to CSV format for feature engineering and model training.
Source file: collect_working.py
FastF1 Integration
Cache Configuration
The collector uses FastF1’s caching mechanism to improve performance and reduce API calls. Caching is essential: the cache directory stores downloaded session data, so each session is downloaded once and then read from disk on later runs.
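A minimal sketch of the cache setup, assuming a cache directory of ./cache (the source does not name the directory the collector actually uses). FastF1 expects the cache directory to exist before caching is enabled, so the sketch creates it first. The helper names prepare_cache_dir and enable_fastf1_cache are hypothetical.

```python
import os


def prepare_cache_dir(cache_dir: str = "./cache") -> str:
    """Create the cache directory if needed and return its absolute path."""
    # Assumed location; adjust to match the collector's configuration.
    os.makedirs(cache_dir, exist_ok=True)
    return os.path.abspath(cache_dir)


def enable_fastf1_cache(cache_dir: str = "./cache") -> str:
    """Point FastF1's on-disk cache at cache_dir and return the path used."""
    import fastf1  # third-party dependency, imported where it is used

    path = prepare_cache_dir(cache_dir)
    # Downloaded session data is stored here and reused on later runs.
    fastf1.Cache.enable_cache(path)
    return path
```

Call enable_fastf1_cache() once at the start of a run, before any get_event_schedule() or get_session() call, so that every download goes through the cache.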
Data Collection Pipeline
Main Collection Loop
The collector iterates over each season year and, within each year, over every race event in that season’s schedule, gathering results for every driver.
Core Functions
get_event_schedule()
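year (int): The Formula 1 season year (e.g., 2023, 2024)
Returns an event schedule whose columns include:
- RoundNumber: Race round in the season
- EventName: Name of the Grand Prix
- EventFormat: Format type, such as ‘conventional’, ‘testing’, or a sprint format (‘sprint’, ‘sprint_shootout’, or ‘sprint_qualifying’, depending on the season)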
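By default the schedule also lists pre-season testing, which has no race to collect. A minimal sketch of turning a schedule into race rounds, assuming the rows are passed as plain dictionaries (for example via schedule.to_dict("records")); the helper name race_rounds is hypothetical.

```python
from typing import Iterable, List, Mapping, Tuple


def race_rounds(schedule_rows: Iterable[Mapping]) -> List[Tuple[int, str]]:
    """Return sorted (round_number, event_name) pairs for non-testing events."""
    rounds = []
    for row in schedule_rows:
        # Testing events are marked with EventFormat 'testing' and have no race.
        if row["EventFormat"] == "testing":
            continue
        rounds.append((int(row["RoundNumber"]), row["EventName"]))
    return sorted(rounds)


# Usage with FastF1 (needs network access or a warm cache):
#   import fastf1
#   schedule = fastf1.get_event_schedule(2024)
#   for round_number, event_name in race_rounds(schedule.to_dict("records")):
#       print(round_number, event_name)
```

Passing include_testing=False to get_event_schedule() drops testing events at the source and is an alternative to filtering afterward.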
get_session()
year (int): The Formula 1 season year
gp (int or str): The round number from the event schedule (FastF1 also accepts an event name)
identifier (str): Session identifier: ‘R’ (Race), ‘Q’ (Qualifying), ‘FP1’, ‘FP2’, ‘FP3’
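get_session() only creates the session object; its results are not available until session.load() has run. A minimal sketch that loads just the race classification, assuming the collector needs neither laps, telemetry, weather, nor race-control messages (skipping them makes each load lighter). The helper name load_race_results is hypothetical.

```python
def load_race_results(year: int, round_number: int):
    """Load one race and return FastF1's results DataFrame for it."""
    import fastf1  # third-party dependency, imported where it is used

    # 'R' selects the race session; 'Q', 'FP1', 'FP2', 'FP3' select others.
    session = fastf1.get_session(year, round_number, "R")
    # Only the results are needed, so the heavier datasets are skipped.
    session.load(laps=False, telemetry=False, weather=False, messages=False)
    return session.results
```

For example, load_race_results(2024, 1) would return the results of the first race of 2024, with one row per driver.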
get_driver()
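identifier (str): Three-letter driver code (e.g., ‘VER’, ‘HAM’, ‘LEC’); a driver number is also accepted. This is a method of a loaded session, called as session.get_driver().
Returns driver information including: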
- Driver name
- Team affiliation
- Driver number
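session.get_driver() returns a DriverResult, a single row of the session results (a pandas Series subclass), so its fields are read by column name. A minimal sketch, assuming the FullName, TeamName, and DriverNumber columns; the helper driver_summary is hypothetical and works on anything indexable by column name, which keeps it usable without a network connection.

```python
def driver_summary(driver) -> dict:
    """Pick the name, team, and number out of one driver result row."""
    return {
        "name": driver["FullName"],
        "team": driver["TeamName"],
        # FastF1 stores driver numbers as strings; str() keeps that consistent.
        "number": str(driver["DriverNumber"]),
    }


# Usage on a loaded session:
#   ver = driver_summary(session.get_driver("VER"))
```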
Data Extraction
Race Results Extraction
The collector extracts detailed race results for each driver from the loaded session’s results DataFrame (session.results).
Results DataFrame Fields
- Position: Final race finishing position (1-20)
- GridPosition: Starting grid position (1-20)
- Points: Championship points earned (25 for a win, 18 for second, and so on)
- TeamName: Constructor/team name (e.g., ‘Red Bull Racing’, ‘Mercedes’)
- FullName: Driver’s full name
- Abbreviation: Three-letter driver code
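The Points field follows the race scoring scale of 25, 18, 15, 12, 10, 8, 6, 4, 2, and 1 points for positions 1 through 10, with nothing below tenth. A minimal sketch of that base allocation, which can serve as a sanity check on the Points column. The recorded value can legitimately differ from the base, for example through the fastest-lap bonus point (awarded to a top-10 finisher from 2019 through 2024) or reduced points for a shortened race. The names RACE_POINTS and base_race_points are hypothetical.

```python
# Base race points by finishing position; positions 11 and lower score nothing.
RACE_POINTS = (25, 18, 15, 12, 10, 8, 6, 4, 2, 1)


def base_race_points(position: int) -> float:
    """Return the base championship points for a finishing position."""
    if 1 <= position <= len(RACE_POINTS):
        return float(RACE_POINTS[position - 1])
    return 0.0
```

Summed over a full 20-car field, the base allocation hands out 101 points per race.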
Data Structure
Output Record Format
Each race result is stored as a dictionary with the following fields:
- Season year (2023, 2024)
- Race round number in the season (1-24)
- Full Grand Prix name
- Three-letter driver abbreviation
- Driver’s complete name
- Team/constructor name
- Final classification position
- Starting position on grid (default: 20 if missing)
- Points scored (default: 0.0 if none)
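A minimal sketch of assembling one record from a row of session.results, applying the defaults listed above (grid position 20 when missing, 0.0 points when none). The dictionary keys are hypothetical, since the source does not give the exact names the collector uses. Missing numbers in a pandas DataFrame appear as NaN, so NaN is treated the same as None.

```python
import math
from typing import Any, Mapping


def _missing(value: Any) -> bool:
    """True for None or a float NaN (how pandas marks missing numbers)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def build_record(year: int, round_number: int, event_name: str, row: Mapping) -> dict:
    """Turn one results row into an output record (keys are hypothetical)."""
    position = row.get("Position")
    grid = row.get("GridPosition")
    points = row.get("Points")
    return {
        "year": year,
        "round": round_number,
        "event_name": event_name,
        "driver": row["Abbreviation"],
        "driver_name": row["FullName"],
        "team": row["TeamName"],
        # Assumption: a missing finishing position is kept as None.
        "position": None if _missing(position) else int(position),
        "grid_position": 20 if _missing(grid) else int(grid),  # default: 20
        "points": 0.0 if _missing(points) else float(points),  # default: 0.0
    }


# Usage with a loaded session:
#   records = [build_record(2024, 1, "Bahrain Grand Prix", row)
#              for row in session.results.to_dict("records")]
```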
CSV Output Format
race_results_WORKING.csv
The collected data is exported to a CSV file with one row per driver per race, using the record fields above as columns. The CSV is saved to ./data/raw/race_results_WORKING.csv and contains results from all processed races.
Error Handling
Event-Level Error Handling
The collector handles errors separately for each race event. If an event fails to load or process, it is skipped and the collector continues with the remaining races, maximizing the amount of data collected.
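A minimal sketch of the main collection loop with event-level error handling. To keep the pattern visible and testable, the schedule lookup and the per-event loader are passed in as functions (rounds_for_year and load_event are hypothetical); in the real collector the loader would wrap the get_session() and load() calls. A failing event contributes no records, is reported, and the loop moves on.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def collect(
    years: Iterable[int],
    rounds_for_year: Callable[[int], Iterable[int]],
    load_event: Callable[[int, int], List[Dict]],
) -> Tuple[List[Dict], List[Tuple[int, int, str]]]:
    """Collect records for every event, skipping events that raise."""
    records: List[Dict] = []
    failures: List[Tuple[int, int, str]] = []
    for year in years:
        for round_number in rounds_for_year(year):
            try:
                # extend() only runs if the whole event loaded successfully.
                records.extend(load_event(year, round_number))
            except Exception as exc:  # one bad event must not stop the run
                failures.append((year, round_number, repr(exc)))
                print(f"Skipping {year} round {round_number}: {exc}")
    return records, failures
```

Returning the failures alongside the records makes it easy to log which events were skipped, or to retry them later.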
Data Validation
The collector validates data before saving.
Usage Example
Data Sources
FastF1 API
The collector relies on the FastF1 library, which fetches data from:
- Official F1 Timing Data: Live timing and telemetry
- Ergast API: Historical race results and standings
- FIA Documents: Official race classifications
Supported Data Types
- Race results and classifications
- Grid positions and qualifying results
- Championship points allocations
- Driver and team information
- Lap-by-lap timing data
Performance Considerations
Cache Usage: Always enable caching to avoid redundant downloads. The cache significantly speeds up repeated data access.
API Rate Limits: The FastF1 library handles rate limiting automatically, but collecting large datasets may take several minutes.
Network Requirements: An active internet connection is required for the initial data fetch. Cached data can be accessed offline.