Data Overview
The F1 ML Prediction System is built on 7 years of historical Formula 1 data (2018-2024) collected via the FastF1 API.
- Lap Times: 139,135 laps. Complete lap-by-lap telemetry data
- Race Results: 2,537 results. Finishing positions and points
- Pit Stops: 4,512 stops. Pit stop timing and duration
- Weather Records: 127 records. Race day weather conditions
Data Sources
Primary Data Source: FastF1 API
FastF1 is an open-source, community-maintained Python library for accessing Formula 1 timing data, telemetry, and session information. It is not an official Formula 1 or FIA product.
- Documentation: https://docs.fastf1.dev/
- Coverage: 2018-present (live updates)
- Data Quality: Sourced from Formula 1's official timing data
Data collected:
- Race results and qualifying positions
- Lap-by-lap timing data
- Pit stop information
- Weather conditions per session
- Driver and team metadata
Collector script: source/src/data/f1_data_collector.py
Dataset Files
All data files are stored in CSV format for easy processing and analysis.
Raw Data Files
Location: data/raw/
- race_results.csv
- lap_times.csv
- pit_stops.csv
- weather.csv
race_results.csv: 2,537 records. Race finishing positions and championship points.
Key Statistics:
- Races: 127 (7 years × ~18 races/year)
- Drivers: 40+ unique drivers
- Teams: 12 teams across the period
- DNF Rate: ~12% of all results
Processed Data Files
Location: data/processed/
Engineered features ready for machine learning:
- race_features.csv
- tire_features.csv
- pit_features.csv
race_features.csv: Main feature dataset for the winner prediction model.
Created by: source/feature_engineering.py
Features: 21 columns including:
- Driver historical stats (wins, podiums, avg position)
- Team performance metrics
- Circuit-specific experience
- Weather conditions
- Grid position advantages
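As an illustration of the driver historical stats listed above, the following is a minimal sketch, not the project's actual feature_engineering.py. The row keys ("race", "driver", "position") and the output names ("prior_wins", "prior_podiums", "prior_avg_position") are hypothetical. It assumes rows are sorted chronologically and records each driver's totals before the current race, so a race never sees its own result.

```python
from collections import defaultdict


def add_driver_history(results):
    """Attach each driver's earlier wins, podiums and average finish to every row.

    `results` is a chronologically sorted list of dicts with the hypothetical
    keys "driver" and "position" (1 = winner).
    """
    history = defaultdict(lambda: {"starts": 0, "wins": 0, "podiums": 0, "pos_sum": 0})
    rows = []
    for row in results:
        past = history[row["driver"]]
        rows.append({
            **row,
            "prior_wins": past["wins"],
            "prior_podiums": past["podiums"],
            # None until the driver has at least one earlier result
            "prior_avg_position": past["pos_sum"] / past["starts"] if past["starts"] else None,
        })
        # Update the running totals only after the features were recorded.
        past["starts"] += 1
        past["wins"] += int(row["position"] == 1)
        past["podiums"] += int(row["position"] <= 3)
        past["pos_sum"] += row["position"]
    return rows


# Made-up sample rows, for illustration only
sample = [
    {"race": 1, "driver": "VER", "position": 1},
    {"race": 1, "driver": "HAM", "position": 2},
    {"race": 2, "driver": "VER", "position": 4},
    {"race": 2, "driver": "HAM", "position": 1},
    {"race": 3, "driver": "VER", "position": 1},
]
features = add_driver_history(sample)
```

Updating the totals after appending the row is the key design choice: it keeps the target race's outcome out of its own features.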
Data Collection Process
Set Up FastF1 Cache
Configure a local cache directory for faster data access.
Location: data/f1_cache/ (automatically created)
Fetch Race Sessions
Collect data for each race from 2018-2024.
Duration: 2-4 hours depending on internet speed
Extract & Transform
Process raw session data into structured CSV files:
- Race results from session.results
- Lap times from session.laps
- Pit stops from session.laps.pick_pit_stops()
- Weather from session.weather_data
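The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of f1_data_collector.py: the function names `extract_tables`, `pit_in_laps` and `collect_race` are hypothetical. Pit stops are approximated here by filtering laps on FastF1's `PitInTime` column, which is a stand-in and may differ from what the project does. FastF1 is imported inside `collect_race`, so the helpers can be used without it installed.

```python
from pathlib import Path

CACHE_DIR = Path("data/f1_cache")  # cache location given in this document


def extract_tables(session):
    """Pick the session attributes named above off an already loaded session."""
    return {
        "race_results": session.results,
        "lap_times": session.laps,
        "weather": session.weather_data,
    }


def pit_in_laps(laps):
    """Assumed stand-in for pit stop extraction: laps with a recorded pit entry."""
    return laps[laps["PitInTime"].notna()]


def collect_race(year, round_number):
    """Load one race through FastF1 and return its tables (needs network access)."""
    import fastf1  # imported here so the helpers above work without FastF1 installed

    # Create the cache directory first, then point FastF1 at it.
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    fastf1.Cache.enable_cache(str(CACHE_DIR))

    session = fastf1.get_session(year, round_number, "R")  # "R" selects the race
    session.load(laps=True, telemetry=False, weather=True)

    tables = extract_tables(session)
    tables["pit_stops"] = pit_in_laps(session.laps)
    return tables
```

A call such as `collect_race(2024, 1)` returns pandas-style tables, which could then be written out with `to_csv` into data/raw/. Skipping telemetry in `session.load` is a choice made for this sketch to shorten download time.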
Data Statistics
Dataset Completeness
Race Results - 100% Complete
All 127 races from 2018-2024 have complete results:
- Total Results: 2,537
- Finishes: 2,233 (88%)
- DNFs: 304 (12%)
- Missing Data: 0%
Lap Times - 98% Complete
139,135 laps recorded with minor gaps:
- Complete Laps: 136,357 (98%)
- Missing Sector Times: 2,778 (2%)
- Reasons: Data transmission issues, red flags
Pit Stops - 95% Complete
4,512 pit stops with some duration data missing:
- Complete Records: 4,286 (95%)
- Missing Duration: 226 (5%)
- Reasons: Penalty stops, drive-through penalties
Weather - 90% Complete
127 race weather records with some sensor gaps:
- Complete Weather: 114 races (90%)
- Partial Data: 13 races (10%)
- Missing Fields: Usually wind speed or pressure
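The completeness percentages above follow directly from the record counts. As a quick arithmetic check, this short sketch recomputes them from the figures quoted in this section. The dictionary keys and the `pct` helper are illustrative names only.

```python
# (complete, total) counts copied from the completeness figures above
completeness = {
    "race_results": (2537, 2537),
    "lap_times": (136357, 139135),
    "pit_stops": (4286, 4512),
    "weather": (114, 127),
}


def pct(part, total):
    """Percentage of `total` represented by `part`, rounded to one decimal."""
    return round(100 * part / total, 1)


report = {name: pct(done, total) for name, (done, total) in completeness.items()}
dnf_rate = pct(304, 2537)  # DNFs as a share of all race results

for name, value in report.items():
    print(f"{name}: {value}% complete")
print(f"DNF rate: {dnf_rate}%")
```

The weather figure comes out at 89.8%, which the section rounds to 90%.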
Top Drivers (2018-2024)
- Max Verstappen: 52 wins | 78 podiums. Dominant 2021-2024 era
- Lewis Hamilton: 36 wins | 71 podiums. Mercedes dynasty 2018-2021
- Charles Leclerc: 6 wins | 32 podiums. Ferrari’s lead driver
Top Teams (2018-2024)
- Red Bull Racing
- Mercedes
- Ferrari
Red Bull Racing:
- Wins: 78
- Podiums: 156
- Championships: 3 (2021, 2022, 2023)
- Average Position: 2.1
Data Quality & Validation
- Validation Checks
- Known Issues
- Data Cleaning
Automated data quality checks:
- ✅ No duplicate records: each driver × race combination is unique
- ✅ Valid ranges: lap times between 0-300 seconds
- ✅ Position consistency: finishing positions 1-20 per race
- ✅ Points validation: points match FIA rules (25-18-15-12-10-8-6-4-2-1)
- ✅ Team continuity: team names normalized across seasons
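To make the checks above concrete, here is a minimal sketch of how the duplicate, range and points rules might be expressed. It is not the project's validation code. The row keys ("year", "round", "driver", "position", "points") are assumed for illustration. The points check covers only the base 25-18-15-12-10-8-6-4-2-1 scale and ignores fastest-lap bonus points and reduced-points races.

```python
from collections import Counter

# Base FIA points for finishing positions 1-10; every other position scores 0
FIA_POINTS = dict(zip(range(1, 11), [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]))


def validate_results(rows):
    """Return a list of human-readable problems found in race result rows."""
    problems = []

    # Each driver x race combination must be unique.
    counts = Counter((r["year"], r["round"], r["driver"]) for r in rows)
    for key, n in counts.items():
        if n > 1:
            problems.append(f"duplicate entry {key}")

    for r in rows:
        if not 1 <= r["position"] <= 20:
            problems.append(f"position out of range: {r}")
        elif r["points"] != FIA_POINTS.get(r["position"], 0):
            problems.append(f"unexpected points: {r}")
    return problems


def valid_lap_time(seconds):
    """Lap times must fall in the 0-300 second range."""
    return 0 < seconds <= 300


# Made-up rows, for illustration only
good = [
    {"year": 2024, "round": 1, "driver": "VER", "position": 1, "points": 25},
    {"year": 2024, "round": 1, "driver": "HAM", "position": 11, "points": 0},
]
bad = good + [
    good[0],  # duplicate driver x race
    {"year": 2024, "round": 1, "driver": "XXX", "position": 21, "points": 0},
    {"year": 2024, "round": 1, "driver": "LEC", "position": 2, "points": 25},
]
```

Returning a list of problems rather than raising on the first failure lets a single pass report every issue in a file.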
Tire Compound Statistics
- Soft Compound
- Medium Compound
- Hard Compound
Soft Compound:
- Usage: 35% of stints
- Degradation: +0.08 sec/lap average
- Optimal Stint: 15-20 laps
- Best Circuits: Monaco, Singapore, Hungary
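To put the +0.08 sec/lap degradation figure in context, the sketch below estimates the total time lost over a stint. It assumes a simple linear model in which each lap is 0.08 s slower than the one before. That model is an assumption of this example, not something the dataset documentation states.

```python
DEGRADATION_PER_LAP = 0.08  # seconds per lap, the average quoted above


def cumulative_loss(stint_laps, deg=DEGRADATION_PER_LAP):
    """Total time lost over a stint if lap k (0-based) is k * deg slower than lap 0.

    This is the sum of 0, deg, 2*deg, ... which equals deg * n * (n - 1) / 2.
    """
    return deg * stint_laps * (stint_laps - 1) / 2


for laps in (15, 20):
    print(f"{laps}-lap stint: {cumulative_loss(laps):.1f} s lost to degradation")
# 15-lap stint: 8.4 s lost to degradation
# 20-lap stint: 15.2 s lost to degradation
```

Under this model the loss grows quadratically with stint length, so extending a stint from 15 to 20 laps nearly doubles the accumulated time loss.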
Updating the Dataset
Recollect Recent Data
Update with the latest 2024/2025 races. This incremental update is much faster than a full collection.
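An incremental update is faster because only races not yet on disk need fetching. A minimal sketch of that idea follows. The `year` and `round` column names, the `missing_races` helper and the sample schedule are all hypothetical, so the project's actual update command may work differently.

```python
import csv
import io


def missing_races(existing_csv, schedule):
    """Return the (year, round) pairs in `schedule` that are absent from the CSV.

    `existing_csv` is any open text file with hypothetical "year" and "round" columns.
    """
    reader = csv.DictReader(existing_csv)
    have = {(int(row["year"]), int(row["round"])) for row in reader}
    return [race for race in schedule if race not in have]


# In-memory stand-in for data/raw/race_results.csv (made-up rows)
existing = io.StringIO(
    "year,round,driver,position\n"
    "2024,1,VER,1\n"
    "2024,1,PER,2\n"
    "2024,2,VER,1\n"
)
schedule = [(2024, 1), (2024, 2), (2024, 3), (2025, 1)]
todo = missing_races(existing, schedule)
```

Only the races in `todo` would then be passed to the collector, leaving already collected races untouched.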
API Endpoints for Data
The Flask app serves data via a REST API.
Source: source/src/app.py or source/app.py
Additional Resources
- FastF1 Documentation: Official API documentation and examples
- FIA Official Data: Formula 1 official timing and results
- Data Collector Script: source/src/data/f1_data_collector.py
- Feature Engineering: source/feature_engineering.py