Overview
The F1 ML Prediction System uses ensemble machine learning models to predict race winners with high accuracy. The system combines Random Forest and XGBoost classifiers for robust predictions.
Model Accuracy
Training Accuracy
85-90%
Performance on historical training data (2018-2023)
Test Accuracy
75-80%
Real-world validation on held-out 2024 data
Top-3 Accuracy
80-85%
Podium prediction accuracy (includes Top 3 finishers)
The enhanced V2 model, which adds weather, tire, and circuit factors, achieves 85.9% accuracy.
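As a rough illustration of how the training and test figures above could be kept separate, the sketch below holds out one season and scores predictions on each side of the split. The function names, the `seasons` array, and the toy labels are assumptions made for illustration; they are not taken from the project's code.

```python
import numpy as np

def season_holdout_masks(seasons, test_season=2024):
    """Return boolean masks (train, test) that hold out a single season."""
    seasons = np.asarray(seasons)
    test = seasons == test_season
    train = seasons < test_season  # only earlier seasons are used for training
    return train, test

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

# Toy data (made up): one row per driver-race entry, 1 = Top-3 finish.
seasons = np.array([2018, 2019, 2022, 2023, 2024, 2024, 2024, 2024])
y_true = np.array([1, 0, 1, 0, 1, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 1])

train, test = season_holdout_masks(seasons)
train_acc = accuracy(y_true[train], y_pred[train])
test_acc = accuracy(y_true[test], y_pred[test])
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Splitting by season rather than at random keeps later races out of the training data, which is the property a held-out 2024 evaluation relies on.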
Performance Metrics by Model
Random Forest Classifier
Configuration & Hyperparameters
The Random Forest model uses the following configuration:
Location: source/winner_predictor.py:98-109
Performance Metrics
Accuracy: ~85% (training), ~78% (test)
Precision/Recall:
- Top-3 Finish: 0.82 precision, 0.79 recall
- Outside Top-3: 0.91 precision, 0.93 recall
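For readers unfamiliar with per-class precision and recall, the following sketch shows one way such figures can be computed by treating each class in turn as the class of interest. The helper name and the toy labels are assumptions; the numbers it prints are unrelated to the metrics reported above.

```python
import numpy as np

def precision_recall(y_true, y_pred, positive):
    """Precision and recall when `positive` is treated as the class of interest."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return float(precision), float(recall)

# Toy labels (made up): 1 = Top-3 finish, 0 = outside Top-3.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

top3 = precision_recall(y_true, y_pred, positive=1)
outside = precision_recall(y_true, y_pred, positive=0)
print("Top-3 Finish:   precision %.2f, recall %.2f" % top3)
print("Outside Top-3:  precision %.2f, recall %.2f" % outside)
```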
XGBoost Classifier
Configuration & Hyperparameters
Location: source/winner_predictor.py:122-133
Performance Metrics
Accuracy: ~87% (training), ~76% (test)
Precision/Recall:
- Top-3 Finish: 0.80 precision, 0.81 recall
- Outside Top-3: 0.92 precision, 0.91 recall
Ensemble Model
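The averaging step this section describes can be sketched as follows, assuming each model produces a positive-class probability per driver (for example through a scikit-learn-style `predict_proba`) and that both models carry equal weight. The function name, the 0.5 threshold, and the toy probabilities are illustrative assumptions, not the project's actual code.

```python
import numpy as np

def average_ensemble(proba_rf, proba_xgb, threshold=0.5):
    """Average two models' positive-class probabilities with equal weight."""
    proba_rf = np.asarray(proba_rf, dtype=float)
    proba_xgb = np.asarray(proba_xgb, dtype=float)
    if proba_rf.shape != proba_xgb.shape:
        raise ValueError("both models must score the same set of drivers")
    avg = (proba_rf + proba_xgb) / 2.0
    labels = (avg >= threshold).astype(int)  # 1 where the averaged score clears the threshold
    return avg, labels

# Toy probabilities for three drivers (made-up values).
rf_scores = [0.9, 0.4, 0.2]
xgb_scores = [0.7, 0.7, 0.1]

avg, labels = average_ensemble(rf_scores, xgb_scores)
favourite = int(np.argmax(avg))  # index of the driver with the highest averaged score
print(avg, labels, favourite)
```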
The final prediction uses an average ensemble of Random Forest and XGBoost.
Feature Importance Analysis
Top 10 Predictive Features
These features have the highest impact on race winner predictions:
GridPosition (35% importance)
Most important factor! Starting position directly correlates with race outcomes.
- Pole position converts to wins ~40% of the time
- Top 3 grid positions account for 65% of race wins
Location: source/feature_engineering.py
Driver_TotalWins (18% importance)
Historical win count indicates driver skill and experience.
- Verstappen: 50+ wins
- Hamilton: 103 wins
- Past success predicts future performance
Team_AvgPosition (12% importance)
Team/car performance is crucial for competitive results.
- Red Bull Racing: Avg position 2.3
- Ferrari: Avg position 3.8
- Mercedes: Avg position 4.1
Driver_Last5_AvgPoints (10% importance)
Recent form matters: drivers in good form perform better. Tracks a rolling 5-race average of championship points.
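A rolling 5-race average like the one behind Driver_Last5_AvgPoints could be computed along these lines with pandas. The column names `Driver`, `RaceIndex`, and `Points` are hypothetical, and shifting by one race so that the current result is excluded is an assumption about how the feature avoids leaking the outcome it is meant to predict.

```python
import pandas as pd

def add_last5_avg_points(results):
    """Add a per-driver rolling mean of points over the previous five races."""
    ordered = results.sort_values(["Driver", "RaceIndex"]).copy()
    ordered["Driver_Last5_AvgPoints"] = ordered.groupby("Driver")["Points"].transform(
        # shift(1) drops the current race; min_periods=1 allows short histories
        lambda s: s.shift(1).rolling(window=5, min_periods=1).mean()
    )
    return ordered

# Toy results (made up): driver "A" has seven races, driver "B" has two.
results = pd.DataFrame({
    "Driver": ["A"] * 7 + ["B"] * 2,
    "RaceIndex": [1, 2, 3, 4, 5, 6, 7, 1, 2],
    "Points": [25, 18, 15, 12, 10, 8, 6, 18, 25],
})

out = add_last5_avg_points(results)
print(out)
```

A driver's first race has no history, so the feature is missing (NaN) there and needs a fill strategy before training.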
Complete Feature List
The model uses 21 features across 4 categories:
- Driver Features (12)
- Team Features (5)
- Weather Features (4)
- V2 Enhanced Features
Driver Features (12)
- Driver_AvgPosition - Career average finishing position
- Driver_AvgPoints - Average points per race
- Driver_TotalWins - Total career wins
- Driver_TotalPodiums - Total podium finishes
- Driver_DNFRate - Did Not Finish percentage
- Driver_Last5_AvgPosition - Recent 5-race average position
- Driver_Last5_AvgPoints - Recent 5-race points
- Driver_CircuitExperience - Races at this circuit
- Driver_CircuitAvgPosition - Average position at circuit
- Driver_AvgGridPosition - Average starting position
- Driver_GridGain - Average positions gained from grid
- GridPosition - Current race starting position
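To make a few of the driver features concrete, here is a minimal sketch of how Driver_AvgPosition, Driver_DNFRate, and Driver_GridGain might be derived from per-race results. The input columns `Position` and `Finished`, and the function name, are hypothetical; the definitions follow the one-line descriptions in the list above and may differ from what the project's feature engineering code actually does.

```python
import pandas as pd

def driver_career_features(results):
    """Aggregate per-driver career features from one row per race entry."""
    frame = results.assign(
        # positive values mean places gained relative to the starting grid
        GridGain=results["GridPosition"] - results["Position"],
        # 1 when the driver did not finish, 0 otherwise
        DNF=(~results["Finished"].astype(bool)).astype(int),
    )
    grouped = frame.groupby("Driver")
    return pd.DataFrame({
        "Driver_AvgPosition": grouped["Position"].mean(),
        "Driver_DNFRate": grouped["DNF"].mean(),  # a fraction; multiply by 100 for a percentage
        "Driver_GridGain": grouped["GridGain"].mean(),
    })

# Toy results (made up) for two drivers.
results = pd.DataFrame({
    "Driver": ["A", "A", "A", "B", "B"],
    "GridPosition": [1, 3, 2, 5, 4],
    "Position": [1, 2, 5, 3, 4],
    "Finished": [True, True, False, True, True],
})

features = driver_career_features(results)
print(features)
```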
Evaluation Visualizations
The training process generates detailed evaluation charts:
Confusion Matrices
Location: models/confusion_matrices.png
Shows prediction accuracy for both Random Forest and XGBoost models:
- True Positives: Correctly predicted Top-3
- True Negatives: Correctly predicted outside Top-3
- False Positives: Incorrectly predicted Top-3
- False Negatives: Missed Top-3 predictions
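The four cells listed above can be counted directly from true and predicted labels. The sketch below assumes a binary encoding in which 1 means a Top-3 finish and 0 means outside the Top-3; the function name and toy labels are illustrative only.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Count the four confusion-matrix cells for a binary Top-3 classifier."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return {
        "TP": int(np.sum((y_true == 1) & (y_pred == 1))),  # correctly predicted Top-3
        "TN": int(np.sum((y_true == 0) & (y_pred == 0))),  # correctly predicted outside Top-3
        "FP": int(np.sum((y_true == 0) & (y_pred == 1))),  # predicted Top-3, finished outside
        "FN": int(np.sum((y_true == 1) & (y_pred == 0))),  # finished Top-3, predicted outside
    }

# Toy labels (made up) for eight driver-race entries.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)
```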
Feature Importance Charts
Location: models/feature_importance.png
Horizontal bar charts showing:
- Top 10 features by importance score
- Comparison between RF and XGBoost feature rankings
- Relative contribution percentages
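A top-N ranking like the one these charts show can be produced from any array of importance scores. After fitting, scikit-learn's `RandomForestClassifier` and XGBoost's `XGBClassifier` both expose such an array as `feature_importances_`. The helper below is a hypothetical sketch; its example scores are the four importance values quoted earlier on this page, supplied in arbitrary order.

```python
import numpy as np

def top_features(names, importances, k=10):
    """Return the k highest-scoring (feature name, importance) pairs, best first."""
    importances = np.asarray(importances, dtype=float)
    order = np.argsort(importances)[::-1][:k]  # indices sorted by descending score
    return [(names[i], float(importances[i])) for i in order]

# The four importances quoted on this page, deliberately out of order.
names = ["Driver_Last5_AvgPoints", "GridPosition", "Team_AvgPosition", "Driver_TotalWins"]
scores = [0.10, 0.35, 0.12, 0.18]

ranked = top_features(names, scores, k=3)
for name, score in ranked:
    print(f"{name}: {score:.0%}")
```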
Model Performance by Conditions
Weather Impact on Accuracy
- Dry Conditions
- Light Rain
- Heavy Rain
Dry Conditions
Accuracy: 82%
Highest accuracy in dry races where grid position dominates:
- Pole position win rate: 42%
- Top 3 grid → 65% podium rate
- Predictable tire strategies
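A per-condition accuracy breakdown of this kind can be obtained by grouping predictions by a weather label. The sketch below is a minimal version using only the standard library; the function name, condition labels, and toy data are assumptions, and the values it prints are not the project's results.

```python
from collections import defaultdict

def accuracy_by_condition(conditions, y_true, y_pred):
    """Return the fraction of correct predictions for each weather condition."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for cond, truth, pred in zip(conditions, y_true, y_pred):
        totals[cond] += 1
        hits[cond] += int(truth == pred)
    return {cond: hits[cond] / totals[cond] for cond in totals}

# Toy data (made up): one entry per prediction.
conditions = ["Dry", "Dry", "Dry", "Dry", "Light Rain", "Light Rain", "Heavy Rain", "Heavy Rain"]
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 1, 1, 0, 1]

by_condition = accuracy_by_condition(conditions, y_true, y_pred)
print(by_condition)
```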
Key Performance Insights
Grid Position Dominance
Grid position accounts for 35% of model importance: where you start matters most!
Rain Equalizer
Wet races increase prediction uncertainty by 15-20% but create opportunities for underdogs.
Team vs Driver
Team performance (12%) + Driver skill (18%) = 30% combined importance. Both matter significantly.
Recent Form
Average points over the last 5 races account for 10% of importance: momentum and confidence are real factors.
Limitations & Future Improvements
Current Limitations
- Limited to 2018-2024 data - Only 7 years of historical races (~140 events)
- No qualifying data - Grid position used instead of qualifying times
- Basic weather modeling - Binary rain indicator rather than detailed conditions
- No safety car events - Race interruptions not modeled
- Static tire strategy - Rule-based rather than ML-predicted
Planned Enhancements
- Add qualifying telemetry - Sector times, speed traps, mini-sector analysis
- LSTM for tire degradation - Time-series modeling of compound performance
- Neural networks - Deep learning for complex feature interactions
- Safety car prediction - Probability model for race interruptions
- Real-time updates - Live race prediction updates during sessions
Model Files & Locations
- Trained Models
- Source Code
- Visualizations
Saved model files in models/saved_models/:
- winner_predictor_rf.pkl - Random Forest model (V1)
- winner_predictor_xgb.pkl - XGBoost model (V1)
- winner_predictor_v2.pkl - Enhanced ensemble model (V2)
- feature_columns.pkl - Feature list for V1
- feature_columns_v2.pkl - Feature list for V2
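Loading one of the saved models together with its feature list might look like the sketch below. It assumes the `.pkl` files were written with Python's standard `pickle` module; if the project saved them with `joblib` instead, `joblib.load` would be the matching call. The function name and its defaults are hypothetical.

```python
import pickle
from pathlib import Path

def load_artifacts(model_dir,
                   model_name="winner_predictor_v2.pkl",
                   columns_name="feature_columns_v2.pkl"):
    """Load a pickled model and its matching feature-column list from one directory."""
    model_dir = Path(model_dir)
    with open(model_dir / model_name, "rb") as fh:
        model = pickle.load(fh)
    with open(model_dir / columns_name, "rb") as fh:
        feature_columns = pickle.load(fh)
    return model, feature_columns

# Example usage (requires the files to exist on disk):
# model, feature_columns = load_artifacts("models/saved_models")
# X = race_frame[feature_columns]  # race_frame is a hypothetical DataFrame of inputs
```

Loading the feature list alongside the model keeps the input columns in the order the model was trained on, which is why the V1 and V2 lists are stored separately.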