Overview
Feature engineering transforms raw race data into meaningful predictive features. The system uses time-series feature engineering to create historical statistics without data leakage.
- 20+ Features: driver, team, circuit, weather, and tire features
- Time-Aware: features use only historical data from before each race
- Zero Leakage: strict time-based splitting prevents the use of future data
- Auto-Scaling: features automatically adapt to new seasons
Feature Engineering Pipeline
Version 1: Basic Features
File: feature_engineering.py
Creates foundational driver and team features from historical performance.
Version 2: Enhanced Features
File: feature_engineering_v2.py
Adds weather, tire strategy, and circuit-specific features for improved accuracy.
Feature Categories
1. Driver Performance Features
Historical performance metrics computed for each driver:
Driver Feature Details
Driver_AvgPosition
- Average finishing position across all previous races
- Lower is better (1.0 = always wins)
- Default: 10.0 for rookies
Driver_AvgPoints
- Average points per race
- Includes zero-point finishes
- Indicates consistency
Driver_TotalWins
- Career wins before this race
- Strong predictor of future wins
- Champions typically have 20+ wins
Career podiums
- Career top-3 finishes
- More stable than wins alone
- Good indicator of peak performance
DNF rate
- Percentage of races ending in a DNF (Did Not Finish)
- Indicates reliability/consistency
- Lower is better
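The driver statistics above can be computed in a single chronological pass. This is a minimal sketch, not the code in feature_engineering.py: it assumes results are plain dicts sorted by race date with hypothetical keys `driver`, `position`, `points`, and `dnf`, and that every entry has a classified finishing position. `Driver_TotalPodiums` and `Driver_DNFRate` are hypothetical column names, and the 0.0 defaults for points and DNF rate are assumptions; only the 10.0 rookie default comes from the text.

```python
from collections import defaultdict

DEFAULT_AVG_POSITION = 10.0  # rookie default stated in the docs


def add_driver_career_features(results):
    """Attach career stats built only from each driver's earlier races.

    `results` is a list of dicts sorted chronologically; the keys
    driver/position/points/dnf are assumptions for this sketch.
    """
    history = defaultdict(lambda: {"races": 0, "pos_sum": 0.0, "points": 0.0,
                                   "wins": 0, "podiums": 0, "dnfs": 0})
    rows = []
    for r in results:
        h = history[r["driver"]]
        n = h["races"]
        rows.append({
            **r,
            "Driver_AvgPosition": h["pos_sum"] / n if n else DEFAULT_AVG_POSITION,
            "Driver_AvgPoints": h["points"] / n if n else 0.0,  # assumed default
            "Driver_TotalWins": h["wins"],
            "Driver_TotalPodiums": h["podiums"],  # hypothetical column name
            "Driver_DNFRate": h["dnfs"] / n if n else 0.0,  # hypothetical name/default
        })
        # Update history only AFTER the row is written, so race N sees races 1..N-1.
        h["races"] += 1
        h["pos_sum"] += r["position"]
        h["points"] += r["points"]
        h["wins"] += int(r["position"] == 1)
        h["podiums"] += int(r["position"] <= 3)
        h["dnfs"] += int(r["dnf"])
    return rows
```

Because each driver's history is updated only after their row is recorded, a driver's first race always receives the default values.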
2. Recent Form Features
Captures driver momentum using rolling windows. Recent form (last 5 races) is often more predictive than career statistics, especially after mid-season car upgrades.
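A rolling-window version of the same idea is sketched below, assuming chronologically sorted dicts with hypothetical `driver` and `position` keys. A bounded `deque` holds each driver's last five finishing positions. The feature name `Driver_Last5_AvgPosition` comes from the importance table in this document; the function name and the `None` placeholder for drivers with no history are assumptions.

```python
from collections import defaultdict, deque

WINDOW = 5  # "last 5 races", as described above


def add_recent_form(results, window=WINDOW):
    """Add Driver_Last5_AvgPosition using only the driver's previous races.

    `results` must be sorted chronologically; keys driver/position are assumed.
    Drivers with no prior races get None (left for missing-value handling).
    """
    recent = defaultdict(lambda: deque(maxlen=window))
    rows = []
    for r in results:
        last = recent[r["driver"]]
        avg = sum(last) / len(last) if last else None
        rows.append({**r, "Driver_Last5_AvgPosition": avg})
        last.append(r["position"])  # current race enters the window afterwards
    return rows
```

With `maxlen=5`, appending a sixth result silently drops the oldest one, so the window always covers the five most recent prior races.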
3. Circuit-Specific Features
Driver performance at specific circuits, which captures effects such as:
- Monaco specialists (e.g., Alonso) outperform expectations
- Monza favorites (e.g., McLaren) have circuit-specific advantages
- Street circuits reward experience
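Circuit-specific history works like the career average but is keyed on the (driver, circuit) pair. A minimal sketch, assuming chronologically sorted dicts with hypothetical `driver`, `circuit`, and `position` keys; the column name `Driver_CircuitAvgPosition` is taken from the importance table in this document, while the function name and the `None` default for a driver's first visit to a circuit are assumptions.

```python
from collections import defaultdict


def add_circuit_history(results, default=None):
    """Add Driver_CircuitAvgPosition: mean finish at this circuit in earlier races.

    `results` must be sorted chronologically; keys driver/circuit/position are
    assumptions. `default` is used when the driver has not raced here before.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (driver, circuit) -> [position_sum, count]
    rows = []
    for r in results:
        key = (r["driver"], r["circuit"])
        pos_sum, count = totals[key]
        avg = pos_sum / count if count else default
        rows.append({**r, "Driver_CircuitAvgPosition": avg})
        totals[key][0] += r["position"]
        totals[key][1] += 1
    return rows
```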
4. Grid Position Features
Grid Gain Explained
Grid Gain = Average starting position - Average finishing position
- Positive Grid Gain: driver typically gains positions (overtaker)
- Negative Grid Gain: driver typically loses positions (strong qualifier, weaker racer)
- Zero Grid Gain: driver typically maintains grid position
Example: a driver who starts P10 on average and finishes P6 on average has Grid Gain = 10 - 6 = +4, which indicates strong race pace and overtaking ability.
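The Grid Gain formula is a difference of two means. The sketch below assumes the driver's prior grid and finish positions are available as two parallel lists; the function name is hypothetical.

```python
def grid_gain(prior_grids, prior_finishes):
    """Grid Gain = average starting position - average finishing position.

    Both lists hold the driver's earlier races only (same order, same length).
    Positive values mean the driver usually gains places during the race.
    """
    if not prior_grids or len(prior_grids) != len(prior_finishes):
        raise ValueError("need non-empty grid and finish lists of equal length")
    avg_grid = sum(prior_grids) / len(prior_grids)
    avg_finish = sum(prior_finishes) / len(prior_finishes)
    return avg_grid - avg_finish


# Average start P10, average finish P6.
print(grid_gain([12, 10, 8], [7, 6, 5]))  # 4.0
```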
5. Team Performance Features
6. Weather Features (V2)
The enhanced model includes weather-aware features:
| Condition | Impact | Effect | Pole win rate |
|---|---|---|---|
| DRY | 1.0x | Grid position dominant | ~42% |
| LIGHT_RAIN | 1.05x | Skill matters more | ~35% |
| HEAVY_RAIN | 1.15x | High chaos factor | ~23% |
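The three weather conditions translate into model inputs with a simple lookup. This sketch assumes the condition arrives as one of the three labels listed above (case-insensitive). `Weather_Impact` and `Is_Wet_Race` appear in the importance table; the function name, the 0/1 encoding, and treating any rain as a wet race are assumptions.

```python
# Impact multipliers from the weather conditions above.
WEATHER_IMPACT = {"DRY": 1.0, "LIGHT_RAIN": 1.05, "HEAVY_RAIN": 1.15}


def weather_features(condition):
    """Map a race's weather condition to Weather_Impact and Is_Wet_Race."""
    key = condition.strip().upper()
    if key not in WEATHER_IMPACT:
        raise ValueError(f"unknown weather condition: {condition!r}")
    return {
        "Weather_Impact": WEATHER_IMPACT[key],
        "Is_Wet_Race": int(key != "DRY"),  # assumption: any rain counts as wet
    }


print(weather_features("light_rain"))  # {'Weather_Impact': 1.05, 'Is_Wet_Race': 1}
```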
7. Tire Strategy Features (V2)
Tire compound characteristics:
8. Circuit Type Features (V2)
Feature Creation Process
Time-Series Feature Engineering
The critical innovation is preventing data leakage.
Critical: Features for race N use only data from races 1 to N-1. This prevents the model from “seeing the future” during training.
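Building features from races 1 to N-1 pays off only if evaluation is also chronological, which is what the overview's "strict time-based splitting" refers to. A minimal sketch, assuming each row carries an integer `season` field and that a whole season is held out for testing; the field and function names are hypothetical, and the project may split by date or race round instead.

```python
def time_based_split(rows, test_season):
    """Chronological split: every training race comes from an earlier season.

    Unlike a random split, no race from `test_season` (or later) can inform
    training. Each row is assumed to carry an integer "season" key.
    """
    train = [r for r in rows if r["season"] < test_season]
    test = [r for r in rows if r["season"] >= test_season]
    return train, test


rows = [
    {"season": 2023, "round": 1},
    {"season": 2023, "round": 22},
    {"season": 2024, "round": 1},
]
train, test = time_based_split(rows, test_season=2024)
print(len(train), len(test))  # 2 1
```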
Missing Value Handling
Feature Importance
From trained models, the top features by importance are:
Top 10 Most Important Features
| Rank | Feature | Importance | Description |
|---|---|---|---|
| 1 | GridPosition | 0.2847 | Starting grid position |
| 2 | Driver_AvgPosition | 0.1523 | Historical average finish |
| 3 | Driver_TotalWins | 0.0892 | Career wins |
| 4 | Team_AvgPosition | 0.0745 | Team performance |
| 5 | Driver_Last5_AvgPosition | 0.0634 | Recent form |
| 6 | Driver_CircuitAvgPosition | 0.0521 | Circuit-specific performance |
| 7 | Weather_Impact | 0.0487 | Weather multiplier |
| 8 | Tire_Degradation_Rate | 0.0412 | Tire compound effect |
| 9 | Driver_AvgPoints | 0.0398 | Average points per race |
| 10 | Is_Wet_Race | 0.0367 | Rain flag |
Feature Groups by Importance
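- High Impact (>10%): GridPosition (28.5%), Driver_AvgPosition (15.2%)
- Medium Impact (5-10%): Driver_TotalWins (8.9%), Team_AvgPosition (7.5%), Driver_Last5_AvgPosition (6.3%), Driver_CircuitAvgPosition (5.2%)
- Low Impact (<5%): Weather_Impact (4.9%), Tire_Degradation_Rate (4.1%), Driver_AvgPoints (4.0%), Is_Wet_Race (3.7%)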
Feature Validation
Quality Checks
Feature Statistics
Output Files
V1 Features: data/processed/race_features.csv
- Basic features (driver, team, grid)
- ~880 records (2023-2024 seasons)
- ~15 feature columns
V2 Features: data/processed/race_features_v2.csv
- Enhanced features (weather, tires, circuits)
- Same record count as V1
- ~30+ feature columns
Running Feature Engineering
Basic Version
Enhanced Version (V2)
Expected Output
Next Steps
After feature engineering:
- Validate Features → Check distributions and correlations
- Train Models → Use processed features for ML training (see Models)
- Feature Selection → Optionally remove low-importance features