Troubleshooting

Common Issues

Quick solutions to the most frequently encountered problems.

Module not found errors

Symptom:

ModuleNotFoundError: No module named 'fastf1'
ModuleNotFoundError: No module named 'sklearn'

Cause: Python packages not installed or wrong environment activated

Solution:

Verify virtual environment is activated

# Check if venv is active (you should see (venv) in prompt)

# Windows
venv\Scripts\activate

# Mac/Linux
source venv/bin/activate

Install all dependencies

pip install -r requirements.txt

Verify installation

# Mac/Linux (on Windows, replace grep with findstr)
pip list | grep fastf1
pip list | grep scikit-learn
pip list | grep xgboost

If you’re still getting errors, try reinstalling with:

pip install --upgrade --force-reinstall -r requirements.txt
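
The `pip list` checks above depend on shell tools that differ between platforms. As a cross-platform alternative, the minimal sketch below asks the active interpreter directly which modules it can import. The helper name `find_missing_modules` is hypothetical, and the module list is an assumption based on the errors and packages named in this section.

```python
import importlib.util
import sys


def find_missing_modules(module_names):
    """Return the names that the current interpreter cannot import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Use import names, not pip package names: scikit-learn is imported as "sklearn".
required = ["fastf1", "sklearn", "xgboost"]
missing = find_missing_modules(required)

# The interpreter path shows which environment is actually active.
print(f"Interpreter: {sys.executable}")
print(f"Missing modules: {missing or 'none'}")
```

If the interpreter path does not point inside `venv`, the virtual environment is probably not activated.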

Models not found when starting Flask

Symptom:

FileNotFoundError: [Errno 2] No such file or directory: './models/saved_models/winner_predictor_rf.pkl'

Cause: Models haven’t been trained yet

Solution:

Check if data exists

ls data/raw/
# Should see: race_results.csv, lap_times.csv, pit_stops.csv, weather.csv

If files are missing, collect data first:

python src/data/f1_data_collector.py

Train the models

python train_all_models.py

This will:

Run feature engineering
Train Random Forest model
Train XGBoost model
Save models to models/saved_models/

Expected duration: 5-10 minutes

Verify model files exist

ls models/saved_models/
# Should see:
# winner_predictor_rf.pkl
# winner_predictor_xgb.pkl
# feature_columns.pkl
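
The same check can be run from Python, for example before starting Flask. This is a minimal sketch: the helper name `missing_model_files` is hypothetical, and the file names and directory are taken from the listing above.

```python
from pathlib import Path

# File names taken from the expected listing above.
EXPECTED_MODEL_FILES = [
    "winner_predictor_rf.pkl",
    "winner_predictor_xgb.pkl",
    "feature_columns.pkl",
]


def missing_model_files(model_dir="models/saved_models"):
    """Return the expected model files that are absent from model_dir."""
    directory = Path(model_dir)
    return [name for name in EXPECTED_MODEL_FILES
            if not (directory / name).is_file()]


missing = missing_model_files()
if missing:
    print(f"Missing model files: {missing}. Run train_all_models.py first.")
else:
    print("All model files found.")
```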

Low model accuracy

Symptom: Model test accuracy below 70%

Possible Causes:

Insufficient training data
Poor feature engineering
Overfitting or underfitting
Data quality issues

Solutions:

Collect More Data
Add More Features
Try Different Models
Hyperparameter Tuning

Add more recent seasons to training data:

# Update data collection to include 2025
python src/data/f1_data_collector.py --years 2018-2025

More training data generally improves generalization.

Enhance feature engineering:

# In feature_engineering.py, add:

# Qualifying pace
features['QualiPace'] = df['QualifyingTime'] / df['BestQualifyingTime']

# Teammate comparison
features['TeammateGap'] = df['Position'] - df['TeammatePosition']

# Track specialization
features['CircuitWinRate'] = df.groupby(['Driver', 'Circuit'])['IsWinner'].transform('mean')

Experiment with other algorithms:

# Neural Network
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(128, activation='relu', input_shape=(n_features,)),
    Dropout(0.3),
    Dense(64, activation='relu'),
    Dropout(0.2),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid')
])

# LightGBM
from lightgbm import LGBMClassifier
model = LGBMClassifier(n_estimators=200, learning_rate=0.05)

Use GridSearchCV to optimize:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, 15, 20],
    'min_samples_split': [5, 10, 20]
}

grid_search = GridSearchCV(
    RandomForestClassifier(),
    param_grid,
    cv=5,
    scoring='accuracy'
)

grid_search.fit(X_train, y_train)
best_model = grid_search.best_estimator_

Flask won't start - port already in use

Symptom:

OSError: [Errno 48] Address already in use

Cause: Another process is using port 5000. Errno 48 is the macOS code for this error; Linux reports Errno 98.

Solution:

Kill Existing Process
Use Different Port

Mac/Linux:

# Find process using port 5000
lsof -i :5000

# Kill the process (replace PID with actual process ID)
kill -9 <PID>

Windows:

# Find process
netstat -ano | findstr :5000

# Kill process (replace PID with actual ID)
taskkill /PID <PID> /F

Modify the Flask app to use a different port:

# In app.py, change the last line:
if __name__ == '__main__':
    app.run(debug=True, port=8080)  # Use port 8080 instead

Or pass the port on the command line, if app.py parses a --port argument:

python src/app.py --port 8080
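
Before picking a replacement port, it can help to confirm that the port is actually free. The sketch below uses only the standard library; the helper name `is_port_free` is hypothetical, and the two ports checked are the ones mentioned in this section.

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1)
        # connect_ex returns 0 when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


for candidate in (5000, 8080):
    status = "free" if is_port_free(candidate) else "in use"
    print(f"Port {candidate}: {status}")
```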

FastF1 data collection timeout

Symptom:

TimeoutError: Failed to fetch session data
ConnectionError: Remote end closed connection

Cause: Internet connection issues or FastF1 API rate limiting

Solution:

Check internet connection

Ensure stable connection and try again

Add retry logic

import time

import fastf1
from requests.exceptions import ConnectionError  # Shadows the built-in on purpose

def fetch_session_with_retry(year, round_number, session_type, max_retries=3):
    for attempt in range(max_retries):
        try:
            session = fastf1.get_session(year, round_number, session_type)
            session.load()
            return session
        except (ConnectionError, TimeoutError) as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt < max_retries - 1:
                time.sleep(5)  # Wait 5 seconds before the next attempt
    raise RuntimeError(f"Failed after {max_retries} attempts")

Enable caching

import os
import fastf1

os.makedirs('data/f1_cache', exist_ok=True)  # enable_cache expects an existing directory
fastf1.Cache.enable_cache('data/f1_cache/')  # Cache successful requests

Cached data will be reused on subsequent runs.

The FastF1 API is sometimes rate limited. If you encounter frequent timeouts, add delays between requests:

import time
for year in range(2018, 2025):
    for round_number in range(1, 25):  # Avoid shadowing the built-in round()
        # ... fetch data ...
        time.sleep(2)  # Wait 2 seconds between requests
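
As a variation on fixed delays, the wait can grow after each failure (exponential backoff). The sketch below is library-agnostic and only an illustration: `retry_with_backoff` and its parameters are hypothetical names, and `fetch` stands for any zero-argument function that performs the request. Note that `requests.exceptions.ConnectionError` is not a subclass of Python's built-in `ConnectionError`, so pass it through `retry_on` if that is what your fetch raises.

```python
import time


def retry_with_backoff(fetch, max_retries=3, base_delay=2.0,
                       retry_on=(ConnectionError, TimeoutError),
                       sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except retry_on as exc:
            if attempt == max_retries - 1:
                raise  # Out of attempts: surface the last error.
            delay = base_delay * (2 ** attempt)  # 2s, 4s, 8s, ...
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.0f}s")
            sleep(delay)
```

The `sleep` parameter exists so the function can be exercised without real waiting, for example by passing `lambda seconds: None`.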

Data Issues

Missing or incomplete data

Symptom: CSV files have fewer rows than expected or missing columns

Debugging:

import pandas as pd

# Check data completeness
df = pd.read_csv('data/raw/race_results.csv')

print(f"Total rows: {len(df)}")
print(f"Missing values:\n{df.isnull().sum()}")
print(f"Date range: {df['Year'].min()} - {df['Year'].max()}")

Solutions:

Re-collect Data

# Delete existing data and re-collect
rm data/raw/*.csv
python src/data/f1_data_collector.py

Fill Missing Values

# Assign the result back: inplace=True on a selected column
# is unreliable in recent pandas versions

# Numerical: fill with median
df['AvgAirTemp'] = df['AvgAirTemp'].fillna(df['AvgAirTemp'].median())

# Categorical: fill with mode
df['Team'] = df['Team'].fillna(df['Team'].mode()[0])

# Forward fill for time series (fillna(method='ffill') is deprecated)
df['LapTime'] = df['LapTime'].ffill()

Data type errors

Symptom:

TypeError: unsupported operand type(s) for +: 'str' and 'float'

Cause: Columns are wrong data type (string instead of numeric)

Solution:

# Convert to correct types
df['GridPosition'] = pd.to_numeric(df['GridPosition'], errors='coerce')
df['Points'] = df['Points'].astype(float)
df['Year'] = df['Year'].astype(int)
df['IsRaining'] = df['IsRaining'].astype(bool)

# Check types
print(df.dtypes)
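
One caveat with `astype(bool)`: if a flag column such as `IsRaining` arrives as text, every non-empty string converts to `True`, including `"False"`. The sketch below illustrates the pitfall and one possible workaround on a made-up series; the accepted spellings in `truthy_map` are an assumption, so adjust them to whatever your CSV actually contains.

```python
import pandas as pd

# Made-up sample standing in for a text-typed flag column.
raw = pd.Series(["True", "False", "false", "1", "0"], dtype=object)

# Pitfall: any non-empty string is truthy, so every value becomes True.
naive = raw.astype(bool)

# Workaround: map known spellings explicitly; unknown values become NaN.
truthy_map = {"true": True, "1": True, "false": False, "0": False}
parsed = raw.str.strip().str.lower().map(truthy_map)

print(naive.tolist())
print(parsed.tolist())
```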

Duplicate rows in dataset

Symptom: More rows than expected, duplicate driver-race combinations

Check for duplicates:

# Find duplicates
duplicates = df[df.duplicated(['Year', 'Round', 'Driver'], keep=False)]
print(f"Found {len(duplicates)} duplicate rows")

# Remove duplicates (keep first occurrence)
df = df.drop_duplicates(['Year', 'Round', 'Driver'], keep='first')

Performance Issues

Model training is too slow

Symptom: Training takes 30+ minutes

Solutions:

Reduce Data Size

# Sample training data
df_sample = df.sample(frac=0.5, random_state=42)  # Use 50% of data

# Or use only recent years
df_recent = df[df['Year'] >= 2020]

Optimize Hyperparameters

from sklearn.ensemble import RandomForestClassifier

# Reduce model complexity
model = RandomForestClassifier(
    n_estimators=50,      # Fewer trees (was 100)
    max_depth=8,          # Shallower trees (was 10)
    n_jobs=-1             # Use all CPU cores
)

Feature Selection

from sklearn.feature_selection import SelectKBest, f_classif

# Keep only top 10 features
selector = SelectKBest(f_classif, k=10)
X_selected = selector.fit_transform(X_train, y_train)

# Get selected feature names
selected_features = X_train.columns[selector.get_support()]

Web dashboard is slow to load

Symptom: Dashboard takes 10+ seconds to render predictions

Solutions:

Cache model predictions:

from functools import lru_cache

@lru_cache(maxsize=100)
def predict_cached(grid, weather, tire):
    return model.predict(...)
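
To make the caching behavior concrete, here is a self-contained sketch. `run_model` is a hypothetical stand-in for the real model call, with placeholder logic, so the example runs without a trained model. The point is that `lru_cache` keys on the arguments, which therefore must be hashable, and that the cache should be cleared when the model changes.

```python
from functools import lru_cache


def run_model(grid, weather, tire):
    """Stand-in for the real model call (hypothetical placeholder logic)."""
    run_model.calls += 1
    return round(1.0 / grid, 3)


run_model.calls = 0  # Counts how often the expensive call actually runs.


@lru_cache(maxsize=100)
def predict_cached(grid, weather, tire):
    # Arguments must be hashable (ints, strings, tuples),
    # not lists, dicts, or DataFrames.
    return run_model(grid, weather, tire)


predict_cached(1, "DRY", "SOFT")  # Computed
predict_cached(1, "DRY", "SOFT")  # Served from the cache
print(run_model.calls)
print(predict_cached.cache_info())

# After retraining or reloading the model, drop stale results.
predict_cached.cache_clear()
```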

Optimize data loading:

# Load only necessary columns
df = pd.read_csv('data.csv', usecols=['Year', 'Driver', 'Position'])

# Use more efficient data types
df['Year'] = df['Year'].astype('int16')  # Instead of int64

Reduce plot complexity:

# Limit charts to the 10 drivers with the most total points
top_names = df.groupby('Driver')['Points'].sum().nlargest(10).index
top_drivers = df[df['Driver'].isin(top_names)]

Debugging Tips

Enable Debug Mode
Check Data at Each Step
Inspect Model Predictions
Validate Feature Importance

Get detailed error messages:

# In app.py
if __name__ == '__main__':
    app.run(debug=True)  # Shows stack traces

For training scripts:

import logging
logging.basicConfig(level=logging.DEBUG)

Add checkpoints to validate data:

# After loading data
print(f"Loaded {len(df)} rows")
print(df.head())
print(df.info())

# After feature engineering
print(f"Created {len(features.columns)} features")
print(features.describe())

# After train/test split
print(f"Train: {len(X_train)}, Test: {len(X_test)}")

Understand what the model is predicting:

# Get predicted labels and prediction probabilities
y_pred = model.predict(X_test)
y_pred_proba = model.predict_proba(X_test)

# Show predictions vs actual
results = pd.DataFrame({
    'Actual': y_test,
    'Predicted': y_pred,
    'Probability': y_pred_proba[:, 1]
})

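# Find confident wrong predictions (either class predicted with probability above 0.8)
confident = (results['Probability'] > 0.8) | (results['Probability'] < 0.2)
wrong = results[(results['Actual'] != results['Predicted']) & confident]
print(wrong)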

Ensure features make sense:

# Get feature importance
importance = pd.DataFrame({
    'feature': feature_names,
    'importance': model.feature_importances_
}).sort_values('importance', ascending=False)

print(importance.head(10))

# Check for unexpected importance
# GridPosition should be high
# Random noise features should be low

Frequently Asked Questions

Why is accuracy lower on recent races?

Answer: Models are trained on historical data (2018-2023). Recent regulatory changes, new drivers, or team performance shifts may not be fully captured.

Solution: Retrain the models with the latest data:

python src/data/f1_data_collector.py  # Get 2024/2025 data
python train_all_models.py            # Retrain

Can I use this for betting?

Answer: While the model achieves 75-80% accuracy, it’s designed for educational purposes. Betting involves risk beyond prediction accuracy.

Disclaimer: This is an educational project. Do not use predictions for gambling or financial decisions. Past performance does not guarantee future results.

How do I add 2025/2026 data?

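Answer: FastF1 serves new race data as it becomes available, but the collector only fetches the years in its range. Extend that range:

# In f1_data_collector.py, update year range:
for year in range(2018, 2026):  # End is exclusive: covers 2018-2025; use 2027 to include 2026
    # ... collection code ...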

Then retrain models with updated data.

Can I predict specific drivers?

Answer: Yes! The web dashboard allows driver-specific predictions, or use the API:

import pandas as pd

from src.models.winner_predictor import WinnerPredictor

predictor = WinnerPredictor()
predictor.load_models('models/saved_models/')

# Predict for Verstappen starting P1 in dry conditions
features = {
    'GridPosition': 1,
    'Driver_TotalWins': 52,
    'Team_AvgPosition': 1.8,
    'Weather': 'DRY'
    # ... other features ...
}

result = predictor.predict_race(pd.DataFrame([features]))
print(f"Win probability: {result['ensemble_probability']:.2%}")

What Python version is required?

Answer: Python 3.8 or higher is required.

# Check your version
python --version

# If too old, install Python 3.8+
# Then create new virtual environment
python3.8 -m venv venv
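
A script can also enforce the minimum version at startup rather than failing later with an obscure error. This is a minimal sketch; the helper name `python_is_supported` is hypothetical, and the minimum is the 3.8 stated above.

```python
import sys

MIN_VERSION = (3, 8)  # Minimum stated in this guide.


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the interpreter meets the minimum (major, minor) version."""
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    print(f"Python {MIN_VERSION[0]}.{MIN_VERSION[1]}+ required, "
          f"found {sys.version.split()[0]}")
```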

Error Messages Reference

Import Errors

Error	Cause	Solution
`No module named 'fastf1'`	Package not installed	`pip install fastf1`
`No module named 'sklearn'`	Package not installed	`pip install scikit-learn`
`No module named 'xgboost'`	Package not installed	`pip install xgboost`
`Cannot import name 'Sequential'`	TensorFlow not installed	`pip install tensorflow`

File Errors

Error	Cause	Solution
`FileNotFoundError: race_results.csv`	Data not collected	Run data collector script
`FileNotFoundError: winner_predictor_rf.pkl`	Models not trained	Run `train_all_models.py`
`PermissionError`	File locked by another program	Close other programs accessing file

Data Errors

Error	Cause	Solution
`ValueError: could not convert string to float`	Wrong data type	Convert columns with `pd.to_numeric()`
`KeyError: 'GridPosition'`	Column missing	Check CSV has required columns
`IndexError: list index out of range`	Empty dataframe	Verify data was loaded correctly

Model Errors

Error	Cause	Solution
`ValueError: Found input variables with inconsistent numbers of samples`	X and y length mismatch	Check `len(X_train) == len(y_train)`
`ValueError: could not convert string to float: 'NaN'`	Missing values in features	Fill NaN with `fillna()`
`AttributeError: 'NoneType' object has no attribute 'predict'`	Model not loaded	Load model with `joblib.load()`

Getting Help

Check Documentation

Review the complete documentation:

Installation guide
API reference
Model architecture

GitHub Issues

Search existing issues or create new one:

Provide error messages
Include Python/package versions
Describe steps to reproduce
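
To gather the version details for an issue report, something like the following sketch can be pasted into a Python session. The helper name `collect_versions` is hypothetical, and the package list is an assumption based on the dependencies named in this guide.

```python
import platform
from importlib import metadata


def collect_versions(packages):
    """Map each pip package name to its installed version, or a marker."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


print(f"Python {platform.python_version()} on {platform.platform()}")
# Package list is an assumption; extend it to match requirements.txt.
for name, version in collect_versions(
        ["fastf1", "scikit-learn", "xgboost", "pandas", "flask"]).items():
    print(f"{name}: {version}")
```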

Debug Logs

Enable detailed logging:

import logging
logging.basicConfig(
    level=logging.DEBUG,
    filename='debug.log'
)

Community Support

Join discussions:

Share your issue
Learn from others
Contribute solutions

Pro Tip: Before asking for help, try:

Check error message carefully
Search this troubleshooting guide
Google the exact error message
Check GitHub issues
Enable debug mode for detailed logs
