
Common Issues

Quick solutions to the most frequently encountered problems.
Symptom:
ModuleNotFoundError: No module named 'fastf1'
ModuleNotFoundError: No module named 'sklearn'
Cause: The Python packages are not installed, or the wrong environment is activated
Solution:
1. Verify the virtual environment is activated

# Activate the venv (once active, you should see (venv) in your prompt)

# Windows
venv\Scripts\activate

# Mac/Linux
source venv/bin/activate
2. Install all dependencies

pip install -r requirements.txt
3. Verify installation

pip list | grep fastf1
pip list | grep scikit-learn
pip list | grep xgboost
If you’re still getting errors, try reinstalling with:
pip install --upgrade --force-reinstall -r requirements.txt
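As a cross-platform alternative to the `pip list | grep` checks above (`grep` is not available in the default Windows shell), the same verification can be sketched in Python. The helper name `find_missing_modules` is hypothetical; the module names come from the errors and commands above. Run it with the same interpreter (and venv) the project uses.

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the names in module_names that cannot be imported here."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Import names, not pip names: scikit-learn is imported as 'sklearn'
    required = ["fastf1", "sklearn", "xgboost"]
    missing = find_missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable")
```

If a module is reported missing even though `pip list` shows it, pip and the interpreter are probably pointing at different environments.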
Symptom:
FileNotFoundError: [Errno 2] No such file or directory: './models/saved_models/winner_predictor_rf.pkl'
Cause: Models haven’t been trained yet
Solution:
1. Check if data exists

ls data/raw/
# Should see: race_results.csv, lap_times.csv, pit_stops.csv, weather.csv
If files are missing, collect data first:
python src/data/f1_data_collector.py
2. Train the models

python train_all_models.py
This will:
  • Run feature engineering
  • Train Random Forest model
  • Train XGBoost model
  • Save models to models/saved_models/
Expected duration: 5-10 minutes
3. Verify model files exist

ls models/saved_models/
# Should see:
# winner_predictor_rf.pkl
# winner_predictor_xgb.pkl
# feature_columns.pkl
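The same check can be scripted so that it also works where `ls` is unavailable. This is a minimal sketch: the helper `missing_model_files` is hypothetical, and the file names and directory are taken from the listing above.

```python
from pathlib import Path

# File names taken from the listing above
EXPECTED_MODEL_FILES = (
    "winner_predictor_rf.pkl",
    "winner_predictor_xgb.pkl",
    "feature_columns.pkl",
)


def missing_model_files(model_dir, expected=EXPECTED_MODEL_FILES):
    """Return the expected file names that are absent from model_dir."""
    directory = Path(model_dir)
    return [name for name in expected if not (directory / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("models/saved_models")
    if missing:
        print("Missing model files:", ", ".join(missing))
        print("Run: python train_all_models.py")
    else:
        print("All model files found")
```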
Symptom: Model test accuracy below 70%
Possible Causes:
  • Insufficient training data
  • Poor feature engineering
  • Overfitting or underfitting
  • Data quality issues
Solutions:
Add more recent seasons to training data:
# Update data collection to include 2025
python src/data/f1_data_collector.py --years 2018-2025
More training data generally improves generalization.
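To tell overfitting from underfitting among the causes listed above, a common heuristic is to compare training accuracy with test accuracy. The sketch below is illustrative only: the function `diagnose_fit`, the 0.70 target (borrowed from the symptom above), and the 0.10 gap threshold are assumptions, not part of the project.

```python
def diagnose_fit(train_accuracy, test_accuracy, target=0.70, max_gap=0.10):
    """Classify a model's fit from its train and test accuracy (both 0..1)."""
    gap = train_accuracy - test_accuracy
    if test_accuracy >= target and gap <= max_gap:
        return "ok"
    if gap > max_gap:
        # Much better on seen data than on unseen data
        return "overfitting"
    if train_accuracy < target:
        # Poor even on the data it was trained on
        return "underfitting"
    return "inconclusive"


if __name__ == "__main__":
    print(diagnose_fit(0.98, 0.62))  # large train/test gap
    print(diagnose_fit(0.60, 0.58))  # low accuracy on both
```

An "overfitting" result points toward more data or a simpler model; "underfitting" points toward better features or a more expressive model.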
Symptom:
OSError: [Errno 48] Address already in use
Cause: Another process is using port 5000 (errno 48 is the macOS code; Linux reports errno 98 for the same condition)
Solution:
Mac/Linux:
# Find process using port 5000
lsof -i :5000

# Kill the process (replace PID with actual process ID)
kill -9 <PID>
Windows:
# Find process
netstat -ano | findstr :5000

# Kill process (replace PID with actual ID)
taskkill /PID <PID> /F
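If killing the other process is not an option, the port can be checked from Python before starting the server. A minimal sketch using only the standard library; the helper names `is_port_in_use` and `find_free_port` and the 5000-5010 range are assumptions for illustration.

```python
import socket


def is_port_in_use(port, host="127.0.0.1"):
    """Return True if binding to (host, port) fails, i.e. the port is taken."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Usually "address already in use"; ports below 1024 may also
            # fail here with a permission error
            return True
    return False


def find_free_port(start=5000, end=5010, host="127.0.0.1"):
    """Return the first free port in [start, end], or None if all are taken."""
    for port in range(start, end + 1):
        if not is_port_in_use(port, host):
            return port
    return None


if __name__ == "__main__":
    print("Port 5000 in use:", is_port_in_use(5000))
    print("First free port:", find_free_port())
```

Assuming the dashboard is the Flask-style app shown under Debugging Tips, the free port could then be passed along as `app.run(port=...)`.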
Symptom:
TimeoutError: Failed to fetch session data
ConnectionError: Remote end closed connection
Cause: Internet connection issues or FastF1 API rate limiting
Solution:
1. Check internet connection

Ensure you have a stable connection, then try again.
2. Add retry logic

import time

import fastf1
from requests.exceptions import ConnectionError

def fetch_session_with_retry(year, round_number, session_type, max_retries=3):
    for attempt in range(max_retries):
        try:
            session = fastf1.get_session(year, round_number, session_type)
            session.load()
            return session
        except (ConnectionError, TimeoutError) as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt < max_retries - 1:
                time.sleep(5)  # Wait 5 seconds before the next attempt
    raise RuntimeError(f"Failed after {max_retries} attempts")
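A fixed 5-second wait works, but when rate limiting is the cause, waiting longer after each failure is often gentler on the server. The following is a generic sketch of exponential backoff, not part of the project: `retry_with_backoff` and its parameters are hypothetical, and the `sleep` argument exists only so the delay can be replaced in tests.

```python
import time


def retry_with_backoff(func, max_retries=3, base_delay=1.0,
                       exceptions=(ConnectionError, TimeoutError),
                       sleep=time.sleep):
    """Call func() until it succeeds, doubling the wait after each failure."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(max_retries):
        try:
            return func()
        except exceptions as error:
            if attempt == max_retries - 1:
                raise  # Out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)  # base, 2x base, 4x base, ...
            print(f"Attempt {attempt + 1} failed ({error}); retrying in {delay}s")
            sleep(delay)
```

Note that `requests.exceptions.ConnectionError` is not a subclass of Python's built-in `ConnectionError`, so when wrapping network calls made through `requests`, pass it explicitly via `exceptions=`.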
3. Enable caching

import fastf1
# The cache directory must already exist; create it first if needed
fastf1.Cache.enable_cache('data/f1_cache/')  # Cache successful requests
Cached data will be reused on subsequent runs.
The FastF1 API is sometimes rate limited. If you encounter frequent timeouts, add delays between requests:
import time
for year in range(2018, 2025):
    for round_number in range(1, 25):  # Avoids shadowing the built-in round()
        # ... fetch data ...
        time.sleep(2)  # Wait 2 seconds between requests

Data Issues

Symptom: CSV files have fewer rows than expected or missing columns
Debugging:
import pandas as pd

# Check data completeness
df = pd.read_csv('data/raw/race_results.csv')

print(f"Total rows: {len(df)}")
print(f"Missing values:\n{df.isnull().sum()}")
print(f"Year range: {df['Year'].min()} - {df['Year'].max()}")
Solutions:
# Delete existing data and re-collect
rm data/raw/*.csv
python src/data/f1_data_collector.py
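Before deleting and re-collecting, it can help to record exactly which columns are missing or empty, so the fresh files can be compared against the old ones. A standard-library sketch; the helper `summarize_csv` is hypothetical, and the column names in the usage example are borrowed from other snippets in this guide rather than from a confirmed schema.

```python
import csv


def summarize_csv(path, required_columns):
    """Return row count, missing columns, and per-column empty-cell counts."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        present = [c for c in required_columns if c in header]
        empty_counts = {c: 0 for c in present}
        row_count = 0
        for row in reader:
            row_count += 1
            for column in present:
                # Short rows yield None for trailing columns
                if not (row[column] or "").strip():
                    empty_counts[column] += 1
    return {
        "rows": row_count,
        "missing_columns": [c for c in required_columns if c not in header],
        "empty_cells": empty_counts,
    }


if __name__ == "__main__":
    # Column names assumed from the snippets in this guide
    report = summarize_csv("data/raw/race_results.csv",
                           ["Year", "Round", "Driver", "GridPosition", "Points"])
    print(report)
```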
Symptom:
TypeError: unsupported operand type(s) for +: 'str' and 'float'
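Cause: Columns have the wrong data type (string instead of numeric)
Solution:
# Convert to correct types; errors='coerce' turns unparseable values into NaN
df['GridPosition'] = pd.to_numeric(df['GridPosition'], errors='coerce')
df['Points'] = pd.to_numeric(df['Points'], errors='coerce')
df['Year'] = pd.to_numeric(df['Year'], errors='coerce').astype('Int64')  # Nullable integer
# astype(bool) turns every non-empty string (even 'False') into True, so map explicitly
df['IsRaining'] = df['IsRaining'].astype(str).str.lower().isin(['true', '1', '1.0'])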

# Check types
print(df.dtypes)
Symptom: More rows than expected, duplicate driver-race combinations
Check for duplicates:
# Find duplicates
duplicates = df[df.duplicated(['Year', 'Round', 'Driver'], keep=False)]
print(f"Found {len(duplicates)} duplicate rows")

# Remove duplicates (keep first occurrence)
df = df.drop_duplicates(['Year', 'Round', 'Driver'], keep='first')

Performance Issues

Symptom: Training takes 30+ minutes
Solutions:
# Sample training data
df_sample = df.sample(frac=0.5, random_state=42)  # Use 50% of data

# Or use only recent years
df_recent = df[df['Year'] >= 2020]
Symptom: Dashboard takes 10+ seconds to render predictions
Solutions:
  1. Cache model predictions:
    from functools import lru_cache
    
    @lru_cache(maxsize=100)
    def predict_cached(grid, weather, tire):
        return model.predict(...)
    
  2. Optimize data loading:
    # Load only necessary columns
    df = pd.read_csv('data.csv', usecols=['Year', 'Driver', 'Position'])
    
    # Use more efficient data types
    df['Year'] = df['Year'].astype('int16')  # Instead of int64
    
  3. Reduce plot complexity:
    # Limit to top 10 drivers in charts
    top_drivers = df.nlargest(10, 'Points')
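The caching idea in item 1 can be made concrete with a runnable sketch. Here `slow_predict` is a hypothetical stand-in for the real model call (its return value is made up), used only to show that the second identical request never reaches the model.

```python
from functools import lru_cache


def slow_predict(grid, weather, tire):
    """Stand-in for an expensive model call (hypothetical)."""
    slow_predict.calls += 1
    return 1.0 / grid  # Placeholder value, not a real prediction


slow_predict.calls = 0


@lru_cache(maxsize=100)
def predict_cached(grid, weather, tire):
    # Arguments must be hashable (ints, strings, tuples),
    # so pass plain values rather than lists or DataFrames
    return slow_predict(grid, weather, tire)


predict_cached(1, "DRY", "SOFT")  # Computed by slow_predict
predict_cached(1, "DRY", "SOFT")  # Served from the cache
print(predict_cached.cache_info())
```

After retraining or reloading the model, call `predict_cached.cache_clear()` so stale predictions are not served.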
    

Debugging Tips

Get detailed error messages:
# In app.py
if __name__ == '__main__':
    app.run(debug=True)  # Shows stack traces
For training scripts:
import logging
logging.basicConfig(level=logging.DEBUG)

Frequently Asked Questions

Answer: Models are trained on historical data (2018-2023). Recent regulatory changes, new drivers, or team performance shifts may not be fully captured.
Solution: Retrain models with latest data:
python src/data/f1_data_collector.py  # Get 2024/2025 data
python train_all_models.py            # Retrain
Answer: While the model achieves 75-80% accuracy, it’s designed for educational purposes. Betting involves risk beyond prediction accuracy.
Disclaimer: This is an educational project. Do not use predictions for gambling or financial decisions. Past performance does not guarantee future results.
Answer: The FastF1 API updates automatically with new race data.
# In f1_data_collector.py, update year range:
for year in range(2018, 2026):  # Include 2025 races
    # ... collection code ...
Then retrain models with updated data.
Answer: Yes! The web dashboard allows driver-specific predictions, or use the API:
import pandas as pd

from src.models.winner_predictor import WinnerPredictor

predictor = WinnerPredictor()
predictor.load_models('models/saved_models/')

# Predict for Verstappen starting P1 in dry conditions
features = {
    'GridPosition': 1,
    'Driver_TotalWins': 52,
    'Team_AvgPosition': 1.8,
    'Weather': 'DRY',
    # ... other features ...
}

result = predictor.predict_race(pd.DataFrame([features]))
print(f"Win probability: {result['ensemble_probability']:.2%}")
Answer: Python 3.8 or higher is required.
# Check your version
python --version

# If too old, install Python 3.8+
# Then create new virtual environment
python3.8 -m venv venv
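The version requirement can also be enforced at the top of a script, so that an old interpreter fails with a clear message instead of an obscure syntax or import error. A minimal sketch; the helper `meets_minimum` is hypothetical, and the 3.8 minimum is the one stated above.

```python
import sys

MIN_PYTHON = (3, 8)  # Minimum version stated above


def meets_minimum(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not meets_minimum():
        sys.exit(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, "
                 f"found {sys.version.split()[0]}")
    print("Python version OK")
```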

Error Messages Reference

Error | Cause | Solution
No module named 'fastf1' | Package not installed | pip install fastf1
No module named 'sklearn' | Package not installed | pip install scikit-learn
No module named 'xgboost' | Package not installed | pip install xgboost
Cannot import name 'Sequential' | TensorFlow not installed | pip install tensorflow

Getting Help

Check Documentation

Review the complete documentation:
  • Installation guide
  • API reference
  • Model architecture

GitHub Issues

Search existing issues or create a new one:
  • Provide error messages
  • Include Python/package versions
  • Describe steps to reproduce

Debug Logs

Enable detailed logging:
import logging
logging.basicConfig(
    level=logging.DEBUG,
    filename='debug.log'
)

Community Support

Join discussions:
  • Share your issue
  • Learn from others
  • Contribute solutions
Pro Tip: Before asking for help, try:
  1. Read the error message carefully
  2. Search this troubleshooting guide
  3. Google the exact error message
  4. Check GitHub issues
  5. Enable debug mode for detailed logs
