Quickstart Guide

This guide will get you up and running with RaceData in just a few minutes. You’ll learn how to download the data and start analyzing Formula 1 races.

Installation & Setup

No special installation is required to use RaceData: download the CSV files and analyze them with your preferred tools.

For Python users, we recommend installing pandas for easy data manipulation:

pip install pandas

Step 1: Download the Data

Choose one of the following methods to access the RaceData dataset:

Direct Download (Recommended)

Download the latest consolidated data archive directly from GitHub releases:

wget https://github.com/TracingInsights/RaceData/releases/latest/download/data.zip
unzip data.zip

This gives you all 18 CSV tables in a single download.
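Where wget and unzip are unavailable, the same step can be sketched with the Python standard library. The URL is the one from the command above; the helper names `extract_archive` and `download_and_extract` are hypothetical, and the folder layout inside the archive is not assumed, which is why the sketch searches recursively for CSV files.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL taken from the wget command above.
DATA_URL = "https://github.com/TracingInsights/RaceData/releases/latest/download/data.zip"


def extract_archive(zip_path, dest_dir):
    """Extract a zip archive into dest_dir and return the extracted CSV paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # rglob also finds CSVs when the archive nests them in a subfolder.
    return sorted(dest.rglob("*.csv"))


def download_and_extract(url=DATA_URL, dest_dir="."):
    """Download the archive to data.zip in the working directory, then extract it."""
    zip_path, _headers = urllib.request.urlretrieve(url, "data.zip")
    return extract_archive(zip_path, dest_dir)
```

Calling `download_and_extract()` returns the list of CSV paths, so `len(...)` of the result is a quick way to check how many tables arrived.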

HuggingFace Datasets

Access the data through HuggingFace for seamless integration with ML pipelines:

from datasets import load_dataset

# Load a specific table
dataset = load_dataset("tracinginsights/RaceData", data_files="races.csv")

Programmatic Download with Python

Use the same approach as the RaceData automation scripts:

import kagglehub

# Download from Kaggle
path = kagglehub.dataset_download("jtrotman/formula-1-race-data")
print(f"Data downloaded to: {path}")

Requires Kaggle API credentials configured via ~/.kaggle/kaggle.json.

Step 2: Load and Explore the Data

Once downloaded, you can immediately start working with the data using pandas or any CSV-compatible tool.

import pandas as pd

# Load the races data
races = pd.read_csv('data/races.csv')

# Load driver information
drivers = pd.read_csv('data/drivers.csv')

# Load race results
results = pd.read_csv('data/results.csv')

# Quick preview
print(f"Total races: {len(races)}")
print(f"Total drivers: {len(drivers)}")
print(races.head())
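Some Formula 1 CSV exports encode missing values as the literal text `\N`, which makes pandas read otherwise numeric columns as strings. Whether RaceData does so is an assumption here, not something this guide states. The sketch below uses two hypothetical helpers, `load_table` and `summarize`, to load a table with that marker treated as missing and to show each column's type and missing-value count.

```python
from pathlib import Path

import pandas as pd


def load_table(name, data_dir="data"):
    """Read one CSV table, treating the assumed null marker as missing."""
    path = Path(data_dir) / f"{name}.csv"
    # na_values adds the backslash-N marker to pandas' default missing-value list.
    return pd.read_csv(path, na_values=[r"\N"])


def summarize(df):
    """Return each column's dtype and its number of missing values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
    })
```

For example, `summarize(load_table("results"))` gives a one-row-per-column overview that is useful to check before filtering or aggregating.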

Step 3: Run Your First Query

Let’s analyze some real Formula 1 data with a practical example.

Example: Find the Most Successful Drivers

import pandas as pd

# Load necessary tables
drivers = pd.read_csv('data/drivers.csv')
results = pd.read_csv('data/results.csv')

# Merge results with driver information
driver_results = results.merge(
    drivers[['driverId', 'forename', 'surname', 'nationality']],
    on='driverId'
)

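# Count wins (position 1). Converting to numeric first makes the comparison
# work whether 'position' was read as text or as numbers.
is_win = pd.to_numeric(driver_results['position'], errors='coerce') == 1
wins_by_driver = (
    driver_results[is_win]
    .groupby(['forename', 'surname', 'nationality'])
    .size()
    .reset_index(name='wins')
    .sort_values('wins', ascending=False)
)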

# Display top 10 drivers
print("\nTop 10 Most Successful F1 Drivers by Race Wins:")
print(wins_by_driver.head(10).to_string(index=False))

Example: Analyze Lap Times for a Specific Race

import pandas as pd

# Load lap times, drivers, and races
lap_times = pd.read_csv('data/lap_times.csv')
drivers = pd.read_csv('data/drivers.csv')
races = pd.read_csv('data/races.csv')

# Get a specific race (e.g., 2024 Monaco GP)
monaco_2024 = races[
    (races['year'] == 2024) & 
    (races['name'].str.contains('Monaco', case=False))
]

if not monaco_2024.empty:
    race_id = monaco_2024.iloc[0]['raceId']
    
    # Get lap times for this race
    race_laps = lap_times[lap_times['raceId'] == race_id]
    
    # Merge with driver names
    race_laps = race_laps.merge(
        drivers[['driverId', 'code']], 
        on='driverId'
    )
    
    # Convert lap time to seconds
    race_laps['time_seconds'] = race_laps['milliseconds'] / 1000
    
    # Find fastest lap
    fastest = race_laps.loc[race_laps['time_seconds'].idxmin()]
    print(f"\nFastest lap: {fastest['code']} - {fastest['time']} (Lap {fastest['lap']})")

Common Use Cases

Performance Analysis

Track driver and constructor performance over time, identify trends, and predict future outcomes.
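As a minimal sketch of tracking performance over time, the function below totals points per driver per season. It assumes `results` has `raceId`, `driverId`, and `points` columns and `races` has `raceId` and `year`; the `points` column is an assumption not shown elsewhere in this guide, and the sample frames are made up for illustration.

```python
import pandas as pd


def season_points(results, races):
    """Total points per driver per season, joining results to races on raceId."""
    merged = results.merge(races[["raceId", "year"]], on="raceId")
    return (
        merged.groupby(["year", "driverId"], as_index=False)["points"]
        .sum()
        .sort_values(["year", "points"], ascending=[True, False])
    )


# Tiny made-up frames, not real results.
results = pd.DataFrame({
    "raceId": [1, 1, 2, 2],
    "driverId": [10, 20, 10, 20],
    "points": [25.0, 18.0, 18.0, 25.0],
})
races = pd.DataFrame({"raceId": [1, 2], "year": [2023, 2024]})

print(season_points(results, races))
```

Points recorded in any table other than `results` are not counted by this sketch.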

Pit Stop Strategy

Analyze pit stop timings, durations, and their impact on race results.
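A first pass at pit stop analysis might count stops and total stop time per driver in each race. The sketch assumes a pit stop table with `raceId`, `driverId`, `stop`, and `milliseconds` columns; these names are assumptions to check against the Data Schema, and the sample rows are invented.

```python
import pandas as pd


def pit_stop_summary(pit_stops):
    """Per race and driver: number of stops and total stop time in seconds."""
    summary = (
        pit_stops.groupby(["raceId", "driverId"])
        .agg(stops=("stop", "count"), total_ms=("milliseconds", "sum"))
        .reset_index()
    )
    summary["total_seconds"] = summary["total_ms"] / 1000
    return summary.drop(columns="total_ms")


# Invented sample rows for illustration only.
pit_stops = pd.DataFrame({
    "raceId": [1, 1, 1],
    "driverId": [10, 10, 20],
    "stop": [1, 2, 1],
    "milliseconds": [23000, 24500, 22100],
})

print(pit_stop_summary(pit_stops))
```

Merging this summary with `results` on `raceId` and `driverId` would then let you compare stop counts against finishing positions.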

Circuit Comparison

Compare performance across different circuits and identify track-specific patterns.

Historical Research

Explore how Formula 1 has evolved from 1950 to the present day.

Understanding the Data Structure

The dataset uses relational keys to connect tables. Each ID field (raceId, driverId, constructorId, etc.) identifies a row in its own table and appears as a foreign key in related tables, so you can join tables together for comprehensive analysis.
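To illustrate how the ID fields connect tables, here is a small sketch that chains two joins on `raceId` and `driverId`. The miniature frames are made up. The `validate="many_to_one"` argument is an optional pandas safeguard that raises an error if a key is unexpectedly duplicated in the lookup table.

```python
import pandas as pd

# Made-up miniature tables that mimic the key columns.
races = pd.DataFrame({
    "raceId": [1, 2],
    "year": [2024, 2024],
    "name": ["Race A", "Race B"],
})
drivers = pd.DataFrame({"driverId": [10, 20], "surname": ["Alpha", "Beta"]})
results = pd.DataFrame({
    "raceId": [1, 1, 2],
    "driverId": [10, 20, 10],
    "position": [1, 2, 1],
})

# Each result row points to exactly one race and one driver.
joined = (
    results
    .merge(races, on="raceId", validate="many_to_one")
    .merge(drivers, on="driverId", validate="many_to_one")
)

print(joined[["year", "name", "surname", "position"]])
```

The same pattern extends to other tables: join on the shared ID column, then select the descriptive columns you need.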

Next Steps

Now that you have the data loaded and understand the basics, explore these resources:

Data Schema

Complete reference of all 18 tables and their columns

Analysis Guides

Advanced queries and visualization examples

Data Access Methods

Programmatic access and integration options

Need Help?

If you encounter any issues or have questions:

Check the GitHub Issues for common problems
Visit the TracingInsights contact page for support
Join the F1 data community on Reddit

The data is updated automatically within 3 hours of each race. To get the latest data, simply re-download from the GitHub releases page.
