Data Collection Overview

The F1 Stats Archive collects Formula 1 statistics from the Ergast API using a suite of Python scripts. Each script fetches one type of data and stores it as JSON.

Data Collection Workflow

The data collection process follows a specific order:

1. Events - Fetch race calendars for each season
2. Race Results - Collect finishing positions and race outcomes
3. Qualifying Results - Gather qualifying session data
4. Standings - Calculate driver and constructor championship points
5. Lap Times - Retrieve detailed lap-by-lap timing data
6. Pitstops - Collect pit stop information (2011 onwards)
7. Sprint Races - Fetch sprint race results (2021 onwards)
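
The ordering above can be sketched as a small driver that runs one step after another for a season. The step names, `PIPELINE_ORDER`, and `run_pipeline` are hypothetical placeholders for illustration, not the archive's actual script names; each real script would be wrapped in a callable and passed in.

```python
from typing import Callable, Dict, List

# Collection order from the workflow list. Assumption: events run first
# so that later steps can read the season calendar.
PIPELINE_ORDER: List[str] = [
    "events",
    "race_results",
    "qualifying_results",
    "standings",
    "lap_times",
    "pitstops",
    "sprint_races",
]


def run_pipeline(year: int, steps: Dict[str, Callable[[int], None]]) -> List[str]:
    """Run the provided steps for one season in workflow order.

    `steps` maps a step name to a callable taking the season year.
    Steps missing from the mapping are skipped. Returns the names of
    the steps that ran, in the order they ran.
    """
    completed = []
    for name in PIPELINE_ORDER:
        step = steps.get(name)
        if step is None:
            continue
        step(year)
        completed.append(name)
    return completed
```

Passing the steps as a mapping keeps the order in one place: callers can supply them in any order and they still run in the documented sequence.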

Directory Structure

Data is organized in a hierarchical structure:

{year}/
├── events.json                 # Season calendar
├── {race-name}/
│   ├── event_info.json        # Race details
│   ├── results.json           # Race results
│   ├── quali_results.json     # Qualifying results
│   ├── driverPoints.json      # Driver standings after this race
│   ├── teamPoints.json        # Constructor standings after this race
│   ├── laptimes.json          # Lap-by-lap timing (1996+)
│   ├── pitstops.json          # Pit stop data (2011+)
│   └── sprint_results.json    # Sprint race results (select races)
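
A minimal sketch of how paths in this layout could be built. The file names come from the tree above; the short labels, `RACE_FILES`, `race_file_path`, and `events_path` are hypothetical names introduced for illustration.

```python
from pathlib import Path

# File names taken from the layout above, keyed by a short label.
RACE_FILES = {
    "event_info": "event_info.json",
    "results": "results.json",
    "qualifying": "quali_results.json",
    "driver_points": "driverPoints.json",
    "team_points": "teamPoints.json",
    "laptimes": "laptimes.json",
    "pitstops": "pitstops.json",
    "sprint": "sprint_results.json",
}


def race_file_path(root, year, race_folder, kind):
    """Return the path {root}/{year}/{race_folder}/{file for kind}."""
    if kind not in RACE_FILES:
        raise KeyError(f"unknown data kind: {kind}")
    return Path(root) / str(year) / race_folder / RACE_FILES[kind]


def events_path(root, year):
    """Return the path of the season calendar, {root}/{year}/events.json."""
    return Path(root) / str(year) / "events.json"
```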

API Base URL

All scripts call the Ergast-compatible API hosted at api.jolpi.ca:

BASE_URL = "https://api.jolpi.ca/ergast/f1"
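
Request URLs are formed by appending a path to the base URL. The sketch below assumes Ergast-style paths (for example `{year}/{round}/results.json`) and the `limit` and `offset` paging parameters; `build_url` is a hypothetical helper, and the exact paths should be checked against the API documentation.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.jolpi.ca/ergast/f1"


def build_url(*segments, limit=None, offset=None):
    """Join path segments onto BASE_URL and append optional paging params.

    Assumption: Ergast-style paths ending in ".json" and the
    `limit` / `offset` query parameters.
    """
    path = "/".join(str(s).strip("/") for s in segments)
    url = f"{BASE_URL}/{path}.json"
    params = {k: v for k, v in (("limit", limit), ("offset", offset)) if v is not None}
    return f"{url}?{urlencode(params)}" if params else url
```

For example, `build_url(2021, 5, "results")` yields `https://api.jolpi.ca/ergast/f1/2021/5/results.json`.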

Rate Limiting Strategy

The Ergast API enforces strict rate limits that all scripts must respect:

Burst limit: 4 requests per second
Sustained limit: 500 requests per hour

All scripts implement rate limiting using one of these approaches:

Time-based Delay

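import time
import requests

RATE_LIMIT_BURST = 4  # requests per second
REQUEST_DELAY = 1 / RATE_LIMIT_BURST  # 0.25 seconds between requests
MAX_RETRIES = 3  # illustrative cap so a persistent 429 cannot recurse forever

def fetch_with_rate_limit(url, retries=MAX_RETRIES):
    time.sleep(REQUEST_DELAY)  # stay under the burst limit
    response = requests.get(url, timeout=30)

    if response.status_code == 429 and retries > 0:
        time.sleep(60)  # back off for a minute, then retry
        return fetch_with_rate_limit(url, retries - 1)

    # Raise on a 429 past the retry budget and on any other HTTP error.
    response.raise_for_status()
    return response.json()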

Adaptive Rate Limiting

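import time
import requests

class Fetcher:
    def __init__(self, burst_limit=4):
        self.burst_limit = burst_limit  # max requests per second
        self.min_interval = 1 / burst_limit
        self.last_request_time = 0.0

    def make_request(self, url):
        # Sleep only for whatever remains of the minimum interval,
        # so time already spent on other work is not wasted.
        elapsed = time.monotonic() - self.last_request_time
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)

        response = requests.get(url, timeout=30)
        self.last_request_time = time.monotonic()
        response.raise_for_status()  # do not parse error responses as data
        return response.json()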
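
Both approaches above pace requests against the burst limit only; neither tracks the sustained limit of 500 requests per hour. One way to cover it, offered as a sketch and not as what the archive's scripts do, is a sliding-window counter that blocks once the window is full. `SlidingWindowLimiter` and its parameters are hypothetical names; the clock and sleep functions are injectable so the class can be exercised without waiting.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls to acquire() per `window` seconds."""

    def __init__(self, max_requests=500, window=3600.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window = window
        self._clock = clock
        self._sleep = sleep
        self._timestamps = deque()  # times of requests still inside the window

    def _drop_expired(self, now):
        # Forget requests that are at least one full window old.
        while self._timestamps and self._timestamps[0] + self.window <= now:
            self._timestamps.popleft()

    def acquire(self):
        """Block until another request fits in the window, then record it."""
        while True:
            now = self._clock()
            self._drop_expired(now)
            if len(self._timestamps) < self.max_requests:
                break
            # Wait until the oldest recorded request leaves the window.
            self._sleep(self._timestamps[0] + self.window - now)
        self._timestamps.append(now)
```

Calling `acquire()` before each request, in addition to the per-request delay, would keep a long run under both limits.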

Common Utilities

Race Name Slugification

All scripts convert race names to folder names consistently:

def slugify(race_name):
    # "Monaco Grand Prix" -> "monaco-grand-prix"
    return race_name.lower().replace(" ", "-")

def get_race_folder_name(race):
    # Reuse slugify so every script derives the same folder name.
    return slugify(race["raceName"])
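
A short usage example of the slug rule. Only case and spaces are changed, so accented letters, digits, and punctuation pass through unchanged; the race names are sample inputs.

```python
def slugify(race_name):
    # Same rule as above: lowercase, spaces become hyphens.
    return race_name.lower().replace(" ", "-")


print(slugify("Monaco Grand Prix"))            # monaco-grand-prix
print(slugify("São Paulo Grand Prix"))         # são-paulo-grand-prix
print(slugify("70th Anniversary Grand Prix"))  # 70th-anniversary-grand-prix
```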

Directory Creation

import os

def create_directory(path):
    # exist_ok avoids a race between checking for the folder and creating it.
    os.makedirs(path, exist_ok=True)
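
Directory creation is normally followed by writing the fetched payload to disk. A minimal sketch, assuming the JSON files shown in the directory structure; `save_json` is a hypothetical helper name and the formatting options are illustrative choices.

```python
import json
import os


def save_json(data, folder, filename):
    """Create `folder` if needed and write `data` to folder/filename as JSON."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename)
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps accented names readable in the file.
        json.dump(data, f, indent=2, ensure_ascii=False)
    return path
```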

Logging

Most scripts use Python’s logging module for tracking progress:

import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    handlers=[
        logging.FileHandler("script_name.log"),
        logging.StreamHandler()
    ]
)

logger = logging.getLogger(__name__)

With this configuration, a script writes logs to both a file and the console, making it easy to monitor progress and debug issues.

Data Availability

Different types of data are available for different time periods:

Data Type       Available From   Notes
Events          1950             Full F1 history
Race Results    1950             Complete results
Qualifying      1994             Earlier years may be incomplete
Standings       1950             Championship points
Lap Times       1996             Lap-by-lap timing
Pitstops        2011             Detailed pit stop data
Sprint Races    2021             Only for sprint race weekends
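
The table can be encoded as a lookup so a script skips requests for seasons with no data. This is a sketch: the years come from the table above, while the key names, `AVAILABLE_FROM`, and `available_data_types` are hypothetical.

```python
# First season for which each data type is listed in the table above.
AVAILABLE_FROM = {
    "events": 1950,
    "race_results": 1950,
    "qualifying": 1994,
    "standings": 1950,
    "lap_times": 1996,
    "pitstops": 2011,
    "sprint_races": 2021,
}


def available_data_types(year):
    """Return the data types whose first listed season is <= year."""
    return [name for name, first in AVAILABLE_FROM.items() if year >= first]
```

Sprint results still exist only for sprint weekends, so a year check alone does not guarantee a given race has them.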

Next Steps

Events

Learn how to fetch race calendars

Race Results

Collect race finishing positions

Qualifying

Gather qualifying session data

Standings

Calculate championship points
