The F1 Stats Archive collects comprehensive Formula 1 statistics from the Ergast API using a suite of Python scripts. Each script is responsible for fetching a specific type of data and storing it in a structured format.

Data Collection Workflow

The data collection process follows a specific order:
  1. Events - Fetch race calendars for each season
  2. Race Results - Collect finishing positions and race outcomes
  3. Qualifying Results - Gather qualifying session data
  4. Standings - Calculate driver and constructor championship points
  5. Lap Times - Retrieve detailed lap-by-lap timing data
  6. Pitstops - Collect pit stop information (2011 onwards)
  7. Sprint Races - Fetch sprint race results (2021 onwards)
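The ordering above can be sketched as a small driver that decides which steps apply to a given season (the step names here are illustrative, not the actual script names):

```python
def collection_steps(year):
    """Return the ordered data-collection steps for one season."""
    steps = ["events", "results", "qualifying", "standings", "laptimes"]
    if year >= 2011:
        steps.append("pitstops")  # pit stop data starts in 2011
    if year >= 2021:
        steps.append("sprints")   # sprint races start in 2021
    return steps
```

Running each season's steps in this order guarantees that later steps (e.g. standings) can rely on data fetched by earlier ones.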

Directory Structure

Data is organized in a hierarchical structure:
{year}/
├── events.json                 # Season calendar
├── {race-name}/
│   ├── event_info.json        # Race details
│   ├── results.json           # Race results
│   ├── quali_results.json     # Qualifying results
│   ├── driverPoints.json      # Driver standings after this race
│   ├── teamPoints.json        # Constructor standings after this race
│   ├── laptimes.json          # Lap-by-lap timing (1996+)
│   ├── pitstops.json          # Pit stop data (2011+)
│   └── sprint_results.json    # Sprint race results (select races)
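A small helper (illustrative, not from the scripts themselves) can build paths matching this layout, reusing the slug rule described under Common Utilities:

```python
from pathlib import Path

def race_data_path(base_dir, year, race_name, filename):
    """Path to one data file under the {year}/{race-name}/ layout."""
    slug = race_name.lower().replace(" ", "-")  # same rule as slugify()
    return Path(base_dir) / str(year) / slug / filename
```

For example, `race_data_path("data", 2023, "Monaco Grand Prix", "results.json")` resolves to `data/2023/monaco-grand-prix/results.json`.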

API Base URL

All scripts use the Ergast API via Jolpi:
BASE_URL = "https://api.jolpi.ca/ergast/f1"
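Requests follow the Ergast path scheme, where the season (and optionally the round) precede the endpoint name. A sketch of a URL builder on that assumption:

```python
BASE_URL = "https://api.jolpi.ca/ergast/f1"

def build_url(season, endpoint, round_number=None):
    """Compose an Ergast-style request URL."""
    if round_number is None:
        # Season-wide query, e.g. all results for a season
        return f"{BASE_URL}/{season}/{endpoint}.json"
    # Single-round query, e.g. results for one race
    return f"{BASE_URL}/{season}/{round_number}/{endpoint}.json"
```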

Rate Limiting Strategy

The Ergast API enforces strict rate limits that all scripts must respect:
  • Burst limit: 4 requests per second
  • Sustained limit: 500 requests per hour
All scripts implement rate limiting using one of these approaches:

Time-based Delay

import time

import requests

RATE_LIMIT_BURST = 4  # requests per second
REQUEST_DELAY = 1 / RATE_LIMIT_BURST  # 0.25 seconds between requests

def fetch_with_rate_limit(url):
    time.sleep(REQUEST_DELAY)
    response = requests.get(url)

    if response.status_code == 429:  # rate limited: back off, then retry
        time.sleep(60)
        return fetch_with_rate_limit(url)

    response.raise_for_status()
    return response.json()

Adaptive Rate Limiting

import time

import requests

class Fetcher:
    def __init__(self):
        self.burst_limit = 4  # requests per second
        self.last_request_time = 0.0

    def make_request(self, url):
        # Sleep just long enough to stay under the burst limit.
        min_interval = 1 / self.burst_limit
        elapsed = time.time() - self.last_request_time
        if elapsed < min_interval:
            time.sleep(min_interval - elapsed)

        response = requests.get(url)
        self.last_request_time = time.time()
        response.raise_for_status()
        return response.json()

Common Utilities

Race Name Slugification

All scripts convert race names to folder names consistently:
def slugify(race_name):
    return race_name.lower().replace(" ", "-")

def get_race_folder_name(race):
    return slugify(race["raceName"])

Directory Creation

import os

def create_directory(path):
    os.makedirs(path, exist_ok=True)  # no-op if the directory already exists

Logging

Most scripts use Python’s logging module for tracking progress:
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    handlers=[
        logging.FileHandler("script_name.log"),
        logging.StreamHandler()
    ]
)

logger = logging.getLogger(__name__)
Each script writes logs to both a file and the console, making it easy to monitor progress and debug issues.

Data Availability

Different types of data are available for different time periods:
Data Type       Available From   Notes
Events          1950             Full F1 history
Race Results    1950             Complete results
Qualifying      1994             Earlier years may be incomplete
Standings       1950             Championship points
Lap Times       1996             Lap-by-lap timing
Pitstops        2011             Detailed pit stop data
Sprint Races    2021             Only for sprint race weekends
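The availability table maps directly to a simple guard the scripts can apply before fetching (a sketch; the dictionary keys are illustrative):

```python
# First season each data type is available from the API (see table above)
AVAILABLE_FROM = {
    "events": 1950, "results": 1950, "qualifying": 1994,
    "standings": 1950, "laptimes": 1996, "pitstops": 2011, "sprint": 2021,
}

def is_available(data_type, year):
    """True if the API has this data type for the given season."""
    return year >= AVAILABLE_FROM[data_type]
```

Checking availability up front avoids wasting rate-limited requests on endpoints that return no data for early seasons.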

Next Steps

Events

Learn how to fetch race calendars

Race Results

Collect race finishing positions

Qualifying

Gather qualifying session data

Standings

Calculate championship points
