The F1 Stats Archive collects comprehensive Formula 1 statistics from the Ergast API using a suite of Python scripts. Each script is responsible for fetching a specific type of data and storing it in a structured format.
Data Collection Workflow
The data collection process follows a specific order:
1. Events - Fetch race calendars for each season
2. Race Results - Collect finishing positions and race outcomes
3. Qualifying Results - Gather qualifying session data
4. Standings - Calculate driver and constructor championship points
5. Lap Times - Retrieve detailed lap-by-lap timing data
6. Pitstops - Collect pit stop information (2011 onwards)
7. Sprint Races - Fetch sprint race results (2021 onwards)
Directory Structure
Data is organized in a hierarchical structure:
{year}/
├── events.json              # Season calendar
├── {race-name}/
│   ├── event_info.json      # Race details
│   ├── results.json         # Race results
│   ├── quali_results.json   # Qualifying results
│   ├── driverPoints.json    # Driver standings after this race
│   ├── teamPoints.json      # Constructor standings after this race
│   ├── laptimes.json        # Lap-by-lap timing (1996+)
│   ├── pitstops.json        # Pit stop data (2011+)
│   └── sprint_results.json  # Sprint race results (select races)
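As a rough illustration of how this layout maps to file paths, the sketch below builds the path for a given season, race, and file. The helper names `race_file_path` and `season_events_path` and the `data_root` argument are hypothetical and not taken from the scripts; the folder name follows the lowercase, spaces-to-hyphens rule described under Race Name Slugification.

```python
from pathlib import Path


def race_file_path(data_root, year, race_name, filename):
    """Build <data_root>/<year>/<race-slug>/<filename> (hypothetical helper)."""
    # Same convention as the race folders: lowercase, spaces become hyphens.
    race_slug = race_name.lower().replace(" ", "-")
    return Path(data_root) / str(year) / race_slug / filename


def season_events_path(data_root, year):
    """Build <data_root>/<year>/events.json for the season calendar."""
    return Path(data_root) / str(year) / "events.json"


path = race_file_path("data", 2023, "Monaco Grand Prix", "results.json")
print(path.as_posix())  # data/2023/monaco-grand-prix/results.json
```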
API Base URL
All scripts use the Ergast API via Jolpi:
BASE_URL = "https://api.jolpi.ca/ergast/f1"
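The sketch below shows one way request URLs might be composed from this base. It assumes the Jolpi endpoint follows the original Ergast path conventions (`/{year}.json`, `/{year}/{round}/{resource}.json`) and the Ergast-style `limit` and `offset` query parameters; `build_url` and its arguments are hypothetical names, so check the paths against the API documentation before relying on them.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.jolpi.ca/ergast/f1"


def build_url(year, round_number=None, resource=None, limit=None, offset=None):
    """Compose an Ergast-style URL (hypothetical helper; paths are assumptions)."""
    parts = [BASE_URL, str(year)]
    if round_number is not None:
        parts.append(str(round_number))
    if resource is not None:
        parts.append(resource)
    url = "/".join(parts) + ".json"
    # Ergast-style pagination uses limit/offset query parameters.
    query = {k: v for k, v in (("limit", limit), ("offset", offset)) if v is not None}
    return f"{url}?{urlencode(query)}" if query else url


calendar_url = build_url(2023)                # .../f1/2023.json
results_url = build_url(2023, 6, "results")   # .../f1/2023/6/results.json
laps_url = build_url(2023, 6, "laps", limit=100, offset=100)
```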
Rate Limiting Strategy
The Ergast API enforces strict rate limits that all scripts must respect:
Burst limit: 4 requests per second
Sustained limit: 500 requests per hour
All scripts implement rate limiting using one of these approaches:
Time-based Delay
import time
import requests

RATE_LIMIT_BURST = 4  # requests per second
REQUEST_DELAY = 1 / RATE_LIMIT_BURST  # 0.25 seconds between requests

def fetch_with_rate_limit(url):
    time.sleep(REQUEST_DELAY)  # fixed pause before every request
    response = requests.get(url)
    if response.status_code == 429:
        time.sleep(60)  # Wait and retry
        return fetch_with_rate_limit(url)
    return response.json()
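The retry in `fetch_with_rate_limit` has no upper bound, so a persistently throttled endpoint could keep a script waiting for a very long time. One possible variation, shown only as a sketch, caps the number of retries. The names `fetch_with_retries`, `RateLimitError`, `max_retries`, and `wait_seconds` are hypothetical, and `send` stands in for `requests.get` so the logic can be exercised without network access.

```python
import time


class RateLimitError(RuntimeError):
    """Raised when the API keeps answering 429 after all retries."""


def fetch_with_retries(send, url, max_retries=3, wait_seconds=60, sleep=time.sleep):
    """Call send(url) and retry a bounded number of times on HTTP 429.

    send is any callable returning an object with .status_code and .json(),
    for example requests.get. All names here are illustrative.
    """
    for attempt in range(max_retries + 1):
        response = send(url)
        if response.status_code != 429:
            return response.json()
        if attempt < max_retries:
            sleep(wait_seconds)  # back off before the next attempt
    raise RateLimitError(f"Still rate limited after {max_retries} retries: {url}")
```

With the `requests` library this could be called as `fetch_with_retries(requests.get, url)`.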
Adaptive Rate Limiting
import time
import requests

class Fetcher:
    def __init__(self):
        self.burst_limit = 4  # requests per second
        self.last_request_time = 0

    def make_request(self, url):
        # Sleep only for the remainder of the minimum interval, if any
        current_time = time.time()
        time_since_last = current_time - self.last_request_time
        if time_since_last < (1 / self.burst_limit):
            sleep_time = (1 / self.burst_limit) - time_since_last
            time.sleep(sleep_time)
        response = requests.get(url)
        self.last_request_time = time.time()
        return response.json()
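Both snippets pace requests against the burst limit only. As a sketch of how the sustained limit of 500 requests per hour might be tracked as well, the class below keeps a sliding window of request timestamps. `SlidingWindowLimiter` and its parameters are hypothetical names, and the clock and sleep functions are injectable purely so the behaviour can be checked without waiting.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests within any window_seconds span (illustrative)."""

    def __init__(self, max_requests=500, window_seconds=3600,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self.sleep = sleep
        self.timestamps = deque()  # times of requests still inside the window

    def _drop_expired(self, now):
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()

    def acquire(self):
        """Block until another request fits in the window, then record it."""
        while True:
            now = self.clock()
            self._drop_expired(now)
            if len(self.timestamps) < self.max_requests:
                break
            # Wait until the oldest recorded request leaves the window.
            self.sleep(self.window_seconds - (now - self.timestamps[0]))
        self.timestamps.append(now)
```

A script would call `limiter.acquire()` immediately before each HTTP request, alongside whichever burst-limit approach it already uses.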
Common Utilities
Race Name Slugification
All scripts convert race names to folder names consistently:
def slugify(race_name):
    return race_name.lower().replace(" ", "-")

def get_race_folder_name(race):
    return race["raceName"].lower().replace(" ", "-")
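To make the convention concrete, the example below applies the same rule to two race names. Only case and spaces change, so accented characters are carried into the folder name. The `slugify_ascii` variant is a hypothetical alternative, not something the scripts are known to use; switching to it would change the folder names of some races.

```python
import unicodedata


def slugify(race_name):
    return race_name.lower().replace(" ", "-")


def slugify_ascii(race_name):
    """Hypothetical variant that also strips accents (e.g. 'ã' becomes 'a')."""
    decomposed = unicodedata.normalize("NFKD", race_name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return slugify(ascii_only)


plain = slugify("Monaco Grand Prix")              # "monaco-grand-prix"
accented = slugify("São Paulo Grand Prix")        # "são-paulo-grand-prix"
stripped = slugify_ascii("São Paulo Grand Prix")  # "sao-paulo-grand-prix"
```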
Directory Creation
import os

def create_directory(path):
    if not os.path.exists(path):
        os.makedirs(path)
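A sketch of how directory creation and file writing might be combined when saving a fetched payload. The helper name `save_json`, the two-space indentation, and the UTF-8 encoding are assumptions for illustration; `os.makedirs(..., exist_ok=True)` is used here as a compact alternative to the explicit existence check.

```python
import json
import os


def save_json(data, folder, filename):
    """Write data as JSON to folder/filename, creating the folder if needed."""
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
    path = os.path.join(folder, filename)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    return path
```

For example, `save_json(results, "2023/monaco-grand-prix", "results.json")` would write the race results file for that folder, where `results` is whatever payload the script has fetched.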
Logging
Most scripts use Python’s logging module for tracking progress:
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    handlers=[
        logging.FileHandler("script_name.log"),
        logging.StreamHandler()
    ]
)
logger = logging.getLogger(__name__)
Each script writes logs to both a file and the console, making it easy to monitor progress and debug issues.
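`logging.basicConfig` configures the root logger and has no effect if the root logger already has handlers, which matters when several scripts are imported into one process. As a sketch of an alternative for that situation, the hypothetical helper `build_logger` below attaches the same two handlers to a named logger instead; the helper and the example script name are assumptions, not part of the scripts.

```python
import logging


def build_logger(script_name, log_file):
    """Return a named logger writing to log_file and the console (hypothetical)."""
    logger = logging.getLogger(script_name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        formatter = logging.Formatter(
            "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
        )
        for handler in (logging.FileHandler(log_file), logging.StreamHandler()):
            handler.setFormatter(formatter)
            logger.addHandler(handler)
    return logger
```

A script could then call `logger = build_logger("race_results", "race_results.log")` followed by `logger.info("Fetched %d races for %d", race_count, year)`.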
Data Availability
Different types of data are available for different time periods:
| Data Type | Available From | Notes |
| --- | --- | --- |
| Events | 1950 | Full F1 history |
| Race Results | 1950 | Complete results |
| Qualifying | 1994 | Earlier years may be incomplete |
| Standings | 1950 | Championship points |
| Lap Times | 1996 | Lap-by-lap timing |
| Pitstops | 2011 | Detailed pit stop data |
| Sprint Races | 2021 | Only for sprint race weekends |
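The availability years can be turned into a simple lookup so a collection run skips data types that do not exist for a season. This is a minimal sketch: the dictionary keys and the function name `data_types_for_season` are hypothetical, and the years are copied from the availability listing in this section.

```python
# First season with data for each type, listed in collection-workflow order.
FIRST_SEASON = {
    "events": 1950,
    "race_results": 1950,
    "qualifying": 1994,
    "standings": 1950,
    "lap_times": 1996,
    "pitstops": 2011,
    "sprint": 2021,
}


def data_types_for_season(year):
    """Return the data types worth requesting for the given season."""
    return [name for name, first in FIRST_SEASON.items() if year >= first]


print(data_types_for_season(1990))  # ['events', 'race_results', 'standings']
```

A season-level check is necessary but not sufficient for sprint data, since sprint results exist only for sprint race weekends.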
Next Steps
Events: Learn how to fetch race calendars
Race Results: Collect race finishing positions
Qualifying: Gather qualifying session data
Standings: Calculate championship points