
Public Holidays Dataset

Worldwide public holiday data sourced from the PyPI holidays package and Wikipedia, covering 38 countries or regions from 1970 to 2099. Each row describes the holiday for a specific date and country or region, including whether most people have paid time off.

Dataset Details

Volume and Retention

  • Format: Parquet
  • Size: Approximately 500KB
  • Date Range: January 1, 1970 to January 1, 2099
  • Records: 20,665+ unique dates across 38 countries

Storage Location

This dataset is stored in the East US Azure region. We recommend locating compute resources in East US for affinity.

Data Sources

This dataset combines data from:
  • The PyPI holidays package
  • Wikipedia

License

The combined dataset is provided under the Creative Commons Attribution-ShareAlike 3.0 Unported License.

Schema

| Column Name | Data Type | Description | Example Values |
|---|---|---|---|
| countryOrRegion | string | Country or region full name | Sweden, Norway |
| countryRegionCode | string | Country or region code (ISO format) | SE, NO |
| date | timestamp | Date of the holiday | 2025-12-25 00:00:00 |
| holidayName | string | Full name of the holiday | Søndag, Christmas Day |
| isPaidTimeOff | boolean | Whether most people have paid time off (available for US, GB, and India only) | True, False, NULL |
| normalizeHolidayName | string | Normalized name of the holiday | Christmas, New Year |
The isPaidTimeOff column is only available for the United States, Great Britain, and India. For other countries, this field will be NULL.
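Because isPaidTimeOff is effectively three-valued (True, False, or NULL), a plain boolean filter can silently lump NULL rows in with False. A minimal pandas sketch, using a small hypothetical frame in place of the real data, that keeps NULLs distinct with the nullable `boolean` dtype:

```python
import pandas as pd

# Hypothetical sample rows shaped like the schema above (not real dataset values)
df = pd.DataFrame({
    "countryRegionCode": ["US", "US", "GB", "SE"],
    "holidayName": ["Christmas Day", "Example Day", "Boxing Day", "Juldagen"],
    "isPaidTimeOff": [True, False, True, None],
})

# The nullable boolean dtype stores NULL as <NA> instead of coercing it to False
df["isPaidTimeOff"] = df["isPaidTimeOff"].astype("boolean")

# Rows known to be paid time off; <NA> rows are explicitly excluded
paid = df[df["isPaidTimeOff"].fillna(False)]

# Per-country count of rows that actually carry a value in the flag
coverage = df["isPaidTimeOff"].notna().groupby(df["countryRegionCode"]).sum()

print(paid["holidayName"].tolist())
print(coverage.to_dict())
```

On the real dataset, the same coverage expression should show non-zero counts only for US, GB, and IN.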

Preview Data

| countryOrRegion | holidayName | normalizeHolidayName | countryRegionCode | date |
|---|---|---|---|---|
| Norway | Søndag | Søndag | NO | 12/28/2098 |
| Sweden | Söndag | Söndag | SE | 12/28/2098 |
| Australia | Boxing Day | Boxing Day | AU | 12/26/2098 |
| Hungary | Karácsony másnapja | Karácsony másnapja | HU | 12/26/2098 |
| Austria | Stefanitag | Stefanitag | AT | 12/26/2098 |
| Canada | Boxing Day | Boxing Day | CA | 12/26/2098 |
| Croatia | Sveti Stjepan | Sveti Stjepan | HR | 12/26/2098 |
| Czech | 2. svátek vánoční | 2. svátek vánoční | CZ | 12/26/2098 |

Data Access

Python SDK (azureml-opendatasets)

Access the dataset using the Azure ML Open Datasets SDK:
from azureml.opendatasets import PublicHolidays
from datetime import datetime
from dateutil.relativedelta import relativedelta

# Get holidays from the last month
end_date = datetime.today()
start_date = datetime.today() - relativedelta(months=1)

hol = PublicHolidays(start_date=start_date, end_date=end_date)
hol_df = hol.to_pandas_dataframe()

# Display dataset info (DataFrame.info() prints directly and returns None)
hol_df.info()
print(hol_df.head())
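The SDK examples build a "last month" window with relativedelta. A small sketch (the helper name last_month_window is hypothetical) that computes the window once and shows how relativedelta behaves at month ends: it clamps to the last valid day of the shorter month instead of overflowing into the next one.

```python
from datetime import datetime

from dateutil.relativedelta import relativedelta


def last_month_window(today: datetime) -> tuple[datetime, datetime]:
    """Return (start, end) spanning one calendar month back from `today` (hypothetical helper)."""
    # relativedelta clamps to the last valid day of the target month, so
    # March 31 minus one month lands on the last day of February
    start = today - relativedelta(months=1)
    return start, today


start_date, end_date = last_month_window(datetime(2024, 3, 31))
print(start_date, end_date)  # 2024-02-29 00:00:00 2024-03-31 00:00:00
```

Taking `today` once and deriving both bounds from it also avoids calling datetime.today() twice, which can yield two slightly different instants.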

Azure Storage (Direct Access)

Access the dataset directly from Azure Blob Storage:
# Install required packages (Jupyter notebook magic; in a terminal, run
# `pip install azure-storage-blob pyarrow pandas` instead)
%pip install azure-storage-blob pyarrow pandas

import os

import pandas as pd
from azure.storage.blob import BlobServiceClient

# Azure storage configuration
azure_storage_account_name = "azureopendatastorage"
container_name = "holidaydatacontainer"
folder_name = "Processed"

# Create blob service client
container_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
blob_service_client = BlobServiceClient(container_url)

container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(folder_name)

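# Find the latest parquet file (assumes blob names sort chronologically,
# so the lexically greatest name is the newest)
parquet_names = sorted(
    (blob.name for blob in blobs
     if blob.name.startswith(folder_name) and blob.name.endswith('.parquet')),
    reverse=True,
)
if not parquet_names:
    raise FileNotFoundError(f'No .parquet files found under {folder_name}')
target_blob_name = parquet_names[0]

print(f'Downloading: {target_blob_name}')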

# Download and read the parquet file
_, filename = os.path.split(target_blob_name)
blob_client = container_client.get_blob_client(target_blob_name)
with open(filename, 'wb') as local_file:
    blob_client.download_blob().readinto(local_file)

# Read into pandas DataFrame
df = pd.read_parquet(filename)
print(df.head())
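Unlike the SDK, which accepts start_date and end_date, direct Blob access returns the whole file. A minimal sketch of applying the same kind of window locally, assuming the downloaded file uses the schema's `date` column; the filter_date_range helper and the sample frame are hypothetical stand-ins for `df`:

```python
import pandas as pd


def filter_date_range(frame: pd.DataFrame, start: str, end: str) -> pd.DataFrame:
    """Keep rows whose 'date' lies in [start, end], both ends inclusive (hypothetical helper)."""
    dates = pd.to_datetime(frame["date"])
    mask = dates.between(pd.Timestamp(start), pd.Timestamp(end))  # inclusive="both" by default
    return frame[mask]


# Hypothetical stand-in for the downloaded DataFrame
df = pd.DataFrame({
    "countryRegionCode": ["AU", "NO", "SE"],
    "date": pd.to_datetime(["2098-12-26", "2098-12-28", "2099-01-01"]),
})

subset = filter_date_range(df, "2098-12-26", "2098-12-31")
print(subset)
```

Because the end bound is a midnight timestamp, rows with a time of day later than 00:00 on the end date would be excluded; the schema shows midnight values, so this should not matter here.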

Azure Databricks (Python SDK)

# Install in Databricks cluster first:
# pip install azureml-opendatasets

from azureml.opendatasets import PublicHolidays
from datetime import datetime
from dateutil.relativedelta import relativedelta

end_date = datetime.today()
start_date = datetime.today() - relativedelta(months=1)

hol = PublicHolidays(start_date=start_date, end_date=end_date)
hol_df = hol.to_spark_dataframe()

display(hol_df.limit(5))

Azure Databricks (PySpark)

# Azure storage access configuration
blob_account_name = "azureopendatastorage"
blob_container_name = "holidaydatacontainer"
blob_relative_path = "Processed"

# Build the wasbs:// path Spark uses to read from Blob storage
wasbs_path = f"wasbs://{blob_container_name}@{blob_account_name}.blob.core.windows.net/{blob_relative_path}"
print(f'Remote blob path: {wasbs_path}')

# Read parquet files
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')

# Display top 10 rows
print('Displaying top 10 rows:')
display(spark.sql('SELECT * FROM source LIMIT 10'))

Azure Synapse Analytics

# Using azureml-opendatasets in Synapse
from azureml.opendatasets import PublicHolidays
from datetime import datetime
from dateutil.relativedelta import relativedelta

end_date = datetime.today()
start_date = datetime.today() - relativedelta(months=1)

hol = PublicHolidays(start_date=start_date, end_date=end_date)
hol_df = hol.to_spark_dataframe()

# Display top 5 rows
display(hol_df.limit(5))

Use Cases

Incorporate holiday information into demand forecasting models to account for seasonal variations and holiday effects on consumer behavior.
Plan staffing and resource allocation around public holidays across different countries for global operations.
Build calendar applications that display public holidays for multiple countries and regions.
Analyze business metrics with holiday context to understand performance patterns during holiday periods.
Help travelers and tourism applications identify public holidays in destination countries.
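For the demand-forecasting use case, one common pattern is to left-join holiday dates onto a daily series and derive an is_holiday flag. A minimal pandas sketch with hypothetical sales and holiday rows; it assumes one country per model and the column names from the schema above:

```python
import pandas as pd

# Hypothetical daily sales series for one country
sales = pd.DataFrame({
    "date": pd.date_range("2024-12-23", "2024-12-27", freq="D"),
    "units": [120, 150, 40, 60, 130],
})

# Hypothetical holiday rows in the dataset's schema
holidays = pd.DataFrame({
    "countryRegionCode": ["GB", "GB", "SE"],
    "date": pd.to_datetime(["2024-12-25", "2024-12-26", "2024-12-25"]),
    "holidayName": ["Christmas Day", "Boxing Day", "Juldagen"],
})

# Keep one country and one row per date, so several holidays on the
# same day cannot duplicate sales rows in the join
gb_days = (
    holidays.loc[holidays["countryRegionCode"] == "GB", ["date"]]
    .drop_duplicates()
    .assign(is_holiday=1)
)

features = sales.merge(gb_days, on="date", how="left")
features["is_holiday"] = features["is_holiday"].fillna(0).astype(int)
print(features)
```

Deduplicating by date before the merge is the key design choice: a left join against a table with repeated dates would otherwise inflate the sales rows.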

Example Analysis

Count Holidays by Country

import pandas as pd
from azureml.opendatasets import PublicHolidays
from datetime import datetime

# Get holidays for 2024
start_date = datetime(2024, 1, 1)
end_date = datetime(2024, 12, 31)

hol = PublicHolidays(start_date=start_date, end_date=end_date)
hol_df = hol.to_pandas_dataframe()

# Count holidays by country
holiday_counts = hol_df.groupby('countryOrRegion')['holidayName'].count().sort_values(ascending=False)
print("\nTop 10 countries by number of holidays:")
print(holiday_counts.head(10))

Find Holidays on Specific Date

# Find all countries with holidays on December 25, 2024
christmas_date = pd.Timestamp('2024-12-25')
christmas_holidays = hol_df[hol_df['date'] == christmas_date]

print(f"\nCountries celebrating holidays on {christmas_date.date()}:")
print(christmas_holidays[['countryOrRegion', 'holidayName']].to_string(index=False))
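The equality test above matches only rows whose timestamp is exactly midnight. The schema shows midnight values, but if a column ever carries a time of day, normalizing before comparing is more robust. A sketch on a hypothetical frame, where one row deliberately includes a time component:

```python
import pandas as pd

# Hypothetical rows; the Australia timestamp carries a time of day on purpose
hol_df = pd.DataFrame({
    "countryOrRegion": ["Canada", "Australia", "Austria"],
    "holidayName": ["Christmas Day", "Christmas Day", "Stefanitag"],
    "date": pd.to_datetime(["2024-12-25 00:00", "2024-12-25 06:30", "2024-12-26 00:00"]),
})

target = pd.Timestamp("2024-12-25")

# dt.normalize() sets every timestamp to midnight, so both Dec 25 rows match
on_target = hol_df[hol_df["date"].dt.normalize() == target]
print(on_target[["countryOrRegion", "holidayName"]].to_string(index=False))
```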

Filter Paid Time Off Holidays

# Get only holidays with paid time off in the US
us_paid_holidays = hol_df[
    (hol_df['countryRegionCode'] == 'US') & 
    (hol_df['isPaidTimeOff'] == True)
]

print("\nUS holidays with paid time off:")
print(us_paid_holidays[['date', 'holidayName']].to_string(index=False))

Supported Countries

The dataset covers 38 countries and regions including:
  • Australia, Austria, Canada, Croatia, Czech Republic
  • Denmark, Finland, France, Germany, Hungary
  • India, Ireland, Italy, Japan, Mexico
  • Netherlands, New Zealand, Norway, Poland, Portugal
  • Spain, Sweden, Switzerland, United Kingdom, United States
  • And more…
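Since the list above is partial, the complete set of countries and regions can be read from the data itself. A sketch over a hypothetical frame with the schema's country columns; on the real data, run the same expression on `hol_df` or `df`, ideally loaded over the full date range, because a narrow start_date/end_date window can omit countries that have no holidays in it:

```python
import pandas as pd

# Hypothetical rows with the dataset's country columns
df = pd.DataFrame({
    "countryOrRegion": ["Norway", "Sweden", "Norway", "Australia"],
    "countryRegionCode": ["NO", "SE", "NO", "AU"],
})

# One row per country or region, sorted by code
countries = (
    df[["countryRegionCode", "countryOrRegion"]]
    .drop_duplicates()
    .sort_values("countryRegionCode")
    .reset_index(drop=True)
)
print(len(countries))
print(countries.to_string(index=False))
```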

Data Quality Notes

The isPaidTimeOff field is only populated for the United States, Great Britain, and India. For all other countries, this field will be NULL.
Holiday names are provided in the local language of each country. Use the normalizeHolidayName field for a more standardized representation where available.
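Because normalizeHolidayName is populated only where available, a sketch that falls back to holidayName when it is NULL and then groups countries by the resulting label. The rows and the `label` column name are hypothetical:

```python
import pandas as pd

# Hypothetical rows; one normalized name is missing
df = pd.DataFrame({
    "countryRegionCode": ["GB", "CA", "AT", "HU"],
    "holidayName": ["Christmas Day", "Christmas Day", "Christtag", "Karácsony"],
    "normalizeHolidayName": ["Christmas", "Christmas", "Christmas", None],
})

# Prefer the normalized name; fall back to the local-language name when it is NULL
df["label"] = df["normalizeHolidayName"].fillna(df["holidayName"])

# Countries observing each label, as sorted lists
by_label = df.groupby("label")["countryRegionCode"].apply(sorted).to_dict()
print(by_label)
```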

Contact

For questions about this dataset, email [email protected].