
Bing COVID-19 Dataset

Bing COVID-19 data includes confirmed, fatal, and recovered cases from all regions, updated daily. The Bing COVID-19 Tracker reflects this data.

Data Sources

Bing collects data from multiple trusted, reliable sources.

Dataset Information

Available Formats

Modified Bing COVID-19 datasets are available in multiple formats. All modified datasets add ISO 3166 subdivision codes and load times, and use lowercase column names with underscore separators. The raw data is also available in CSV format.
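
The naming convention described above can be sketched as a small renaming helper. This is only an illustration: the CamelCase names in `raw_names` are assumed examples of raw headers, not a confirmed listing of the raw CSV columns, and `to_snake_case` is a hypothetical helper, not the process Bing uses to produce the modified datasets.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a header such as 'ConfirmedChange' to 'confirmed_change'."""
    # Add an underscore wherever a lowercase letter is followed by an uppercase letter.
    name = re.sub(r"(?<=[a-z])(?=[A-Z])", "_", name)
    # Add an underscore wherever a lowercase letter is followed by a digit.
    name = re.sub(r"(?<=[a-z])(?=[0-9])", "_", name)
    return name.lower()


# Assumed raw header names, for illustration only.
raw_names = ["ID", "Updated", "ConfirmedChange", "AdminRegion1", "Country_Region", "ISO2"]
renamed = [to_snake_case(n) for n in raw_names]
print(renamed)
```

The two substitutions are kept separate so that all-uppercase codes such as `ISO2` are lowercased without being split.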

Data Volume

  • Total Records: 4,766,737+ rows (as of March 5, 2023)
  • Update Frequency: Daily
  • Historical Data: Earlier versions available on GitHub

Schema

| Column Name | Data Type | Description | Example Values |
|---|---|---|---|
| id | int | Unique identifier | 742546, 69019298 |
| updated | date | The as-of date for the record | 2021-04-23 |
| confirmed | int | Confirmed case count for the region | 1, 2, 1000 |
| confirmed_change | int | Change in confirmed cases from previous day | 1, 2, 50 |
| deaths | int | Death case count for the region | 1, 2, 100 |
| deaths_change | smallint | Change in deaths from previous day | 1, 2, 5 |
| recovered | int | Recovered count for the region | 1, 2, 500 |
| recovered_change | int | Change in recovered cases from previous day | 1, 2, 25 |
| country_region | string | Country/region | United States, India |
| admin_region_1 | string | Region within country_region | Texas, Georgia |
| admin_region_2 | string | Region within admin_region_1 | Washington County |
| iso2 | string | 2-letter country code identifier | US, IN |
| iso3 | string | 3-letter country code identifier | USA, IND |
| iso_subdivision | string | Two-part ISO subdivision code | US-TX, US-GA |
| latitude | double | Latitude of the region centroid | 42.28708, 19.59852 |
| longitude | double | Longitude of the region centroid | -2.5396, -155.5186 |
| load_time | timestamp | Date and time file was loaded from source | 2021-04-26 00:06:34 |
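
As a rough illustration of the two-part `iso_subdivision` format, the sketch below splits a code such as `US-TX` and checks that its country part matches `iso2`. The helper names are hypothetical, and treating a missing subdivision code as `None` is an assumption about how rows without a subdivision are represented.

```python
from typing import Optional, Tuple


def split_iso_subdivision(code: str) -> Tuple[str, str]:
    """Split 'US-TX' into its country part 'US' and subdivision part 'TX'."""
    country, sep, subdivision = code.partition("-")
    if not sep or not country or not subdivision:
        raise ValueError(f"not a two-part subdivision code: {code!r}")
    return country, subdivision


def is_consistent(iso2: str, iso_subdivision: Optional[str]) -> bool:
    """Return True when the subdivision code is absent or starts with iso2."""
    if iso_subdivision is None:  # assumed representation of "no subdivision"
        return True
    country, _ = split_iso_subdivision(iso_subdivision)
    return country == iso2


parts = split_iso_subdivision("US-TX")
print(parts)
print(is_consistent("US", "US-GA"))
print(is_consistent("IN", "US-GA"))
```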

Preview Data

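| id | updated | confirmed | deaths | country_region | admin_region_1 | confirmed_change | deaths_change |
|---|---|---|---|---|---|---|---|
| 338995 | 2020-01-21 | 262 | 0 | Worldwide | null | - | - |
| 338996 | 2020-01-22 | 313 | 0 | Worldwide | null | 51 | 0 |
| 338997 | 2020-01-23 | 578 | 0 | Worldwide | null | 265 | 0 |
| 338998 | 2020-01-24 | 841 | 0 | Worldwide | null | 263 | 0 |
| 338999 | 2020-01-25 | 1320 | 0 | Worldwide | null | 479 | 0 |
| 339000 | 2020-01-26 | 2014 | 0 | Worldwide | null | 694 | 0 |
| 339001 | 2020-01-27 | 2798 | 0 | Worldwide | null | 784 | 0 |
| 339002 | 2020-01-28 | 4593 | 0 | Worldwide | null | 1795 | 0 |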
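
The preview rows make it possible to check how `confirmed_change` relates to `confirmed`. The sketch below recomputes the day-over-day difference from the cumulative counts shown above and compares it with the reported values; `daily_change` is a hypothetical helper written for this check.

```python
# Cumulative confirmed counts from the preview rows (2020-01-21 to 2020-01-28).
confirmed = [262, 313, 578, 841, 1320, 2014, 2798, 4593]
# confirmed_change values shown for the same rows; the first row has no value.
reported_change = [None, 51, 265, 263, 479, 694, 784, 1795]


def daily_change(cumulative):
    """Return day-over-day differences, with None for the first day."""
    changes = [None]
    for previous, current in zip(cumulative, cumulative[1:]):
        changes.append(current - previous)
    return changes


computed_change = daily_change(confirmed)
print(computed_change == reported_change)
```

For these eight rows the recomputed differences agree with the reported column, which is consistent with the schema's description of `confirmed_change` as the change from the previous day.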

Data Access

Python (Pandas Direct Access)

Download and analyze the dataset using pandas:
import pandas as pd
import matplotlib.pyplot as plt

# pd.read_parquet needs a Parquet engine such as pyarrow or fastparquet installed

# Read directly from Azure Blob Storage
df = pd.read_parquet(
    "https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet"
)

print(df.head(10))
print(df.dtypes)
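
After loading, it may be worth confirming that the frame has the columns listed in the schema before running any analysis. The following is a minimal sketch of such a check; `EXPECTED_COLUMNS` is copied from the schema table, `missing_columns` is a hypothetical helper, and the tiny `sample` frame stands in for the real `df` so the snippet runs without network access.

```python
import pandas as pd

# Column names taken from the schema table in this document.
EXPECTED_COLUMNS = [
    "id", "updated", "confirmed", "confirmed_change", "deaths", "deaths_change",
    "recovered", "recovered_change", "country_region", "admin_region_1",
    "admin_region_2", "iso2", "iso3", "iso_subdivision", "latitude",
    "longitude", "load_time",
]


def missing_columns(frame: pd.DataFrame) -> list:
    """Return the expected columns that are absent from the frame, in schema order."""
    return [name for name in EXPECTED_COLUMNS if name not in frame.columns]


# Stand-in for the downloaded data, with only three of the expected columns.
sample = pd.DataFrame({"id": [338995], "updated": ["2020-01-21"], "confirmed": [262]})
missing = missing_columns(sample)
print(missing)
```

With the real dataset, calling `missing_columns(df)` would be expected to return an empty list if the file still matches the schema above.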

Analyze Worldwide Data

# Filter worldwide data
df_worldwide = df[df['country_region'] == 'Worldwide']

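# Create pivot table indexed by region and date.
# Restrict values to numeric columns so the default mean aggregation applies cleanly.
df_worldwide_pivot = df_worldwide.pivot_table(
    values=['confirmed', 'deaths', 'recovered'],
    index=['country_region', 'updated']
)

print(df_worldwide_pivot)
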
# Plot confirmed cases over time
df_worldwide.plot(
    kind='line',
    x='updated',
    y='confirmed',
    grid=True,
    title='Worldwide Confirmed COVID-19 Cases'
)

# Plot deaths over time
df_worldwide.plot(
    kind='line',
    x='updated',
    y='deaths',
    grid=True,
    title='Worldwide COVID-19 Deaths'
)

# Plot daily change in confirmed cases
df_worldwide.plot(
    kind='line',
    x='updated',
    y='confirmed_change',
    grid=True,
    title='Daily Change in Confirmed Cases'
)

plt.show()

Azure Databricks (PySpark)

Access the dataset in Azure Databricks:
# Azure storage access configuration
blob_account_name = "pandemicdatalake"
blob_container_name = "public"
blob_relative_path = "curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet"

# Build the WASBS path that Spark uses to read the blob remotely
wasbs_path = f"wasbs://{blob_container_name}@{blob_account_name}.blob.core.windows.net/{blob_relative_path}"
print(f'Remote blob path: {wasbs_path}')

# Read parquet file
df = spark.read.parquet(wasbs_path)
df.createOrReplaceTempView('covid_data')

# Display top 10 rows
print('Displaying top 10 rows:')
display(spark.sql('SELECT * FROM covid_data LIMIT 10'))

Azure Synapse Analytics

# Azure storage access configuration
blob_account_name = "pandemicdatalake"
blob_container_name = "public"
blob_relative_path = "curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet"

# Build the WASBS path that Spark uses to read the blob remotely
wasbs_path = f"wasbs://{blob_container_name}@{blob_account_name}.blob.core.windows.net/{blob_relative_path}"
print(f'Remote blob path: {wasbs_path}')

# Read parquet file
df = spark.read.parquet(wasbs_path)
df.createOrReplaceTempView('source')

# Display top 10 rows
display(spark.sql('SELECT * FROM source LIMIT 10'))
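
The Spark examples and the pandas example address the same blob through two different schemes. As a small sketch of how the pieces fit together, the hypothetical helpers below build both forms from the account, container, and relative path used in this document.

```python
def wasbs_path(account: str, container: str, relative_path: str) -> str:
    """Build the wasbs:// path form used in the Spark examples."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{relative_path}"


def https_url(account: str, container: str, relative_path: str) -> str:
    """Build the https:// URL form used in the pandas example."""
    return f"https://{account}.blob.core.windows.net/{container}/{relative_path}"


blob_account_name = "pandemicdatalake"
blob_container_name = "public"
blob_relative_path = "curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet"

spark_path = wasbs_path(blob_account_name, blob_container_name, blob_relative_path)
pandas_url = https_url(blob_account_name, blob_container_name, blob_relative_path)
print(spark_path)
print(pandas_url)
```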

Analysis Examples

Query Specific Country Data

import pandas as pd

df = pd.read_parquet(
    "https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet"
)

# Filter for a specific country
us_data = df[df['country_region'] == 'United States']

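# Get latest data by state
latest_date = us_data['updated'].max()
# Keep state-level rows only (admin_region_2 is null) so that county rows
# are not added on top of the state totals
latest_by_state = us_data[
    (us_data['updated'] == latest_date) &
    us_data['admin_region_1'].notna() &
    us_data['admin_region_2'].isna()
].groupby('admin_region_1').agg({
    'confirmed': 'sum',
    'deaths': 'sum',
    'recovered': 'sum'
}).sort_values('confirmed', ascending=False)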

print(f"\nTop 10 US States by Confirmed Cases (as of {latest_date}):")
print(latest_by_state.head(10))
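
When aggregating by `admin_region_1`, it helps to know which level of the region hierarchy each row describes. The sketch below labels rows as country, state, or county level based on which admin columns are null. It assumes that a null `admin_region_1` marks a country-level row and a null `admin_region_2` marks a state-level row; the counts in the sample frame are made up for illustration, and `region_level` is a hypothetical helper.

```python
import pandas as pd

# Made-up rows covering the three assumed hierarchy levels.
rows = pd.DataFrame(
    {
        "country_region": ["United States", "United States", "United States"],
        "admin_region_1": [None, "Texas", "Texas"],
        "admin_region_2": [None, None, "Washington County"],
        "confirmed": [1000, 100, 10],
    }
)


def region_level(row) -> str:
    """Classify a row by the most specific admin column that is filled in."""
    if pd.isna(row["admin_region_1"]):
        return "country"
    if pd.isna(row["admin_region_2"]):
        return "state"
    return "county"


rows["level"] = rows.apply(region_level, axis=1)
state_rows = rows[rows["level"] == "state"]
levels = rows["level"].tolist()
state_total = int(state_rows["confirmed"].sum())
print(levels)
print(state_total)
```

Summing only the state-level rows avoids counting a county's cases twice if the state row already includes them, which is the assumption made here.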

Calculate Mortality Rate

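# Calculate mortality rate (deaths per 100 confirmed cases) by country
latest_global = df[df['updated'] == df['updated'].max()]

# Keep country-level rows only: drop state/county rows and the 'Worldwide' aggregate
latest_countries = latest_global[
    latest_global['admin_region_1'].isna() &
    (latest_global['country_region'] != 'Worldwide')
].copy()
latest_countries['mortality_rate'] = (
    latest_countries['deaths'] / latest_countries['confirmed'] * 100
).round(2)

top_countries = latest_countries[
    latest_countries['confirmed'] > 1000000
].sort_values('mortality_rate', ascending=False)

print("\nCountries with >1M cases - Mortality Rates:")
print(top_countries[['country_region', 'confirmed', 'deaths', 'mortality_rate']].head(10))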
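
The rate computed above is simply deaths divided by confirmed cases, expressed as a percentage. A minimal standalone sketch of that formula follows, with a guard for regions that report no confirmed cases. The function name and the sample numbers are illustrative, not taken from the dataset.

```python
from typing import Optional


def mortality_rate(deaths: int, confirmed: int) -> Optional[float]:
    """Return deaths as a percentage of confirmed cases, rounded to 2 decimals.

    Returns None when there are no confirmed cases, since the ratio is undefined.
    """
    if confirmed <= 0:
        return None
    return round(deaths / confirmed * 100, 2)


# Illustrative inputs only.
example_rate = mortality_rate(25, 1000)
undefined_rate = mortality_rate(0, 0)
print(example_rate)
print(undefined_rate)
```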

Time Series Analysis

# Analyze 7-day moving average of new cases
# Nationwide rows have no admin_region_1; copy so new columns can be added safely
us_nationwide = df[
    (df['country_region'] == 'United States') & 
    (df['admin_region_1'].isna())
].sort_values('updated').copy()

us_nationwide['7day_avg_cases'] = us_nationwide['confirmed_change'].rolling(window=7).mean()
us_nationwide['7day_avg_deaths'] = us_nationwide['deaths_change'].rolling(window=7).mean()

# Plot 7-day moving averages
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))

ax1.plot(us_nationwide['updated'], us_nationwide['7day_avg_cases'])
ax1.set_title('US COVID-19: 7-Day Moving Average of New Cases')
ax1.set_ylabel('Cases')
ax1.grid(True)

ax2.plot(us_nationwide['updated'], us_nationwide['7day_avg_deaths'])
ax2.set_title('US COVID-19: 7-Day Moving Average of Deaths')
ax2.set_ylabel('Deaths')
ax2.set_xlabel('Date')
ax2.grid(True)

plt.tight_layout()
plt.show()
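
To make the 7-day moving average concrete, here is a plain-Python sketch of the same calculation applied to the seven `confirmed_change` values from the preview table. Like `rolling(window=7).mean()` with default settings, it yields no value until a full window is available. Unlike pandas, this hypothetical `rolling_mean` helper does not handle missing values.

```python
def rolling_mean(values, window=7):
    """Return a trailing moving average, with None until a full window exists."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            result.append(sum(chunk) / window)
    return result


# confirmed_change for Worldwide, 2020-01-22 to 2020-01-28, from the preview table.
confirmed_change = [51, 265, 263, 479, 694, 784, 1795]
averages = rolling_mean(confirmed_change, window=7)
print(averages)
```

Only the last position has a value, the mean of all seven daily changes (4331 / 7, roughly 618.7), which shows how the average smooths the jump to 1795 on the final day.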

License and Attribution

The data is available strictly for educational and academic purposes under these terms and conditions.

Valid Use Cases:

  • Academic institutions
  • Government agencies
  • Medical research

Attribution Requirement:

Data used or cited in publications should include an attribution to “Bing COVID-19 Tracker” with a link to www.bing.com/covid.
This dataset is for educational and research purposes only. Always verify critical health information with official sources.

COVID-19 Data Lake

Access additional COVID-19 datasets covering testing, hospital capacity, and mobility

ECDC COVID Cases

European Centre for Disease Prevention and Control COVID-19 data

COVID Tracking Project

US state-level testing and outcome data

Oxford Government Response

Government policy responses to COVID-19

Next Steps

Create ML Dataset

Learn how to create Azure ML datasets from this data

Browse Catalog

Explore other available datasets

Public Holidays

View public holidays dataset

Genomics Data

Explore genomics datasets

Contact

For questions or feedback about this or other datasets in the COVID-19 Data Lake, contact [email protected].
