Customers on an Enterprise or Growth plan can access Data Pipeline as an add-on package. See our pricing page for more details.
Overview
JSON Pipeline is designed to export your Mixpanel data to supported data warehouses or object storage solutions. We maintain all properties in a high-level JSON format under the properties key for both events and user profile data.
This documentation is intended for users with intermediate or advanced knowledge of databases and familiarity with Amazon Web Services, Google Cloud Platform, or Snowflake technology.
Supported Destinations
AWS S3
Export to Amazon S3 storage
Azure Blob Storage
Export to Azure Blob Storage
BigQuery
Export to Google BigQuery
Google Cloud Storage
Export to GCS buckets
Redshift Spectrum
Export to AWS Redshift
Snowflake
Export to Snowflake warehouse
Data Sources Output
JSON pipelines support three different data sources: events, people, and identity. We aggregate all event and user profile properties under the properties key to make it easier to query every row with conditions. In addition to consolidating event and user profile properties under the properties key, we export several common properties across all records.
Events
| Name | Type | Description |
|---|---|---|
| device_id | STRING | Unique ID used to track a device while the user remains anonymous |
| distinct_id | STRING | Unique ID for the user who triggered the event |
| event_name | STRING | Name of the event |
| insert_id | STRING | Unique ID used to deduplicate events that are sent multiple times |
| properties | JSON | JSON object containing all the properties associated with the event |
| time | TIMESTAMP | Timestamp marking when the event occurred |
| user_id | STRING | Unique ID used to track a user across different devices when identified |
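The schema above maps directly onto the exported JSON records. The sketch below, using invented sample values, shows how a consumer might read exported event rows and use insert_id to drop duplicate deliveries; the field names come from the table, but the parsing approach is illustrative, not part of the pipeline itself.

```python
import json

# Hypothetical rows in the shape of the exported events schema
# (field names from the table above; values are invented).
raw_lines = [
    '{"device_id": "d-1", "distinct_id": "u-1", "event_name": "Sign Up", '
    '"insert_id": "abc-123", "properties": {"plan": "growth"}, '
    '"time": 1700000000, "user_id": "u-1"}',
    # A duplicate delivery of the same event carries the same insert_id.
    '{"device_id": "d-1", "distinct_id": "u-1", "event_name": "Sign Up", '
    '"insert_id": "abc-123", "properties": {"plan": "growth"}, '
    '"time": 1700000000, "user_id": "u-1"}',
]

seen = set()
events = []
for line in raw_lines:
    event = json.loads(line)
    # insert_id deduplicates events that were sent multiple times.
    if event["insert_id"] in seen:
        continue
    seen.add(event["insert_id"])
    events.append(event)

print(len(events))  # 1 event after deduplication
```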
User Profiles
| Name | Type | Description |
|---|---|---|
| distinct_id | STRING | Unique ID for the user |
| properties | JSON | JSON object containing all user properties |
Identity Mappings
| Name | Type | Description |
|---|---|---|
| distinct_id | STRING | Unique ID for the user who triggered the event |
| resolved_distinct_id | STRING | Unique ID of the user after merging |
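To see how the identity mappings table is applied, here is a minimal sketch with invented data: events whose distinct_id appears in the mappings are counted under the resolved_distinct_id, and unmapped IDs fall back to themselves. The column names follow the schemas above; the join logic is a simplified illustration, not Mixpanel's implementation.

```python
# Invented sample rows shaped like the identity mappings and events schemas.
mappings = [
    {"distinct_id": "anon-1", "resolved_distinct_id": "user-42"},
    {"distinct_id": "anon-2", "resolved_distinct_id": "user-42"},
]
events = [
    {"distinct_id": "anon-1", "event_name": "Page View"},
    {"distinct_id": "anon-2", "event_name": "Sign Up"},
    {"distinct_id": "user-42", "event_name": "Purchase"},
]

# Build a lookup from raw distinct_id to resolved identity.
resolved = {m["distinct_id"]: m["resolved_distinct_id"] for m in mappings}

# Fall back to the original distinct_id when no mapping exists.
unique_users = {resolved.get(e["distinct_id"], e["distinct_id"]) for e in events}
print(len(unique_users))  # all three events belong to one resolved user
```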
Key Features
Events Data Sync
Events Data Sync is enabled by default when creating JSON pipelines to ensure data consistency. This feature automatically detects data changes as soon as they are ingested and appends new files for new or late data to your storage or warehouse, keeping the data fresh and minimizing missing data points. Event data can fall out of sync between Mixpanel’s datastore and the export destination for several reasons:
- Late data can arrive multiple days later because a mobile client was offline
- The import API can add data to previous days
- GDPR-related delete requests can remove events and event properties
Backfill Historical Events
You can schedule an initial backfill when creating an events pipeline to ensure that historical data is also exported to the destination. Use the from_date parameter to specify the date from which you want to export historical data. Note that from_date must be no more than 6 months in the past.
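A quick sketch of the from_date constraint described above, assuming "6 months" is approximated as 183 days; the exact boundary Mixpanel applies may differ, so treat this as illustrative only.

```python
from datetime import date, timedelta

def is_valid_from_date(from_date: date, today: date) -> bool:
    # Documented constraint: from_date must be no more than 6 months
    # in the past. "6 months" is approximated here as 183 days.
    return today - from_date <= timedelta(days=183)

print(is_valid_from_date(date(2024, 3, 1), today=date(2024, 6, 1)))  # True: 92 days back
print(is_valid_from_date(date(2023, 6, 1), today=date(2024, 6, 1)))  # False: a year back
```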
The completion time for a backfill depends on the number of days and the volume of data in the project. Larger backfills can take several weeks.
Export Frequency
Mixpanel supports hourly and daily exports, with daily being the default.
People Data Support
User profiles are exported to a single table or directory named mp_people_data. Since user profiles are mutable, the data in the table is replaced with the latest user profiles each time an export occurs, based on the chosen schedule (daily or hourly).
User Identity Resolution
Exports from projects with ID merge enabled need to use the identity mappings table to replicate the user counts seen in UI reporting. Mixpanel resolves multiple identifiers for an individual into one identifier when reporting unique user counts, but pipelines export event data as it appeared when Mixpanel ingested it: data sent before an alias event carries the original user identifier, not the resolved one. Use the identity mappings table to recreate the identity clusters that Mixpanel builds and accurately count unique users.
Destination and Date Range Restrictions
To prevent data duplication and conflicts, you cannot create multiple event pipelines that export to the same destination with overlapping date ranges. For example, if you already have a pipeline exporting to BigQuery dataset “my_dataset” for January 1-31, you cannot create another pipeline exporting to the same dataset for January 15 - February 15, because the January 15-31 period would overlap. This constraint ensures data integrity and prevents duplicate exports to the same destination tables or storage locations.
Incremental Pipelines
As of 10 September 2025, all JSON pipelines in all regions (US/EU/IN) have been migrated to our improved incremental pipeline export system.
What is affected?
Events pipelines with sync enabled only: this improvement affects only event pipelines that have sync enabled. People and identity mapping pipelines remain unchanged.
Benefits
- Elimination of data sync delays: No more waiting for daily sync processes to detect and fix data discrepancies
- Complete data export: All events are exported without the risk of missing late-arriving data. Late-arriving events are automatically exported regardless of how late they arrive
Changes you may notice
- Event count display: The event count shown per task in the UI now represents the total events processed per batch rather than events exported per day or per hour
- Backfill process: When a new pipeline is created, it will complete the full historical backfill first before starting regular processing
- Storage location file structure changes: Incremental pipelines add a new file for each day's events on each run of the pipeline, so expect more small files
- Pipeline logs reset: Once your pipeline is migrated, the logging available in the UI is reset, so log lines from past jobs are no longer available
- Predictable deletion behavior: Warehouse data owners are responsible for the deletion of all data on the warehouse side
FAQs
Why is sync not available for People and Identity pipelines?
The sync feature is designed for events to keep the exported data up-to-date with changes that occur in Mixpanel (e.g. late-arriving data). For People and Identity pipelines, the data is re-exported in full in each export for profiles and identity mappings, which means that it’s always up-to-date and does not require the sync feature.
How are GDPR deletions handled?
GDPR deletions do not automatically cascade to data warehouses via pipelines. When a user is deleted from Mixpanel via the GDPR deletion API, the deletion is reflected in Mixpanel’s own storage, but it does not propagate to data that has already been exported to data warehouses via pipelines. To keep your synced warehouse data GDPR compliant, you will need to implement a process to delete the corresponding user and event data from your warehouse when a GDPR deletion occurs in Mixpanel.
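Such a warehouse-side cleanup process might look like the following minimal sketch, which uses an in-memory SQLite database as a stand-in for your real destination. The table names mirror the exported names mentioned in this document (mp_master_event, mp_people_data), but your warehouse, schema, and deletion workflow will differ.

```python
import sqlite3

# In-memory SQLite stands in for the real warehouse destination.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mp_master_event (distinct_id TEXT, event_name TEXT)")
conn.execute("CREATE TABLE mp_people_data (distinct_id TEXT, properties TEXT)")
conn.executemany("INSERT INTO mp_master_event VALUES (?, ?)",
                 [("user-1", "Sign Up"), ("user-2", "Login")])
conn.execute("INSERT INTO mp_people_data VALUES ('user-1', '{}')")

def delete_user(conn, distinct_id):
    # Remove both event and profile rows for the user deleted in Mixpanel.
    conn.execute("DELETE FROM mp_master_event WHERE distinct_id = ?", (distinct_id,))
    conn.execute("DELETE FROM mp_people_data WHERE distinct_id = ?", (distinct_id,))
    conn.commit()

delete_user(conn, "user-1")
print(conn.execute("SELECT COUNT(*) FROM mp_master_event").fetchone()[0])  # 1
```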
Can I configure my pipeline to export data into a subfolder or prefix within my bucket?
No, Mixpanel’s JSON pipeline configuration does not currently support custom subfolders or prefixes. Data is exported into predefined structured paths that are automatically generated by Mixpanel. For example, event data is exported to the following path:
<BUCKET_NAME>/<PROJECT_ID>/mp_master_event/<YEAR>/<MONTH>/<DAY>/
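For illustration, a consumer could reconstruct a day's object prefix from the documented path template like this. The zero-padding of month and day is an assumption for the example, not something this document specifies; verify the exact format against the objects in your own bucket.

```python
from datetime import date

def export_prefix(bucket: str, project_id: str, d: date) -> str:
    # Follows the documented template <BUCKET_NAME>/<PROJECT_ID>/mp_master_event/
    # <YEAR>/<MONTH>/<DAY>/; two-digit month/day padding is assumed here.
    return f"{bucket}/{project_id}/mp_master_event/{d.year}/{d.month:02d}/{d.day:02d}/"

print(export_prefix("my-bucket", "12345", date(2024, 1, 5)))
```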
Can I export only a subset of events or properties with JSON pipelines?
No, all events and properties are exported as they are ingested into Mixpanel. It’s not possible to filter for events and properties to include or exclude in your data pipeline exports.
Why isn't my hourly pipeline running exactly every hour?
With incremental pipelines, the next export iteration begins after the previous one completes rather than on a fixed schedule. This ensures we export all data completely without the risk of cutting off late-arriving events. While your pipeline is configured for “hourly” export, the actual run time will vary based on how long each iteration takes to process.