Customers on an Enterprise or Growth plan can access Data Pipeline as an add-on package. See our pricing page for more details.

Overview

JSON Pipeline is designed to export your Mixpanel data to supported data warehouses or object storage solutions. We maintain all properties in a high-level JSON format under the properties key for both events and user profile data. This documentation is intended for users with intermediate or advanced knowledge of databases and familiarity with Amazon Web Services, Google Cloud Platform, or Snowflake technology.

Supported Destinations

  • AWS S3: Export to Amazon S3 storage
  • Azure Blob Storage: Export to Azure Blob Storage
  • BigQuery: Export to Google BigQuery
  • Google Cloud Storage: Export to GCS buckets
  • Redshift Spectrum: Export to AWS Redshift
  • Snowflake: Export to Snowflake warehouse

Data Sources Output

JSON pipelines support three data sources: events, people, and identity. All event and user profile properties are consolidated under the properties key, making it easier to query every row with conditions. In addition to the properties key, several common properties are exported across all records.

Events

| Name | Type | Description |
| --- | --- | --- |
| device_id | STRING | Unique ID used to track a device while the user remains anonymous |
| distinct_id | STRING | Unique ID for the user who triggered the event |
| event_name | STRING | Name of the event |
| insert_id | STRING | Unique ID used to deduplicate events that are sent multiple times |
| properties | JSON | JSON object containing all the properties associated with the event |
| time | TIMESTAMP | Timestamp marking when the event occurred |
| user_id | STRING | Unique ID used to track a user across different devices when identified |
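Because insert_id is the deduplication key, a warehouse-side consumer can drop repeated rows before loading. A minimal Python sketch with hypothetical rows shaped like the schema above:

```python
import json

# Hypothetical exported event rows; a retried event shares its insert_id.
rows = [
    {"event_name": "signup", "insert_id": "a1", "distinct_id": "u1",
     "time": "2024-01-01T00:00:00Z", "properties": json.dumps({"plan": "growth"})},
    {"event_name": "signup", "insert_id": "a1", "distinct_id": "u1",
     "time": "2024-01-01T00:00:00Z", "properties": json.dumps({"plan": "growth"})},
    {"event_name": "login", "insert_id": "b2", "distinct_id": "u1",
     "time": "2024-01-02T00:00:00Z", "properties": json.dumps({"device": "ios"})},
]

def dedupe(rows):
    """Keep one row per insert_id, mirroring Mixpanel's dedupe key."""
    seen, out = set(), []
    for row in rows:
        if row["insert_id"] not in seen:
            seen.add(row["insert_id"])
            out.append(row)
    return out

unique = dedupe(rows)
print(len(unique))  # 2 distinct events
```

In a warehouse you would typically express the same idea in SQL (e.g. a window function partitioned by insert_id) rather than in application code.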

User Profiles

| Name | Type | Description |
| --- | --- | --- |
| distinct_id | STRING | Unique ID for the user |
| properties | JSON | JSON object containing all user properties |

Identity Mappings

| Name | Type | Description |
| --- | --- | --- |
| distinct_id | STRING | Unique ID for the user who triggered the event |
| resolved_distinct_id | STRING | Unique ID of the user after merging |

Key Features

Events Data Sync

Events Data Sync is enabled by default when creating JSON pipelines to ensure data consistency. The feature automatically detects data changes as soon as they are ingested and appends files containing the new or late data to your storage or warehouse, keeping the export fresh and minimizing missing data points. Event data can fall out of sync between Mixpanel’s datastore and the export destination for several reasons:
  • Late data can arrive multiple days later due to a mobile client being offline
  • The import API can add data to previous days
  • Delete requests related to GDPR can cause deletion of events and event properties
Data sync does not guarantee syncing GDPR data deletions. It is recommended to implement a strategy that removes all records of GDPR-deleted users from your data warehouse.
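One such strategy can be sketched as a filter over exported rows, assuming you track the distinct_ids of GDPR-deleted users yourself (the rows and IDs below are hypothetical):

```python
# Hypothetical: IDs of users deleted via GDPR requests, tracked on your side.
deleted_users = {"u42"}

# Hypothetical exported event rows already loaded into your warehouse staging area.
events = [
    {"distinct_id": "u1", "event_name": "login"},
    {"distinct_id": "u42", "event_name": "login"},
]

# Drop every record belonging to a deleted user before (or after) loading.
compliant = [row for row in events if row["distinct_id"] not in deleted_users]
print(compliant)  # only u1's event remains
```

In practice this would usually be a periodic DELETE statement against the warehouse tables, keyed on distinct_id (and user_id/device_id where applicable).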

Backfill Historical Events

You can schedule an initial backfill when creating an events pipeline to ensure that historical data is also exported to the destination. Use the from_date parameter to specify the date from which you want to export historical data. Note that from_date must be no more than 6 months in the past. The completion time for a backfill depends on the number of days and the volume of data in the project. Larger backfills can take several weeks.
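The 6-month limit can be checked up front before requesting a backfill. This sketch approximates “6 months” as 183 days, which may differ from Mixpanel’s exact cutoff:

```python
from datetime import date, timedelta

def from_date_is_valid(from_date: date, today: date) -> bool:
    """True when the backfill start date is within the allowed window.
    '6 months' is approximated here as 183 days; the exact cutoff is an assumption."""
    return today - from_date <= timedelta(days=183)

print(from_date_is_valid(date(2024, 3, 1), today=date(2024, 6, 1)))  # True
```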

Export Frequency

Mixpanel supports hourly and daily exports, with daily being the default.

People Data Support

User profiles are exported to a single table or directory named mp_people_data. Since user profiles are mutable, the data in the table is replaced with the latest user profiles each time an export occurs, based on the chosen schedule (daily or hourly).

User Identity Resolution

Exports from projects with ID merge enabled need to use the identity mappings table to replicate the user counts seen in UI reporting. Mixpanel resolves multiple identifiers for an individual into a single identifier when reporting unique user counts, but pipelines export event data as it appeared at ingestion: data sent before an alias event carries the original user identifier, not the resolved one. The identity mappings table lets you recreate the identity clusters Mixpanel builds and accurately count unique users.
Use the resolved_distinct_id from the identity mappings table instead of the non-resolved distinct_id when available. If there is no resolved_distinct_id, use the distinct_id from the existing people or events table.
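The lookup rule above can be sketched in Python with a hypothetical mappings export:

```python
# Hypothetical identity mappings export: distinct_id -> resolved_distinct_id.
mappings = {
    "device-abc": "user-1",  # pre-identify events carried the device ID
    "user-1-alias": "user-1",
}

# Hypothetical event rows as exported (pre-resolution identifiers).
events = [
    {"distinct_id": "device-abc", "event_name": "app_open"},
    {"distinct_id": "user-1", "event_name": "login"},
    {"distinct_id": "user-2", "event_name": "app_open"},
]

def resolve(distinct_id: str) -> str:
    """Prefer resolved_distinct_id when a mapping exists, else fall back."""
    return mappings.get(distinct_id, distinct_id)

unique_users = {resolve(e["distinct_id"]) for e in events}
print(len(unique_users))  # 2 unique users after resolution
```

In a warehouse this is typically a LEFT JOIN from events to the identity mappings table with COALESCE(resolved_distinct_id, distinct_id).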

Destination and Date Range Restrictions

To prevent data duplication and conflicts, the system enforces the following rule: you cannot create multiple event pipelines that export to the same destination with overlapping date ranges. For example, if you already have a pipeline exporting to BigQuery dataset “my_dataset” for dates January 1-31, you cannot create another pipeline exporting to the same dataset with dates January 15 - February 15, as the January 15-31 period would overlap. This constraint ensures data integrity and prevents duplicate exports to the same destination tables or storage locations.
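The overlap rule is the standard interval-intersection test. A small sketch using the example dates from the text:

```python
from datetime import date

def ranges_overlap(a_start: date, a_end: date, b_start: date, b_end: date) -> bool:
    """Two inclusive date ranges overlap when each starts on or before the other ends."""
    return a_start <= b_end and b_start <= a_end

# The example from the text: Jan 1-31 vs Jan 15 - Feb 15 overlap on Jan 15-31.
print(ranges_overlap(date(2024, 1, 1), date(2024, 1, 31),
                     date(2024, 1, 15), date(2024, 2, 15)))  # True
```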

Incremental Pipelines

As of 10 September 2025, all JSON pipelines in all regions (US/EU/IN) have been migrated to our improved incremental pipeline export system.

What is affected?

Events pipelines with sync enabled only: This improvement only affects event pipelines that have sync enabled. People and identity mapping pipelines remain unchanged.

Benefits

  • Elimination of data sync delays: No more waiting for daily sync processes to detect and fix data discrepancies
  • Complete data export: All events are exported without the risk of missing late-arriving data. Late-arriving events are automatically exported regardless of how late they arrive

Changes you may notice

  • Event count display: The event count shown per task in the UI now represents the total events processed per batch rather than events exported per day or per hour
  • Backfill process: When a new pipeline is created, it will complete the full historical backfill first before starting regular processing
  • Storage location file structure changes: Each run of an incremental pipeline adds a new file per day of events it sees, so expect a larger number of small files
  • Pipeline logs reset: Once your pipeline is migrated, the logging available in the UI is reset, so log lines from past jobs will no longer be available
  • Predictable deletion behavior: Warehouse data owners are responsible for the deletion of all data on the warehouse side

FAQs

Why is there no sync option for People and Identity pipelines?
The sync feature is designed for events, to keep the exported data up-to-date with changes that occur in Mixpanel (e.g. late-arriving data). For People and Identity pipelines, the profiles and identity mappings are re-exported in full on each export, which means the data is always up-to-date and does not require the sync feature.
Do GDPR deletions propagate to my data warehouse?
GDPR deletions do not automatically cascade to data warehouses via pipelines. When a user is deleted from Mixpanel via the GDPR deletion API, the deletion is reflected in Mixpanel’s own storage, but it does not propagate to data that has already been exported via pipelines. To keep your synced warehouse data GDPR compliant, you will need to implement a process that deletes the corresponding user and event data from your warehouse when a GDPR deletion occurs in Mixpanel.
Can I customize the export path or add a subfolder prefix?
No, Mixpanel’s JSON pipeline configuration does not currently support custom subfolders or prefixes. Data is exported into predefined structured paths that are automatically generated by Mixpanel. For example, event data is exported to the following path: <BUCKET_NAME>/<PROJECT_ID>/mp_master_event/<YEAR>/<MONTH>/<DAY>/
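Given that fixed layout, a loader can construct a day’s prefix itself. This sketch is illustrative only: the bucket name and project ID are hypothetical, and zero-padded month/day segments are an assumption:

```python
from datetime import date

def event_export_prefix(bucket: str, project_id: int, day: date) -> str:
    """Build the predefined event-export path for one day.
    Zero-padding of month and day is an assumption for illustration."""
    return (f"{bucket}/{project_id}/mp_master_event/"
            f"{day.year}/{day.month:02d}/{day.day:02d}/")

print(event_export_prefix("my-bucket", 12345, date(2024, 6, 1)))
# my-bucket/12345/mp_master_event/2024/06/01/
```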
Can I filter which events or properties are exported?
No, all events and properties are exported as they are ingested into Mixpanel. It’s not possible to include or exclude specific events and properties in your data pipeline exports.
Why doesn’t my hourly pipeline export exactly every hour?
With incremental pipelines, the next export iteration begins after the previous one completes rather than on a fixed schedule. This ensures all data is exported completely, without the risk of cutting off late-arriving events. While your pipeline is configured for “hourly” export, the actual run time will vary based on how long each iteration takes to process.
