Customers on an Enterprise or Growth plan can access Data Pipeline as an add-on package. See our pricing page for more details.
Overview
JSON Pipeline is designed to export your Mixpanel data to supported data warehouses or object storage solutions. We maintain all properties in a high-level JSON format under the properties key for both events and user profile data.
This documentation is intended for users with intermediate or advanced knowledge of databases and familiarity with Amazon Web Services, Google Cloud Platform, or Snowflake technology.
Supported Destinations
AWS S3
Export to Amazon S3 storage
Azure Blob Storage
Export to Azure Blob Storage
BigQuery
Export to Google BigQuery
Google Cloud Storage
Export to GCS buckets
Redshift Spectrum
Export to AWS Redshift
Snowflake
Export to Snowflake warehouse
Data Sources Output
JSON pipelines support three different data sources: events, people, and identity. We aggregate all event and user profile properties under the properties key to make it easier to query every row with conditions. In addition to consolidating event and user profile properties under the properties key, we export several common properties across all records.
Events
| Name | Type | Description |
|---|---|---|
| device_id | STRING | Unique ID used to track a device while the user remains anonymous |
| distinct_id | STRING | Unique ID for the user who triggered the event |
| event_name | STRING | Name of the event |
| insert_id | STRING | Unique ID used to deduplicate events that are sent multiple times |
| properties | JSON | JSON object containing all the properties associated with the event |
| time | TIMESTAMP | Timestamp marking when the event occurred |
| user_id | STRING | Unique ID used to track a user across different devices when identified |
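The schema above maps directly onto the exported JSON records. The sketch below, using invented sample values, shows how a consumer might read exported event rows and use insert_id to drop duplicate deliveries; the field names come from the table, but the parsing approach is illustrative, not part of the pipeline itself.

```python
import json

# Hypothetical rows in the shape of the exported events schema
# (field names from the table above; values are invented).
raw_lines = [
    '{"device_id": "d-1", "distinct_id": "u-1", "event_name": "Sign Up", '
    '"insert_id": "abc-123", "properties": {"plan": "growth"}, '
    '"time": 1700000000, "user_id": "u-1"}',
    # A duplicate delivery of the same event carries the same insert_id.
    '{"device_id": "d-1", "distinct_id": "u-1", "event_name": "Sign Up", '
    '"insert_id": "abc-123", "properties": {"plan": "growth"}, '
    '"time": 1700000000, "user_id": "u-1"}',
]

seen = set()
events = []
for line in raw_lines:
    event = json.loads(line)
    # insert_id deduplicates events that were sent multiple times.
    if event["insert_id"] in seen:
        continue
    seen.add(event["insert_id"])
    events.append(event)

print(len(events))  # 1 event after deduplication
```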
User Profiles
| Name | Type | Description |
|---|---|---|
| distinct_id | STRING | Unique ID for the user |
| properties | JSON | JSON object containing all user properties |
Identity Mappings
| Name | Type | Description |
|---|---|---|
| distinct_id | STRING | Unique ID for the user who triggered the event |
| resolved_distinct_id | STRING | Unique ID of the user after merging |
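To see how the identity mappings table is applied, here is a minimal sketch with invented data: events whose distinct_id appears in the mappings are counted under the resolved_distinct_id, and unmapped IDs fall back to themselves. The column names follow the schemas above; the join logic is a simplified illustration, not Mixpanel's implementation.

```python
# Invented sample rows shaped like the identity mappings and events schemas.
mappings = [
    {"distinct_id": "anon-1", "resolved_distinct_id": "user-42"},
    {"distinct_id": "anon-2", "resolved_distinct_id": "user-42"},
]
events = [
    {"distinct_id": "anon-1", "event_name": "Page View"},
    {"distinct_id": "anon-2", "event_name": "Sign Up"},
    {"distinct_id": "user-42", "event_name": "Purchase"},
]

# Build a lookup from raw distinct_id to resolved identity.
resolved = {m["distinct_id"]: m["resolved_distinct_id"] for m in mappings}

# Fall back to the original distinct_id when no mapping exists.
unique_users = {resolved.get(e["distinct_id"], e["distinct_id"]) for e in events}
print(len(unique_users))  # all three events belong to one resolved user
```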
Key Features
Events Data Sync
Events Data Sync is enabled by default when creating JSON pipelines to ensure data consistency. This feature automatically detects data changes as soon as they are ingested and appends new files for new or late data to your storage or warehouse, keeping the data fresh and minimizing missing data points. Event data can fall out of sync between Mixpanel’s datastore and the export destination for several reasons:
- Late data can arrive multiple days later because a mobile client was offline
- The import API can add data to previous days
- GDPR-related delete requests can remove events and event properties
Backfill Historical Events
You can schedule an initial backfill when creating an events pipeline to ensure that historical data is also exported to the destination. Use the from_date parameter to specify the date from which you want to export historical data. Note that from_date must be no more than 6 months in the past.
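A quick sketch of the from_date constraint described above, assuming "6 months" is approximated as 183 days; the exact boundary Mixpanel applies may differ, so treat this as illustrative only.

```python
from datetime import date, timedelta

def is_valid_from_date(from_date: date, today: date) -> bool:
    # Documented constraint: from_date must be no more than 6 months
    # in the past. "6 months" is approximated here as 183 days.
    return today - from_date <= timedelta(days=183)

print(is_valid_from_date(date(2024, 3, 1), today=date(2024, 6, 1)))  # True: 92 days back
print(is_valid_from_date(date(2023, 6, 1), today=date(2024, 6, 1)))  # False: a year back
```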
The completion time for a backfill depends on the number of days and the volume of data in the project. Larger backfills can take several weeks.
Export Frequency
Mixpanel supports hourly and daily exports, with daily being the default.
People Data Support
User profiles are exported to a single table or directory named mp_people_data. Since user profiles are mutable, the data in the table is replaced with the latest user profiles each time an export occurs, based on the chosen schedule (daily or hourly).
User Identity Resolution
Exports from projects with ID merge enabled need to use the identity mappings table to replicate the user counts seen in UI reporting. Mixpanel resolves multiple identifiers for an individual into one identifier when reporting unique user counts, but pipelines export event data as it appeared when Mixpanel ingested it: data sent before an alias event carries the original user identifier, not the resolved one. Use the identity mappings table to recreate the identity clusters that Mixpanel builds and accurately count unique users.
Destination and Date Range Restrictions
To prevent data duplication and conflicts, you cannot create multiple event pipelines that export to the same destination with overlapping date ranges. For example, if you already have a pipeline exporting to BigQuery dataset “my_dataset” for January 1-31, you cannot create another pipeline exporting to the same dataset for January 15 - February 15, because the January 15-31 period would overlap. This constraint ensures data integrity and prevents duplicate exports to the same destination tables or storage locations.
Incremental Pipelines
As of 10 September 2025, all JSON pipelines in all regions (US/EU/IN) have been migrated to our improved incremental pipeline export system.
What is affected?
Events pipelines with sync enabled only: this improvement affects only event pipelines that have sync enabled. People and identity mapping pipelines remain unchanged.
Benefits
- Elimination of data sync delays: No more waiting for daily sync processes to detect and fix data discrepancies
- Complete data export: All events are exported without the risk of missing late-arriving data. Late-arriving events are automatically exported regardless of how late they arrive
Changes you may notice
- Event count display: The event count shown per task in the UI now represents the total events processed per batch rather than events exported per day or per hour
- Backfill process: When a new pipeline is created, it will complete the full historical backfill first before starting regular processing
- Storage location file structure changes: Incremental pipelines add a new file for each day's events on each run of the pipeline, so expect more small files
- Pipeline logs reset: Once your pipeline is migrated, the logging available in the UI is reset, so log lines from past jobs are no longer available
- Predictable deletion behavior: Warehouse data owners are responsible for the deletion of all data on the warehouse side
FAQs
Why is sync not available for People and Identity pipelines?
The sync feature is designed for events to keep the exported data up-to-date with changes that occur in Mixpanel (e.g. late-arriving data). For People and Identity pipelines, the data is re-exported in full in each export for profiles and identity mappings, which means that it’s always up-to-date and does not require the sync feature.
How are GDPR deletions handled?
GDPR deletions do not automatically cascade to data warehouses via pipelines. When a user is deleted from Mixpanel via the GDPR deletion API, the deletion is reflected in Mixpanel’s own storage, but it does not propagate to data that has already been exported to data warehouses via pipelines. To keep your synced warehouse data GDPR compliant, you will need to implement a process to delete the corresponding user and event data from your warehouse when a GDPR deletion occurs in Mixpanel.
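Such a warehouse-side cleanup process might look like the following minimal sketch, which uses an in-memory SQLite database as a stand-in for your real destination. The table names mirror the exported names mentioned in this document (mp_master_event, mp_people_data), but your warehouse, schema, and deletion workflow will differ.

```python
import sqlite3

# In-memory SQLite stands in for the real warehouse destination.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mp_master_event (distinct_id TEXT, event_name TEXT)")
conn.execute("CREATE TABLE mp_people_data (distinct_id TEXT, properties TEXT)")
conn.executemany("INSERT INTO mp_master_event VALUES (?, ?)",
                 [("user-1", "Sign Up"), ("user-2", "Login")])
conn.execute("INSERT INTO mp_people_data VALUES ('user-1', '{}')")

def delete_user(conn, distinct_id):
    # Remove both event and profile rows for the user deleted in Mixpanel.
    conn.execute("DELETE FROM mp_master_event WHERE distinct_id = ?", (distinct_id,))
    conn.execute("DELETE FROM mp_people_data WHERE distinct_id = ?", (distinct_id,))
    conn.commit()

delete_user(conn, "user-1")
print(conn.execute("SELECT COUNT(*) FROM mp_master_event").fetchone()[0])  # 1
```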
Can I configure my pipeline to export data into a subfolder or prefix within my bucket?
No, Mixpanel’s JSON pipeline configuration does not currently support custom subfolders or prefixes. Data is exported into predefined structured paths that are automatically generated by Mixpanel. For example, event data is exported to the following path:
<BUCKET_NAME>/<PROJECT_ID>/mp_master_event/<YEAR>/<MONTH>/<DAY>/
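For illustration, a consumer could reconstruct a day's object prefix from the documented path template like this. The zero-padding of month and day is an assumption for the example, not something this document specifies; verify the exact format against the objects in your own bucket.

```python
from datetime import date

def export_prefix(bucket: str, project_id: str, d: date) -> str:
    # Follows the documented template <BUCKET_NAME>/<PROJECT_ID>/mp_master_event/
    # <YEAR>/<MONTH>/<DAY>/; two-digit month/day padding is assumed here.
    return f"{bucket}/{project_id}/mp_master_event/{d.year}/{d.month:02d}/{d.day:02d}/"

print(export_prefix("my-bucket", "12345", date(2024, 1, 5)))
```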
Can I export only a subset of events or properties with JSON pipelines?
No, all events and properties are exported as they are ingested into Mixpanel. It’s not possible to filter for events and properties to include or exclude in your data pipeline exports.
Why isn't my hourly pipeline running exactly every hour?
With incremental pipelines, the next export iteration begins after the previous one completes rather than on a fixed schedule. This ensures we export all data completely without the risk of cutting off late-arriving events. While your pipeline is configured for “hourly” export, the actual run time will vary based on how long each iteration takes to process.