Customers on an Enterprise or Growth plan can access Data Pipeline as an add-on package. See our pricing page for more details.
AWS S3
Mixpanel allows you to export events and people data into AWS S3 through JSON Pipelines.
Setting S3 Permissions
Mixpanel supports various configurations to securely manage your data on AWS S3. For resource access, Mixpanel uses AWS cross-account roles.
Step 1: Create Data Modification Policy
To export data from Mixpanel to AWS S3, assign the following data modification permissions. Replace <BUCKET_NAME> with the name of your bucket:
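A sketch of the policy JSON, assuming the standard S3 object actions an export pipeline needs; verify the exact action list against Mixpanel's current setup instructions:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<BUCKET_NAME>",
        "arn:aws:s3:::<BUCKET_NAME>/*"
      ]
    }
  ]
}
```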
Step 2: Server-Side Encryption (Optional)
Mixpanel transfers data to your S3 bucket over a TLS-encrypted connection. To secure your data at rest in S3, enable Server-Side Encryption (SSE).
Encryption with Amazon S3-Managed Keys (SSE-S3)
This option encrypts your data at rest using the AES-256 algorithm, with keys managed by S3. To enable it, select AES from the Encryption dropdown menu when creating pipelines.
Encryption with AWS KMS-Managed Keys (SSE-KMS)
For encryption with AWS KMS, you have the option to use either the default aws/s3 key or your own custom keys.
- Using the Default Key: Select KMS from the Encryption dropdown menu and leave the KMS Key ID field empty when creating your pipeline.
- Using a Custom Key:
  - Select KMS from the Encryption dropdown menu and enter your custom key’s ARN in the KMS Key ID field
  - Create an IAM policy allowing Mixpanel to use your KMS key. Replace <KEY_ARN> with your key’s ARN:
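A sketch of the KMS policy, assuming the usual encrypt/decrypt and data-key actions needed to write SSE-KMS objects; confirm the action list against Mixpanel's setup instructions:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:DescribeKey"
      ],
      "Resource": "<KEY_ARN>"
    }
  ]
}
```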
Step 3: Create Access Role
After establishing the necessary policies, create a cross-account IAM Role:
- Go to the IAM service on the AWS console
- Select Roles in the sidebar and click Create role
- On the trusted entity page, choose AWS Account, then click Another AWS account, and enter 485438090326 for the Account ID
- On the permissions page, locate and attach the policies you created in previous steps
- On the review page, provide a name and description for this role and click Create role
- Return to the IAM service, select Roles, and locate the role you just created
- In the Trust relationships tab, click Edit trust policy
- Update the trust relationship with the following JSON, replacing <MIXPANEL_PROJECT_TOKEN> with your Mixpanel project token:
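A sketch of the trust policy, using the Mixpanel account ID given in the role-creation step and your project token as the external ID:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::485438090326:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<MIXPANEL_PROJECT_TOKEN>"
        }
      }
    }
  ]
}
```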
- Click Update policy and save
Step 4: Provide Mixpanel with S3 Details
Provide the following details when creating the pipeline:
- Bucket: Specify the S3 bucket where Mixpanel data should be exported
- Region: Indicate the AWS region where your S3 bucket is located
- Role: Provide the AWS Role ARN that Mixpanel should assume when writing to your S3
- Encryption (optional): Specify the type of at-rest encryption used by the S3 bucket
- KMS Key ID (optional): If using KMS encryption, provide the custom key ID
Supported AWS Regions
US-EAST-1, US-EAST-2, US-WEST-1, US-WEST-2, AP-NORTHEAST-1, AP-NORTHEAST-2, AP-NORTHEAST-3, AP-SOUTHEAST-1, AP-SOUTHEAST-2, AP-SOUTH-1, CA-CENTRAL-1, CN-NORTH-1, CN-NORTHWEST-1, EU-CENTRAL-1, EU-NORTH-1, EU-WEST-1, EU-WEST-2, EU-WEST-3, SA-EAST-1, ME-SOUTH-1
Azure Blob Storage
Mixpanel allows you to export events and people data directly into an Azure Blob Storage instance through JSON Pipelines.
Setting Blob Storage Permissions
Step 1: Create a Service Principal
Create a Service Principal in your Azure Active Directory using the Azure CLI.
Step 2: Assign Role to Service Principal
Navigate to the Blob Storage container you wish to use, and assign the "Storage Blob Data Contributor" role to the newly created Service Principal.
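Steps 1 and 2 can be sketched with the Azure CLI; the service principal name and the scope path placeholders below are illustrative:

```shell
# Step 1: create the Service Principal (name is illustrative).
# The output contains appId (Client Id), password (Client Secret),
# and tenant (Tenant Id) -- you will need these when creating the pipeline.
az ad sp create-for-rbac --name mixpanel-export

# Step 2: assign the Storage Blob Data Contributor role, scoped
# to the target container (replace all bracketed placeholders).
az role assignment create \
  --assignee <APP_ID> \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Storage/storageAccounts/<STORAGE_ACCOUNT>/blobServices/default/containers/<CONTAINER_NAME>"
```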
Step 3: Provide Mixpanel with Access Details
Provide the following details when creating the pipeline:
- Client Id: From the service principal credentials
- Client Secret: From the service principal credentials
- Tenant Id: From the service principal credentials
- Storage Account: The Azure storage account name
- Container Name: The container where data will be exported
BigQuery
This guide describes how Mixpanel exports your data into a customer-managed Google BigQuery dataset.
Design
For events data, we create a single table called mp_master_event and store all external properties inside the properties column in JSON type. Users can extract properties using JSON functions.
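For example, a scalar property can be pulled out of the properties column with a JSON function; the property name plan and the dataset path below are hypothetical:

```sql
SELECT
  JSON_VALUE(properties, '$.plan') AS plan  -- extract a scalar property as STRING
FROM `<PROJECT_ID>.<DATASET>.mp_master_event`
LIMIT 10;
```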
For user profiles and identity mappings, we create new tables mp_people_data_* and mp_identity_mappings_data_* with a random suffix every time and then update views mp_people_data_view and mp_identity_mappings_data_view accordingly to use the latest table. Always use the views instead of the actual tables.
Export logs are maintained in the mp_nessie_export_log table within BigQuery.
Setting BigQuery Permissions
Step 1: Create a Dataset
Create a dataset in your BigQuery project to store the Mixpanel data.
Step 2: Grant Permissions to Mixpanel
If your organization uses the domain restriction constraint, you will have to update the policy to allow the Mixpanel domain mixpanel.com and Google Workspace customer ID C00m5wrjz.
- Navigate to IAM & Admin in your Google Cloud Console
- Click + ADD to add principals
- Add the new principal [email protected] and set the role as BigQuery Job User
- Click the Save button
- Go to BigQuery in your Google Cloud Console
- Open the dataset intended for Mixpanel exports
- Click on Sharing and Permissions in the drop down
- In the Data Permissions window, click on Add Principal
- Add the new principal [email protected], set the role as BigQuery Data Owner, and save
Step 3: Provide Necessary Details for Pipeline Creation
Provide the following details when creating the pipeline:
- GCP project ID: The project ID where the BigQuery dataset is present
- Dataset name: Dataset created on the GCP project to which Mixpanel needs to export data
- GCP region: The region used for BigQuery
Partitioning
Data in the events table mp_master_event is partitioned based on the _PARTITIONTIME pseudo-column, in the project timezone.
Query Examples
Get the Number of Events Each Day
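A sketch of a per-day count over the partition column; the dataset path is illustrative:

```sql
SELECT
  TIMESTAMP_TRUNC(_PARTITIONTIME, DAY) AS event_day,
  COUNT(*) AS event_count
FROM `<PROJECT_ID>.<DATASET>.mp_master_event`
GROUP BY event_day
ORDER BY event_day;
```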
Query Identity Mappings
When querying the identity mappings table, prioritize the resolved_distinct_id over the non-resolved distinct_id whenever it is available.
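A sketch of joining events to the mappings view with that fallback; this assumes distinct_id is directly available on the events side (adjust if your schema stores it inside the properties JSON), and the dataset path is illustrative:

```sql
SELECT
  -- fall back to the raw distinct_id when no resolved id exists
  COALESCE(m.resolved_distinct_id, e.distinct_id) AS distinct_id,
  COUNT(*) AS event_count
FROM `<PROJECT_ID>.<DATASET>.mp_master_event` e
LEFT JOIN `<PROJECT_ID>.<DATASET>.mp_identity_mappings_data_view` m
  ON e.distinct_id = m.distinct_id
GROUP BY distinct_id;
```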
Supported GCP Regions for BigQuery
US, US_CENTRAL_1, US_EAST_1, US_WEST_1, US_WEST_2, US_EAST_4, NORTH_AMERICA_NORTHEAST_1, SOUTH_AMERICA_EAST_1, EU, EUROPE_NORTH_1, EUROPE_WEST_2, EUROPE_WEST_3, EUROPE_WEST_4, EUROPE_WEST_6, ASIA_SOUTH_1, ASIA_EAST_1, ASIA_EAST_2, ASIA_NORTHEAST_1, ASIA_NORTHEAST_2, ASIA_NORTHEAST_3, ASIA_SOUTHEAST_1, ASIA_SOUTHEAST_2, AUSTRALIA_SOUTHEAST_1
VPC Service Controls
IP allowlists are not supported for BigQuery. Instead, configure an ingress rule to allow access based on other attributes such as the project or service account. The Mixpanel project numbers are:
- US: 745258754925
- EU: 848893383328
- IN: 1054291822741
The Mixpanel service account is [email protected].
Google Cloud Storage
Mixpanel supports exporting events and people data directly to Google Cloud Storage (GCS) via JSON Pipelines.
Setting GCS Permissions
Step 1: Assign Roles to Service Account on Bucket
You must grant the Storage Object Admin role to the service account [email protected] for the bucket you are creating or intend to reuse.
To assign this role:
- Navigate to the Cloud Storage in your Google Cloud Console and select the GCS bucket
- Click on the PERMISSIONS tab and select GRANT ACCESS
- In the new principals field, add [email protected] and then select Storage Object Admin from the role dropdown menu
- Confirm the assignment by clicking the SAVE button
Step 2: Provide Mixpanel with GCS Details
Provide the following details when creating the pipeline:
- Bucket: The GCS bucket to export Mixpanel data to
- Region: The GCS region for the bucket
Supported GCS Regions
NORTHAMERICA-NORTHEAST1, US-CENTRAL1, US-EAST1, US-EAST4, US-WEST1, US-WEST2, SOUTHAMERICA-EAST1, EUROPE-NORTH1, EUROPE-WEST1, EUROPE-WEST2, EUROPE-WEST3, EUROPE-WEST4, EUROPE-WEST6, ASIA-EAST1, ASIA-EAST2, ASIA-NORTHEAST1, ASIA-NORTHEAST2, ASIA-NORTHEAST3, ASIA-SOUTH1, ASIA-SOUTHEAST1, AUSTRALIA-SOUTHEAST1
Redshift Spectrum
Mixpanel’s JSON Pipelines enable direct export of your Mixpanel data into an S3 bucket, facilitating the use of Redshift Spectrum for querying.
Design
Mixpanel exports data to your S3 bucket and simultaneously updates the necessary schema in the AWS Glue Data Catalog. This allows seamless integration with Redshift Spectrum for querying your data.
Setting Permissions
Redshift Spectrum requires three sets of permissions: S3, Glue, and Redshift.
S3 Permissions
See Setting S3 Permissions in the AWS S3 section above.
Setting Glue Permissions
AWS Glue provides a robust data catalog service that facilitates seamless access to S3 data across various AWS services.
Step 1: Create Glue Database
- Navigate to the AWS Glue service on the AWS console (same region as your S3 exported data)
- Click Databases in the sidebar, then Add database
- Name your database and click Create database
The partition column’s type and name are string and mp_date respectively, and the partition values are dates in the UTC timezone (e.g. 2024-05-01).
Setting Redshift Spectrum Permissions
Step 1: Create Policy
Create a policy in IAM with the necessary permissions. Replace <BUCKET_NAME> with your actual S3 bucket name:
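A sketch of the policy, assuming Redshift Spectrum needs read access to the exported S3 data plus read access to the Glue catalog; verify the action list against Mixpanel's current instructions:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::<BUCKET_NAME>",
        "arn:aws:s3:::<BUCKET_NAME>/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetPartition",
        "glue:GetPartitions"
      ],
      "Resource": "*"
    }
  ]
}
```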
Query Examples
To query data in the external schema created by Mixpanel, you need an admin to grant the schema USAGE privilege.
Supported AWS Regions
US-EAST-1, US-EAST-2, US-WEST-1, US-WEST-2, AP-SOUTH-1, AP-NORTHEAST-1, AP-NORTHEAST-2, AP-NORTHEAST-3, AP-SOUTHEAST-1, AP-SOUTHEAST-2, CA-CENTRAL-1, CN-NORTH-1, CN-NORTHWEST-1, EU-CENTRAL-1, EU-WEST-1, EU-WEST-2, EU-WEST-3, EU-NORTH-1, SA-EAST-1, ME-SOUTH-1
Snowflake
This guide describes how Mixpanel data is exported into a Snowflake dataset. Once an export job is scheduled, Mixpanel exports data to Snowflake on a recurring basis.
Design
Mixpanel exports data to the customer’s database. We first load the data into a single-column raw (VARIANT type) data table. Then, we create a view to expose all properties as columns.
IP Restrictions
Mixpanel Data Pipelines supports static IP addresses for Snowflake connections when IP restrictions are configured on your Snowflake instance. If you are using a Snowflake network policy to restrict access, you might need to add the following IP addresses to the allowed list:
US
Set Export Permissions
Step 1: Create a Role and Grant Permissions
Create a role (e.g. MIXPANEL_EXPORT_ROLE) and grant it access to your database, schema, and warehouse. Replace <database name>, <schema name>, and <warehouse name> with the actual names.
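A sketch of the role and grants; the exact set of privileges Mixpanel requires may differ, so verify against the pipeline creation flow:

```sql
-- Create the export role and grant it access (names are placeholders)
CREATE ROLE IF NOT EXISTS MIXPANEL_EXPORT_ROLE;
GRANT USAGE ON WAREHOUSE <warehouse name> TO ROLE MIXPANEL_EXPORT_ROLE;
GRANT USAGE ON DATABASE <database name> TO ROLE MIXPANEL_EXPORT_ROLE;
GRANT USAGE ON SCHEMA <database name>.<schema name> TO ROLE MIXPANEL_EXPORT_ROLE;
-- Mixpanel creates tables, views, stages, and file formats in the schema
GRANT CREATE TABLE, CREATE VIEW, CREATE STAGE, CREATE FILE FORMAT
  ON SCHEMA <database name>.<schema name> TO ROLE MIXPANEL_EXPORT_ROLE;
```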
Step 2: Create Storage Integration
To enable Mixpanel to load data from a Mixpanel-owned GCS bucket into your warehouse, create a GCS storage integration. Replace <project-id> with your Mixpanel project ID.
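A sketch of the storage integration; the GCS bucket path below is illustrative — use the location Mixpanel provides for your project:

```sql
-- Storage integration for loading from Mixpanel's GCS bucket
CREATE STORAGE INTEGRATION mixpanel_export_storage_integration
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = GCS
  ENABLED = TRUE
  -- Placeholder location: substitute the bucket path from Mixpanel
  STORAGE_ALLOWED_LOCATIONS = ('gcs://<mixpanel-provided-bucket>/<project-id>/');
```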
Step 3: Set Up User Authentication
We provide two authentication methods: password and key-pair.
Password Authentication
Partitioning
The data in the raw tables is clustered by the time column, in the project’s timezone. To be exact, we use CLUSTER BY (TO_DATE(CONVERT_TIMEZONE('UTC', '<TIMEZONE>', TO_TIMESTAMP(DATA:time::NUMBER)))), where TIMEZONE is the Mixpanel project’s timezone.
Query Examples
Snowflake supports an Object type that can store JSON objects and arrays. Mixpanel exposes array and object top-level properties as Object columns in the view.
Query the Raw Table
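A sketch of querying the single-column VARIANT table described in the Design section; the table name and the $city property are illustrative:

```sql
-- Pull fields out of the raw VARIANT column
SELECT
  DATA:event::STRING               AS event_name,
  TO_TIMESTAMP(DATA:time::NUMBER)  AS event_time,
  DATA:properties:"$city"::STRING  AS city
FROM <database name>.<schema name>.mp_master_event
LIMIT 10;
```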
Query the View
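In the view, properties are exposed as columns, so no VARIANT path syntax is needed; the view and column names below are illustrative:

```sql
-- Properties appear as regular columns in the view
SELECT event_name, time, "$city"
FROM <database name>.<schema name>.mp_master_event_view
LIMIT 10;
```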
Getting the Number of Events in Each Day
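A sketch of a per-day count, reusing the same timezone-conversion expression as the clustering key described under Partitioning; the table name is illustrative:

```sql
SELECT
  TO_DATE(CONVERT_TIMEZONE('UTC', '<TIMEZONE>', TO_TIMESTAMP(DATA:time::NUMBER))) AS event_day,
  COUNT(*) AS event_count
FROM <database name>.<schema name>.mp_master_event
GROUP BY 1
ORDER BY 1;
```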
Querying the Identity Mapping Table
When using the ID mappings table, use the resolved_distinct_id in place of the non-resolved distinct_id whenever it is present.
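A sketch of that fallback with a join against the mappings table; the table and column names are assumptions modeled on the BigQuery section and should be checked against your schema:

```sql
SELECT
  -- fall back to the raw distinct_id when no resolved id exists
  COALESCE(m.resolved_distinct_id, e.DATA:distinct_id::STRING) AS distinct_id,
  COUNT(*) AS event_count
FROM <database name>.<schema name>.mp_master_event e
LEFT JOIN <database name>.<schema name>.mp_identity_mappings_data_view m
  ON e.DATA:distinct_id::STRING = m.distinct_id
GROUP BY 1;
```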