Customers on an Enterprise or Growth plan can access Data Pipeline as an add-on package. See our pricing page for more details.
This page provides detailed setup instructions for each supported warehouse connector. For an overview of Data Pipelines features, see Data Pipelines.

AWS S3

Mixpanel allows you to export events and people data into AWS S3 through JSON Pipelines.

Setting S3 Permissions

Mixpanel supports various configurations to securely manage your data on AWS S3. For resource access, Mixpanel utilizes AWS cross-account roles.

Step 1: Create Data Modification Policy

To export data from Mixpanel to AWS S3, assign the following data modification permissions. Replace <BUCKET_NAME> with the name of your bucket:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "MixpanelS3AccessStatement",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:DeleteObject"
      ],
      "Resource": ["arn:aws:s3:::<BUCKET_NAME>", "arn:aws:s3:::<BUCKET_NAME>/*"]
    }
  ]
}
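If you prefer the AWS CLI to the console, the policy above can be saved to a file, validated, and created in one go. This is a sketch: the bucket name `my-mixpanel-exports` and policy name `MixpanelS3Access` are placeholders, not values from this guide.

```shell
# Placeholder bucket name -- replace with your own.
BUCKET_NAME="my-mixpanel-exports"

# Write the data modification policy with the bucket name substituted in.
cat > mixpanel-s3-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "MixpanelS3AccessStatement",
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket", "s3:DeleteObject"],
      "Resource": ["arn:aws:s3:::${BUCKET_NAME}", "arn:aws:s3:::${BUCKET_NAME}/*"]
    }
  ]
}
EOF

# Sanity-check the JSON before creating the policy.
python3 -m json.tool mixpanel-s3-policy.json > /dev/null && echo "policy JSON OK"

# Then create it (commented out so the sketch can be reviewed first):
# aws iam create-policy --policy-name MixpanelS3Access \
#   --policy-document file://mixpanel-s3-policy.json
```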

Step 2: Server-Side Encryption (Optional)

Mixpanel transfers data to your S3 bucket over a TLS-encrypted connection. To secure your data at rest in S3, enable Server-Side Encryption (SSE).

Encryption with Amazon S3-Managed Keys (SSE-S3)

This option encrypts your data at rest using the AES-256 algorithm, with keys managed by S3. To enable it, select AES from the Encryption dropdown menu when creating the pipeline.

Encryption with AWS KMS-Managed Keys (SSE-KMS)

For encryption with AWS KMS, you can use either the default aws/s3 key or your own custom key.
  • Using the Default Key: Simply select KMS from the Encryption dropdown menu and leave the KMS Key ID field empty when creating your pipeline.
  • Using Custom Key:
    1. Select KMS from the Encryption dropdown menu and enter your custom key’s ARN in the KMS Key ID field
    2. Create an IAM policy allowing Mixpanel to use your KMS key. Replace <KEY_ARN> with your key’s ARN:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "MixpanelKmsStatement",
      "Effect": "Allow",
      "Action": [
        "kms:Decrypt",
        "kms:Encrypt",
        "kms:GenerateDataKey",
        "kms:ReEncryptTo",
        "kms:GenerateDataKeyWithoutPlaintext",
        "kms:DescribeKey",
        "kms:ReEncryptFrom"
      ],
      "Resource": "<KEY_ARN>"
    }
  ]
}

Step 3: Create Access Role

After establishing the necessary policies, create a cross-account IAM Role:
  1. Go to the IAM service on the AWS console
  2. Select Roles in the sidebar and click Create role
  3. On the trusted entity page, choose AWS Account, then click Another AWS account, and enter 485438090326 for the Account ID
  4. On the permissions page, locate and attach the policies you created in previous steps
  5. On the review page, provide a name and description for this role and click Create role
To ensure secure operations, limit the trust relationship to the Mixpanel export user:
  1. Return to the IAM service, select Roles, and locate the role you just created
  2. In the Trust relationships tab, click Edit trust policy
  3. Update the trust relationship with the following JSON, replacing <MIXPANEL_PROJECT_TOKEN> with your Mixpanel project token:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::485438090326:user/mixpanel-export"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<MIXPANEL_PROJECT_TOKEN>"
        }
      }
    }
  ]
}
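The trust policy can also be applied from the AWS CLI rather than the console. A sketch, with placeholder values: `your-project-token` stands in for your Mixpanel project token, and `mixpanel-export-role` for whatever you named the role in the previous step.

```shell
# Placeholder token -- replace with your Mixpanel project token.
MIXPANEL_PROJECT_TOKEN="your-project-token"

# Write the trust policy with the external ID substituted in.
cat > mixpanel-trust-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::485438090326:user/mixpanel-export" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "sts:ExternalId": "${MIXPANEL_PROJECT_TOKEN}" }
      }
    }
  ]
}
EOF

# Validate the JSON, then apply it to the role (command commented out):
python3 -m json.tool mixpanel-trust-policy.json > /dev/null && echo "trust policy JSON OK"
# aws iam update-assume-role-policy --role-name mixpanel-export-role \
#   --policy-document file://mixpanel-trust-policy.json
```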
  4. Click Update policy and save
This setup utilizes an external ID to prevent the confused deputy problem.

Step 4: Provide Mixpanel with S3 Details

Provide the following details when creating the pipeline:
  • Bucket: Specify the S3 bucket where Mixpanel data should be exported
  • Region: Indicate the AWS region where your S3 bucket is located
  • Role: Provide the AWS Role ARN that Mixpanel should assume when writing to your S3
  • Encryption (optional): Specify the type of at-rest encryption used by the S3 bucket
  • KMS Key ID (optional): If using KMS encryption, provide the custom key ID

Supported AWS Regions

US-EAST-1, US-EAST-2, US-WEST-1, US-WEST-2, AP-NORTHEAST-1, AP-NORTHEAST-2, AP-NORTHEAST-3, AP-SOUTHEAST-1, AP-SOUTHEAST-2, AP-SOUTH-1, CA-CENTRAL-1, CN-NORTH-1, CN-NORTHWEST-1, EU-CENTRAL-1, EU-NORTH-1, EU-WEST-1, EU-WEST-2, EU-WEST-3, SA-EAST-1, ME-SOUTH-1

Azure Blob Storage

Mixpanel allows you to export events and people data directly into an Azure Blob Storage instance through JSON Pipelines.

Setting Blob Storage Permissions

Step 1: Create a Service Principal

Create a Service Principal in your Azure Active Directory using the Azure CLI:
az ad sp create-for-rbac --sdk-auth
This command generates credentials in JSON format. Ensure you securely handle the output as it contains sensitive information:
{
  "clientId": "redacted",
  "clientSecret": "redacted",
  "subscriptionId": "redacted",
  "tenantId": "redacted",
  "activeDirectoryEndpointUrl": "https://login.microsoftonline.com",
  "resourceManagerEndpointUrl": "https://management.azure.com/",
  "activeDirectoryGraphResourceId": "https://graph.windows.net/",
  "sqlManagementEndpointUrl": "https://management.core.windows.net:8443/",
  "galleryEndpointUrl": "https://gallery.azure.com/",
  "managementEndpointUrl": "https://management.core.windows.net/"
}
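Of that output, Mixpanel only needs three fields (see Step 3 below). A small sketch for extracting them from a saved copy of the credentials; the `sp.json` filename and the values inside it are placeholders, not real credentials.

```shell
# Placeholder credentials file standing in for the real
# `az ad sp create-for-rbac --sdk-auth` output.
cat > sp.json <<'EOF'
{
  "clientId": "00000000-0000-0000-0000-000000000000",
  "clientSecret": "placeholder-secret",
  "tenantId": "00000000-0000-0000-0000-000000000002"
}
EOF

# Print the three values that go into the Mixpanel pipeline form.
python3 - <<'PY'
import json
with open("sp.json") as f:
    creds = json.load(f)
for key in ("clientId", "clientSecret", "tenantId"):
    print(f"{key}: {creds[key]}")
PY
```

Remember to delete or lock down the saved file afterwards; the client secret is sensitive.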

Step 2: Assign Role to Service Principal

Navigate to the Blob Storage container you wish to use, and assign the "Storage Blob Data Contributor" role to the newly created Service Principal.
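The same assignment can be made with the Azure CLI. This is a sketch: every name below (subscription, resource group, storage account, container) is a placeholder, and the container-level scope path is the part worth noting.

```shell
# All placeholders -- replace with your own values.
SUBSCRIPTION_ID="00000000-0000-0000-0000-000000000000"
RESOURCE_GROUP="my-resource-group"
STORAGE_ACCOUNT="mystorageaccount"
CONTAINER="mixpanel-exports"
CLIENT_ID="<clientId from Step 1>"

# Container-level scope for the role assignment.
SCOPE="/subscriptions/${SUBSCRIPTION_ID}/resourceGroups/${RESOURCE_GROUP}/providers/Microsoft.Storage/storageAccounts/${STORAGE_ACCOUNT}/blobServices/default/containers/${CONTAINER}"
echo "scope: ${SCOPE}"

# az role assignment create --assignee "${CLIENT_ID}" \
#   --role "Storage Blob Data Contributor" --scope "${SCOPE}"
```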

Step 3: Provide Mixpanel with Access Details

Provide the following details when creating the pipeline:
  • Client Id: From the service principal credentials
  • Client Secret: From the service principal credentials
  • Tenant Id: From the service principal credentials
  • Storage Account: The Azure storage account name
  • Container Name: The container where data will be exported

BigQuery

This guide describes how Mixpanel exports your data into a customer-managed Google BigQuery dataset.

Design

For events data, we create a single table called mp_master_event and store all external properties inside a properties column of JSON type. You can extract individual properties using BigQuery's JSON functions. For user profiles and identity mappings, we create new tables mp_people_data_* and mp_identity_mappings_data_* with a random suffix each time, then update the views mp_people_data_view and mp_identity_mappings_data_view to point at the latest tables. Always query the views instead of the underlying tables. Export logs are maintained in the mp_nessie_export_log table within BigQuery.
Please do not modify the schema of tables generated by Mixpanel. Altering the table schema can cause the pipeline to fail to export due to schema mismatches.

Setting BigQuery Permissions

Step 1: Create a Dataset

Create a dataset in your BigQuery to store the Mixpanel data.

Step 2: Grant Permissions to Mixpanel

If your organization uses the domain restriction constraint, you will need to update the policy to allow the Mixpanel domain mixpanel.com and Google Workspace customer ID C00m5wrjz.
Mixpanel requires two roles to manage the dataset:

BigQuery Job User
  1. Navigate to IAM & Admin in your Google Cloud Console
  2. Click + ADD to add principals
  3. Add the new principal export-upload@mixpanel-prod-1.iam.gserviceaccount.com and set the role as BigQuery Job User
  4. Click the Save button
BigQuery Data Owner
  1. Go to BigQuery in your Google Cloud Console
  2. Open the dataset intended for Mixpanel exports
  3. Click on Sharing and Permissions in the drop down
  4. In the Data Permissions window, click on Add Principal
  5. Add the new principal export-upload@mixpanel-prod-1.iam.gserviceaccount.com, set the role as BigQuery Data Owner, and save

Step 3: Provide Necessary Details for Pipeline Creation

Provide the following details when creating the pipeline:
  • GCP project ID: The project ID where BigQuery dataset is present
  • Dataset name: Dataset created on the GCP project to which Mixpanel needs to export data
  • GCP region: The region used for BigQuery

Partitioning

Data in the events table mp_master_event is partitioned on the _PARTITIONTIME pseudo-column, in the project's timezone.
Do not update TIMEPARTITIONING on the table; doing so will cause your export jobs to fail. Create a new table or view from this table if you need custom partitioning.

Query Examples

Get the Number of Events Each Day

SELECT
  _PARTITIONTIME AS pt,
  COUNT(*)
FROM
  `<your gcp project>.<your dataset>.mp_master_event`
WHERE
  DATE(_PARTITIONTIME) <= "2024-05-31"
  AND DATE(_PARTITIONTIME) >= "2024-05-01"
GROUP BY
  pt

Query Identity Mappings

When querying the identity mappings table, prioritize using the resolved_distinct_id over the non-resolved distinct_id whenever it is available.
SELECT
  COALESCE(mappings.resolved_distinct_id, events.distinct_id) AS resolved_distinct_id,
  COUNT(*) AS count
FROM
  `<your gcp project>.<your dataset>.mp_master_event` events
INNER JOIN
  `<your gcp project>.<your dataset>.mp_identity_mappings_data_view` mappings
ON
  events.distinct_id = mappings.distinct_id
  AND JSON_VALUE(properties, '$."$city"') = "San Francisco"
  AND DATE(events._PARTITIONTIME) <= "2024-05-31"
  AND DATE(events._PARTITIONTIME) >= "2024-05-01"
GROUP BY
  resolved_distinct_id
LIMIT
  100

Supported GCP Regions for BigQuery

US, US_CENTRAL_1, US_EAST_1, US_WEST_1, US_WEST_2, US_EAST_4, NORTH_AMERICA_NORTHEAST_1, SOUTH_AMERICA_EAST_1, EU, EUROPE_NORTH_1, EUROPE_WEST_2, EUROPE_WEST_3, EUROPE_WEST_4, EUROPE_WEST_6, ASIA_SOUTH_1, ASIA_EAST_1, ASIA_EAST_2, ASIA_NORTHEAST_1, ASIA_NORTHEAST_2, ASIA_NORTHEAST_3, ASIA_SOUTHEAST_1, ASIA_SOUTHEAST_2, AUSTRALIA_SOUTHEAST_1

VPC Service Controls

IP allowlists are not supported for BigQuery. Instead, configure an ingress rule that allows access based on other attributes, such as the project or service account. The Mixpanel GCP project number is:
  • 745258754925 for US
  • 848893383328 for EU
  • 1054291822741 for IN
The service account is [email protected].
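An ingress rule granting that service account access to BigQuery might be configured roughly as follows. This is a sketch only: the perimeter name is a placeholder, and the exact ingress-policy YAML schema should be checked against the VPC Service Controls documentation before use.

```shell
# Hypothetical ingress policy allowing Mixpanel's service account
# into BigQuery from any source. Verify the schema against the
# current VPC Service Controls docs.
cat > mixpanel-ingress.yaml <<'EOF'
- ingressFrom:
    identities:
      - serviceAccount:[email protected]
    sources:
      - accessLevel: "*"
  ingressTo:
    operations:
      - serviceName: bigquery.googleapis.com
        methodSelectors:
          - method: "*"
    resources:
      - "*"
EOF
echo "ingress rule written"

# gcloud access-context-manager perimeters update MY_PERIMETER \
#   --set-ingress-policies=mixpanel-ingress.yaml
```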

Google Cloud Storage

Mixpanel supports exporting events and people data directly to Google Cloud Storage (GCS) via JSON Pipelines.

Setting GCS Permissions

Step 1: Assign Roles to Service Account on Bucket

You must grant the Storage Object Admin role to the service account [email protected] for the bucket you are creating or intend to reuse. To assign this role:
  1. Navigate to Cloud Storage in your Google Cloud Console and select the GCS bucket
  2. Click on the PERMISSIONS tab and select GRANT ACCESS
  3. In the new principals field, add [email protected] and then select Storage Object Admin from the role dropdown menu
  4. Confirm the assignment by clicking the SAVE button

Step 2: Provide Mixpanel with GCS Details

Provide the following details when creating the pipeline:
  • Bucket: The GCS bucket to export Mixpanel data to
  • Region: The GCS region for the bucket

Supported GCS Regions

NORTHAMERICA-NORTHEAST1, US-CENTRAL1, US-EAST1, US-EAST4, US-WEST1, US-WEST2, SOUTHAMERICA-EAST1, EUROPE-NORTH1, EUROPE-WEST1, EUROPE-WEST2, EUROPE-WEST3, EUROPE-WEST4, EUROPE-WEST6, ASIA-EAST1, ASIA-EAST2, ASIA-NORTHEAST1, ASIA-NORTHEAST2, ASIA-NORTHEAST3, ASIA-SOUTH1, ASIA-SOUTHEAST1, AUSTRALIA-SOUTHEAST1

Redshift Spectrum

Mixpanel’s JSON pipelines enable direct export of your Mixpanel data into an S3 bucket, facilitating the use of Redshift Spectrum for querying.

Design

Mixpanel exports data to your S3 bucket and simultaneously updates the necessary schema in the AWS Glue Data Catalog. This allows seamless integration with Redshift Spectrum for querying your data.

Setting Permissions

Redshift Spectrum requires three sets of permissions: S3, Glue, and Redshift.

S3 Permissions

See Setting S3 Permissions under AWS S3 above.

Setting Glue Permissions

AWS Glue provides a data catalog service that makes S3 data accessible to other AWS services.

Step 1: Create Glue Database
  1. Navigate to the AWS Glue service on the AWS console (same region as your S3 exported data)
  2. Click Databases in the sidebar, then Add database
  3. Name your database and click Create database
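The database can also be created from the AWS CLI. A sketch; the database name, description, and region are placeholders.

```shell
# Placeholder database name -- replace with your own.
GLUE_DB="mixpanel_export"

# Database input for `aws glue create-database`.
cat > glue-db-input.json <<EOF
{"Name": "${GLUE_DB}", "Description": "Mixpanel export catalog"}
EOF
python3 -m json.tool glue-db-input.json > /dev/null && echo "database input OK"

# Use the same region as your S3 exported data:
# aws glue create-database --region us-east-1 \
#   --database-input file://glue-db-input.json
```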
Step 2: Create Data Modification Policy

Mixpanel partitions the Glue table by default when it has the proper AWS permissions. The partition key is named mp_date, with type string; partition values are dates in the UTC timezone (e.g. 2024-05-01). Assign the following permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "MixpanelGlueAccessStatement",
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:CreateTable",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetTableVersions",
        "glue:UpdateTable",
        "glue:DeleteTable",
        "glue:GetPartition",
        "glue:CreatePartition",
        "glue:DeletePartition",
        "glue:UpdatePartition",
        "glue:BatchCreatePartition",
        "glue:GetPartitions",
        "glue:BatchDeletePartition",
        "glue:BatchGetPartition"
      ],
      "Resource": "*"
    }
  ]
}
Step 3: Create Access Role

Follow the same process as in the AWS S3 section to create a cross-account IAM role, and attach the Glue policy created above.

Setting Redshift Spectrum Permissions

Step 1: Create Policy

Create a policy in IAM with the necessary permissions. Replace <BUCKET_NAME> with your actual S3 bucket name:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "MixpanelRedshiftAccessStatement",
      "Effect": "Allow",
      "Action": [
        "redshift-data:GetStatementResult",
        "redshift-data:DescribeStatement",
        "redshift-data:BatchExecuteStatement",
        "redshift-data:ExecuteStatement",
        "redshift-serverless:GetCredentials",
        "redshift-serverless:GetWorkgroup",
        "redshift:GetClusterCredentialsWithIAM",
        "redshift:GetClusterCredentials"
      ],
      "Resource": "*"
    },
    {
      "Sid": "MixpanelGlueAccessStatement",
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetPartition",
        "glue:GetPartitions"
      ],
      "Resource": "*"
    },
    {
      "Sid": "MixpanelS3AccessStatement",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::<BUCKET_NAME>", "arn:aws:s3:::<BUCKET_NAME>/*"]
    }
  ]
}
Steps 2-5: Follow the detailed steps in the source documentation for creating the access role, associating it with Redshift, creating the Redshift database, and granting privileges to the database user.

Query Examples

To query data in the external schema created by Mixpanel, an admin must grant your user the USAGE privilege on the schema:
GRANT USAGE ON SCHEMA "mp_json_export" TO "IAM:<your user name>";
Once the pipeline has run successfully and you have the necessary privilege, you can query the nested JSON data:
SET json_serialization_enable TO true;
SELECT
    distinct_id
FROM
    "your-database"."mp_json_export"."mp_master_event"
WHERE
    json_extract_path_text(properties, '$city') = 'San Francisco'
LIMIT 10;

Supported AWS Regions

US-EAST-1, US-EAST-2, US-WEST-1, US-WEST-2, AP-SOUTH-1, AP-NORTHEAST-1, AP-NORTHEAST-2, AP-NORTHEAST-3, AP-SOUTHEAST-1, AP-SOUTHEAST-2, CA-CENTRAL-1, CN-NORTH-1, CN-NORTHWEST-1, EU-CENTRAL-1, EU-WEST-1, EU-WEST-2, EU-WEST-3, EU-NORTH-1, SA-EAST-1, ME-SOUTH-1

Snowflake

This guide describes how Mixpanel data is exported into a Snowflake dataset. Once an export job is scheduled, Mixpanel exports data to Snowflake on a recurring basis.

Design

Mixpanel exports data to the customer's database. We first load the data into a raw table with a single VARIANT-type column, then create a view that exposes all properties as columns.

IP Restrictions

Mixpanel Data Pipelines uses static IP addresses for its Snowflake connections. If you use a Snowflake network policy to restrict access, you may need to add the following IP addresses to the allowed list:

US
34.31.112.201
35.184.21.33
35.225.176.74
EU
34.147.68.192
35.204.164.122
35.204.177.251

Set Export Permissions

Step 1: Create a Role and Grant Permissions

Create a role (for example, MIXPANEL_EXPORT_ROLE) and grant it access to your database, schema, and warehouse. Replace <database name>, <schema name>, and <warehouse name> with the actual names.
CREATE ROLE MIXPANEL_EXPORT_ROLE;
GRANT ALL ON DATABASE <database name> TO ROLE MIXPANEL_EXPORT_ROLE;
GRANT ALL ON SCHEMA <database name>.<schema name> TO ROLE MIXPANEL_EXPORT_ROLE;
GRANT USAGE ON WAREHOUSE <warehouse name> TO ROLE MIXPANEL_EXPORT_ROLE;
GRANT OPERATE ON WAREHOUSE <warehouse name> TO ROLE MIXPANEL_EXPORT_ROLE;
GRANT MONITOR ON WAREHOUSE <warehouse name> TO ROLE MIXPANEL_EXPORT_ROLE;

Step 2: Create Storage Integration

To enable Mixpanel to load data from a Mixpanel-owned GCS bucket into your warehouse, create a GCS storage integration. Replace <project-id> with your Mixpanel project ID.
CREATE STORAGE INTEGRATION MIXPANEL_EXPORT_STORAGE_INTEGRATION
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'GCS'
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ("gcs://mixpanel-export-pipelines-<project-id>");
GRANT USAGE ON INTEGRATION MIXPANEL_EXPORT_STORAGE_INTEGRATION TO ROLE MIXPANEL_EXPORT_ROLE;

Step 3: Create User and Set Up Authentication

We support two authentication methods: password and key-pair.

Password authentication
CREATE USER MIXPANEL_EXPORT_USER PASSWORD='<password you provided>' DEFAULT_ROLE=MIXPANEL_EXPORT_ROLE;
ALTER USER MIXPANEL_EXPORT_USER SET PASSWORD='<password you provided>';
GRANT ROLE MIXPANEL_EXPORT_ROLE TO USER MIXPANEL_EXPORT_USER;
Key-pair based authentication
CREATE USER MIXPANEL_EXPORT_USER RSA_PUBLIC_KEY='<mixpanel generated key>' DEFAULT_ROLE=MIXPANEL_EXPORT_ROLE;
ALTER USER MIXPANEL_EXPORT_USER SET RSA_PUBLIC_KEY='<mixpanel generated key>';
GRANT ROLE MIXPANEL_EXPORT_ROLE TO USER MIXPANEL_EXPORT_USER;

Partitioning

The data in the raw tables is clustered by a time column in the project's timezone. Specifically, we use CLUSTER BY (TO_DATE(CONVERT_TIMEZONE('UTC', '<TIMEZONE>', TO_TIMESTAMP(DATA:time::NUMBER)))), where TIMEZONE is the Mixpanel project's timezone.

Query Examples

Snowflake supports an Object type that can store JSON objects and arrays. Mixpanel exposes array and object top-level properties as Object columns in the view.

Query the Raw Table

SELECT count(*)
FROM <DB_NAME>.<SCHEMA_NAME>.MP_MASTER_EVENT_RAW
WHERE DATA:event_name::string = 'sign up';

Query the View

SELECT count(*)
FROM <DB_NAME>.<SCHEMA_NAME>.MP_MASTER_EVENT
WHERE event_name = 'sign up';

Getting the Number of Events in Each Day

SELECT
  TO_DATE(CONVERT_TIMEZONE('UTC','<PROJECT_TIMEZONE>', time)) as ttime,
  count(*)
FROM <DB_NAME>.<SCHEMA_NAME>.MP_MASTER_EVENT
WHERE ttime>=TO_DATE('2021-12-03') AND ttime<=TO_DATE('2024-09-01')
GROUP BY ttime
ORDER BY ttime;

Querying the Identity Mapping Table

When using the ID mappings table, you should use the resolved distinct_id in place of the non-resolved distinct_id whenever present.
SELECT
  COALESCE(mappings.resolved_distinct_id, events.distinct_id) AS resolved_distinct_id,
  COUNT(*) AS count
FROM
  <DB_NAME>.<SCHEMA_NAME>.MP_MASTER_EVENT events
FULL OUTER JOIN
  <DB_NAME>.<SCHEMA_NAME>.MP_IDENTITY_MAPPINGS_DATA mappings
ON
  events.distinct_id = mappings.distinct_id
  AND events.properties:"$city"::STRING = 'San Francisco'
  AND TO_DATE(CONVERT_TIMEZONE('UTC','<PROJECT_TIMEZONE>', events.time)) >= TO_DATE('2020-04-01')
  AND TO_DATE(CONVERT_TIMEZONE('UTC','<PROJECT_TIMEZONE>', events.time)) <= TO_DATE('2024-09-01')
GROUP BY
  COALESCE(mappings.resolved_distinct_id, events.distinct_id)
LIMIT
  100;
