Experiments
Experimentation helps you make data-driven product decisions by measuring the real impact of changes on user behavior. Mixpanel is an ideal place to run experiments because all your product analytics data is already here.
Why Experiment?
Mixpanel provides comprehensive insights into how changes affect your users’ journey.
Prerequisites
Before getting started with experiments:
- Exposure Event Tracking: Implement your experimentation events (see Implementation below)
- Baseline Metrics: Ensure that Mixpanel is already tracking your key metrics
Experiment Process
The experiment workflow follows these stages: Plan → Setup & Launch → Monitor → Interpret Results → Make Decisions.
- Plan: Define hypothesis, success metrics, and test parameters
- Setup & Launch: Configure experiment settings and begin exposure
- Monitor: Track experiment progress and data collection
- Interpret Results: Analyze statistical significance and lift
- Make Decisions: Choose whether to ship, iterate, or abandon changes
Setup & Launch Your Experiment
Select an Experiment
Click ‘New Experiment’ from the Experiment report menu and select your experiment. Any experiment started in the last 30 days will automatically be detected and populated in the dropdown.
Choose the Control Variant
Select the ‘Variant’ that represents your control. All other variants are compared against the control to show how much better (or worse) they perform.
Choose Success Metrics
Choose the primary success metrics for the experiment. You can select saved Mixpanel metrics or create a new metric using the query panel. You can also add secondary and guardrail metrics as needed.
Select the Test Duration
Enter either the sample size (the number of users to be exposed to the experiment) or the minimum number of days you want the experiment to run. This will determine the test duration.
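Mixpanel derives the duration for you, but it can help to see how a minimum sample size per group is typically estimated for a rate metric. The sketch below uses the standard two-sample proportion power calculation (not necessarily Mixpanel's exact internal formula), with conventional z-values for a two-sided 5% significance level and 80% power; the 50% → 55% conversion rates are hypothetical inputs:

```python
import math

def sample_size_per_group(p_base, p_target, z_alpha=1.959964, z_beta=0.841621):
    """Standard two-sample proportion sample-size estimate, per group.

    p_base: baseline conversion rate; p_target: rate you want to detect.
    Default z-values correspond to two-sided alpha = 0.05 and 80% power.
    """
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    effect = (p_target - p_base) ** 2
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect)

# Hypothetical example: detecting a lift from 50% to 55% conversion
n = sample_size_per_group(0.50, 0.55)
print(n)
```

Smaller minimum detectable lifts grow the required sample size quadratically, which is why small-lift experiments need long durations.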
Implementation for Experimentation
Mixpanel experiment analysis is based on exposure events. To use the Experiment report, you must send your exposure events in the following format:
Event Name: $experiment_started
Event Properties:
- Experiment name: the name of the experiment to which the user was exposed
- Variant name: the name of the variant into which the user was bucketed
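As a minimal Python sketch of this format (the event and property names come from the spec above; the distinct_id, token, and experiment/variant values are hypothetical, and in practice you would send the event through a Mixpanel SDK's track call rather than building the payload by hand):

```python
import json

def build_exposure_event(distinct_id, experiment_name, variant_name, project_token):
    """Assemble an exposure event payload in the format described above."""
    return {
        "event": "$experiment_started",
        "properties": {
            "token": project_token,        # your Mixpanel project token
            "distinct_id": distinct_id,    # the exposed user
            "Experiment name": experiment_name,
            "Variant name": variant_name,
        },
    }

# Hypothetical user bucketed into the "treatment" variant of "new_checkout_ui"
payload = build_exposure_event("user_123", "new_checkout_ui", "treatment", "YOUR_TOKEN")
print(json.dumps(payload, indent=2))
```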
When to Track Exposure Events
- An exposure event only needs to be sent the first time a user is exposed to an experiment, provided the user remains in their initially bucketed variant
- If a user is part of multiple experiments, send a corresponding exposure event for each experiment
- Send the exposure event when the user actually experiences the variant, not when the variant is merely assigned
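The first rule above implies client-side deduplication per user and experiment. A minimal sketch (the `track` callback stands in for your Mixpanel SDK call; the class and names are illustrative, not part of any SDK):

```python
class ExposureTracker:
    """Fire $experiment_started only on a user's first exposure to each
    experiment, assuming the user stays in their initially bucketed variant."""

    def __init__(self, track):
        self._track = track
        self._seen = set()  # (distinct_id, experiment) pairs already exposed

    def expose(self, distinct_id, experiment, variant):
        key = (distinct_id, experiment)
        if key in self._seen:
            return False  # already exposed to this experiment; do not re-send
        self._seen.add(key)
        self._track("$experiment_started", {
            "Experiment name": experiment,
            "Variant name": variant,
        })
        return True

sent = []
tracker = ExposureTracker(lambda event, props: sent.append((event, props)))
tracker.expose("user_1", "new_checkout_ui", "treatment")
tracker.expose("user_1", "new_checkout_ui", "treatment")  # deduped, not re-sent
tracker.expose("user_1", "pricing_page_test", "control")  # separate experiment
print(len(sent))  # 2
```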
Monitor Your Experiment
Once your experiment is running, track these key indicators:
- Sample Size Progress: Track how many users have been exposed
- Data Quality: Ensure exposure events are being tracked correctly
- Guardrail Metrics: Watch for any negative impacts on important metrics
- External Factors: Note any external events that might affect results
Interpret Your Results
The Experiments report identifies significant differences between the control and variant groups. Every metric has two key attributes:
- p-value: Indicates whether the variant’s delta vs. the control is statistically significant
- Lift: The variant’s delta impact on the metric vs. the control
Results are color-coded:
- Green: Positive differences, where the variant value is higher than the control
- Red: Negative differences, where the variant value is lower than the control
- Gray: Statistically insignificant results
Understanding Statistical Significance
Statistical significance (p-value) helps you determine whether your experiment results are likely to hold true for the full rollout.
Metric Types and Distributions
Mixpanel categorizes metrics into three types, each using a different statistical distribution:
- Count Metrics (Total Events, Total Sessions): Use a Poisson distribution
  - Examples: Total purchases, total page views, session count
- Rate Metrics (Conversion rates, Retention rates): Use a Bernoulli distribution
  - Examples: Signup conversion rate, checkout completion rate, 7-day retention
- Value Metrics (Averages, Sums of properties): Use a normal distribution approximation
  - Examples: Average order value, total revenue, average session duration
Example: E-commerce Checkout Experiment
Scenario: Testing a new checkout UI on an e-commerce site with 20 users (10 control, 10 treatment).
Results:
- Control group: 5 users converted (50% conversion rate), average cart size $60
- Treatment group: 6 users converted (60% conversion rate), average cart size $67
- Group rates: Control = 0.5, Treatment = 0.6
- Variance calculation: Control = 0.5 × (1-0.5) = 0.25, Treatment = 0.6 × (1-0.6) = 0.24
- Standard error: Combined SE = √((0.25/10) + (0.24/10)) = 0.221
- Z-score: (0.6 - 0.5) / 0.221 = 0.45
- P-value: ~0.65 (not statistically significant)
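The calculation above can be reproduced as a two-sided z-test for proportions; this sketch uses the same group rates and Bernoulli variances, with the normal CDF built from `math.erf`:

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n_control, n_treatment = 10, 10
p_control, p_treatment = 5 / 10, 6 / 10  # 50% vs. 60% conversion

# Bernoulli variance p(1-p) per group, combined into the standard error
se = math.sqrt(p_control * (1 - p_control) / n_control
               + p_treatment * (1 - p_treatment) / n_treatment)
z = (p_treatment - p_control) / se
p_value = 2 * (1 - normal_cdf(z))  # two-sided test

print(round(se, 3), round(z, 2), round(p_value, 2))  # 0.221 0.45 0.65
```

With a p-value around 0.65, a sample this small cannot distinguish the observed 10-point difference from noise, which is why adequate sample size matters.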
Understanding Lift
Lift is the percentage difference between the variant’s metric and the control’s metric.
Lift Calculation by Metric Type
Count Metrics (Total Events, Sessions):
- Group Rate: Total count ÷ Number of users exposed
- Example: If the treatment group has 150 total purchases from 100 exposed users, the group rate = 1.5 purchases per user
Rate Metrics (Conversion rates, Retention rates):
- Group Rate: The actual rate (already normalized)
- Example: If 25 out of 100 users convert, group rate = 0.25 (25% conversion rate)
Value Metrics (Averages, Sums of properties):
- Group Rate: Sum of property values ÷ Number of users exposed
- Example: If the treatment group spent $5,000 in total across 100 exposed users, group rate = $50 average per exposed user
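Once each group rate is computed, the same lift formula applies to every metric type. A short sketch (the control-group figures of 1.2 purchases per user and a 20% conversion rate are assumed for illustration; the $60 vs. $67 cart sizes come from the checkout example above):

```python
def lift_pct(control_rate, variant_rate):
    """Lift = (variant - control) / control, expressed as a percentage."""
    return (variant_rate - control_rate) / control_rate * 100

# Count metric: 150 purchases / 100 exposed users vs. an assumed 120 / 100 in control
print(round(lift_pct(120 / 100, 150 / 100), 1))  # 25.0

# Rate metric: 25% vs. an assumed 20% conversion
print(round(lift_pct(0.20, 0.25), 1))  # 25.0

# Value metric: $67 vs. $60 average cart size
print(round(lift_pct(60, 67), 1))  # 11.7
```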
Make Your Decision
Once the experiment is ready to review, you can choose to ‘End Analysis’. Use these guidelines:
When to Ship a Variant
- Statistical significance achieved AND practical significance met (lift meets your minimum threshold)
- Guardrail metrics remain stable (no significant negative impacts)
- Sample size is adequate for your confidence requirements
- Results align with your hypothesis and business objectives
When to Ship None
- No statistical significance achieved after adequate test duration
- Statistically significant but practically insignificant (lift too small to matter)
- Negative impact on guardrail metrics outweighs primary metric gains
- Results contradict your hypothesis significantly
When to Rerun or Iterate
- Inconclusive results due to insufficient sample size
- Mixed signals across different user segments
- External factors contaminated the test period
- Technical issues affected data collection
What to Watch Post-Rollout
- Monitor guardrail metrics for 2-4 weeks after full rollout
- Track long-term effects beyond your experiment window
- Watch for novelty effects that may wear off
- Document learnings for future experiments
Experiment Model Types
- Sequential: Allows you to detect lift and conclude experiments quickly, but may fail to reach significance for very small lifts. When to use? For large changes (~10%+ lift) when you want to stop early once significance is reached.
- Frequentist: Capable of detecting smaller lifts, but requires running experiments for their full planned duration. When to use? For very small changes (~1% lift) when precision matters.
Experiment Metric Types
- Primary Metrics: Main goals you’re trying to improve. These determine if the experiment succeeded. Examples: revenue, conversion rates, ARPU.
- Guardrail Metrics: Important metrics that you want to ensure haven’t been negatively affected while focusing on the primary metrics. Examples: CSAT, churn rate.
- Secondary Metrics: Provide a deeper understanding of how users are interacting with your changes. Examples: time spent, number of pages visited, or specific user actions.
Frequently Asked Questions
If a user switches variants mid-experiment, how do we calculate the impact on metrics?
We split the user and their associated behavior into fractional parts for analysis. Behavior before the switch counts toward the first variant; once the variant changes, the rest of the behavior counts toward the new variant.
If a user is part of multiple experiments, how do we calculate the impact of a single experiment?
We consider the user’s complete behavior for every experiment they are part of. This still gives accurate results for a particular experiment because users are randomly allocated.
For what time duration do we associate an exposed user’s behavior with an experiment’s metrics?
After exposure, we consider a user’s behavior ‘exposed’ to an experiment for a maximum of 90 days.
Pricing FAQ
Pricing Unit
Experimentation is priced based on MEUs (Monthly Experiment Users). Only users exposed to an experiment in a given month count toward this tally.
How are MEUs different from MTUs?
MTUs count any user who has tracked an event in the project during the calendar month. MEUs are a subset of MTUs: only users who have tracked an experiment exposure event ($experiment_started) in the calendar month.
Does it matter how many experiments a user is exposed to within the month?
We’ve accounted for an MEU to be exposed to up to 30 experiments per month. If the average number of experiment exposure events per MEU is over 30, then the MEUs will be calculated as the total number of exposure events divided by 30.
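The 30-exposures-per-MEU rule can be sketched as follows (the rounding up of the event-based count is an assumption; the documentation above does not specify how fractional results are handled):

```python
import math

def billed_meus(unique_exposed_users, total_exposure_events):
    """If the average exposures per MEU exceeds 30, billing switches to
    total exposure events / 30; otherwise it is the unique-user count."""
    if unique_exposed_users and total_exposure_events / unique_exposed_users > 30:
        return math.ceil(total_exposure_events / 30)  # rounding is assumed
    return unique_exposed_users

print(billed_meus(1000, 24000))  # avg 24 exposures/user -> billed as 1000 MEUs
print(billed_meus(1000, 45000))  # avg 45 exposures/user -> billed as 1500 MEUs
```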
What happens if I go over my purchased MEU bucket?
You can continue using the Mixpanel Experiment report, but you will be charged a higher rate for the overage.