Experiments

Experimentation helps you make data-driven product decisions by measuring the real impact of changes on user behavior. Mixpanel is an ideal place to run experiments because all your product analytics data is already here.
The Experiment Report is a separately priced product add-on, currently offered only on the Enterprise Plan; see our pricing page for more details. Customers who have not purchased the Experiment add-on can create up to 3 experiments per organization. Note that experiment creation is irreversible.

Why Experiment?

Mixpanel provides comprehensive insight into how changes affect your users’ journey, so you can validate each change against the product analytics data you already collect.

Prerequisites

Before getting started with experiments:
  • Exposure Event Tracking: Implement your experimentation events (see Implementation below)
  • Baseline Metrics: Ensure that Mixpanel is already tracking your key metrics

Experiment Process

The experiment workflow follows these stages: Plan → Setup & Launch → Monitor → Interpret Results → Make Decisions
  1. Plan: Define hypothesis, success metrics, and test parameters
  2. Setup & Launch: Configure experiment settings and begin exposure
  3. Monitor: Track experiment progress and data collection
  4. Interpret Results: Analyze statistical significance and lift
  5. Make Decisions: Choose whether to ship, iterate, or abandon changes

Setup & Launch Your Experiment

Step 1: Select an Experiment

Click ‘New Experiment’ in the Experiment report menu and select your experiment. Any experiment started in the last 30 days is detected automatically and populated in the dropdown.
Only experiments tracked via the exposure event ($experiment_started) can be analyzed in the Experiment report.
Step 2: Choose the Control Variant

Select the variant that represents your control. All other variants are compared against it, i.e., how much better (or worse) they perform relative to the control.
Step 3: Choose Success Metrics

Choose the primary success metrics for the experiment. You can pick from saved Mixpanel metrics or create a new metric using the query panel. You can also add secondary and guardrail metrics as required.
Step 4: Select the Test Duration

Enter either the sample size (the number of users to be exposed to the experiment) or the minimum number of days you want the experiment to run. This will determine the test duration.
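Sample size and duration trade off against the smallest lift you can reliably detect. As a rough illustration of that relationship, here is a standard two-proportion sample-size approximation at 95% confidence and 80% power; this is a generic statistical sketch, not necessarily the calculation Mixpanel performs, and the function name is illustrative:

```python
from math import ceil

def sample_size_per_group(p_baseline, mde, z_alpha=1.96, z_beta=0.84):
    """Approximate users needed per group to detect an absolute lift of
    `mde` on a baseline rate, at 95% confidence and 80% power."""
    p_bar = p_baseline + mde / 2                      # midpoint rate estimate
    n = 2 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / mde ** 2
    return ceil(n)

# Detecting a 5-point lift on a 50% baseline conversion rate:
print(sample_size_per_group(0.50, 0.05))  # 1565 users per group
```

Halving the detectable lift roughly quadruples the required sample, which is why small-lift experiments need long durations.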
Step 5: Confirm Default Configurations

Mixpanel has set default automatic configurations:
  1. Experiment Model type: Sequential
  2. Confidence Threshold: 95%
  3. Experiment Start Date: Date of the first user exposed to the experiment
Modify them as needed for your experiment.

Implementation for Experimentation

Mixpanel experiment analysis is based on exposure events. To use the Experiment report, send your exposure events in the following format:
Event Name: $experiment_started
Event Properties:
  • Experiment name - the name of the experiment to which the user has been exposed
  • Variant name - the name of the variant into which the user was bucketed

JavaScript Example

// Track exposure event when user sees the experiment
mixpanel.track('$experiment_started', {
  'Experiment name': 'New Checkout Flow',
  'Variant name': 'variant_a'
});

Python Example

from mixpanel import Mixpanel

mp = Mixpanel('YOUR_PROJECT_TOKEN')

# Track exposure event
mp.track('user_123', '$experiment_started', {
    'Experiment name': 'Pricing Page Test',
    'Variant name': 'control'
})

iOS (Swift) Example

// Track exposure event
mixpanel.track(event: "$experiment_started", properties: [
    "Experiment name": "New Onboarding",
    "Variant name": "test"
])

Android (Kotlin) Example

// Track exposure event
val props = JSONObject()
props.put("Experiment name", "Payment Flow")
props.put("Variant name", "variant_b")

mixpanel.track("\$experiment_started", props)

When to Track Exposure Events

Send the exposure event only when a user is actually exposed, not at the start of a session.
  • An exposure event only needs to be sent the first time a user is exposed to an experiment, as long as the user stays in the initially bucketed variant
  • If a user is part of multiple experiments, send a corresponding exposure event for each experiment
  • Track the actual exposure, not the bucket assignment
Example: If you want to run an experiment on the payment page of a ride-sharing app, you only really care about users who open the app, book a ride, and then reach the payment page. The exposure event should ideally be implemented to track only once the payment page is reached.
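That gating logic can be sketched in Python. The names here (`maybe_track_exposure`, the in-memory dedupe set, the injected `track` callable standing in for your SDK's track call) are illustrative, not part of the Mixpanel SDK:

```python
# Fire $experiment_started only at the moment of real exposure,
# and only once per (user, experiment) pair.
_seen_exposures = set()

def maybe_track_exposure(user_id, experiment, variant, track):
    """Send the exposure event once per user per experiment."""
    key = (user_id, experiment)
    if key in _seen_exposures:
        return False  # already exposed; do not re-send
    _seen_exposures.add(key)
    track('$experiment_started', {
        'Experiment name': experiment,
        'Variant name': variant,
    })
    return True

# Call this where real exposure happens, e.g. when the payment page renders:
sent = []
maybe_track_exposure('user_123', 'Payment Page Test', 'variant_a',
                     lambda name, props: sent.append((name, props)))
maybe_track_exposure('user_123', 'Payment Page Test', 'variant_a',
                     lambda name, props: sent.append((name, props)))
# Only one $experiment_started event is sent for this user.
```

In production you would typically rely on your feature-flagging SDK's exposure hook rather than hand-rolled dedupe, but the placement principle is the same: instrument the page or screen where the change is actually visible.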

Monitor Your Experiment

Once your experiment is running, track these key indicators:
  • Sample Size Progress: Track how many users have been exposed
  • Data Quality: Ensure exposure events are being tracked correctly
  • Guardrail Metrics: Watch for any negative impacts on important metrics
  • External Factors: Note any external events that might affect results

Interpret Your Results

The Experiments report identifies significant differences between the Control and Variant groups. Every metric has two key attributes:
  • p-value: Indicates whether the variant’s delta vs. the control is statistically significant
  • lift: The variant’s percentage change on the metric vs. the control
Metric rows in the table are highlighted when any difference is calculated with high confidence:
  • Green: Positive differences, where the variant value is higher than the control
  • Red: Negative differences, where the variant value is lower than the control
  • Gray: Statistically insignificant results

Understanding Statistical Significance

Statistical significance (p-value) helps you determine whether your experiment results are likely to hold true for the full rollout.

Metric Types and Distributions

Mixpanel categorizes metrics into three types, each using different statistical distributions:
  1. Count Metrics (Total Events, Total Sessions): Use Poisson distribution
    • Examples: Total purchases, total page views, session count
  2. Rate Metrics (Conversion rates, Retention rates): Use Bernoulli distribution
    • Examples: Signup conversion rate, checkout completion rate, 7-day retention
  3. Value Metrics (Averages, Sums of properties): Use normal distribution approximation
    • Examples: Average order value, total revenue, average session duration

Example: E-commerce Checkout Experiment

Scenario: Testing a new checkout UI on an e-commerce site with 20 users (10 control, 10 treatment). Results:
  • Control group: 5 users converted (50% conversion rate), average cart size $60
  • Treatment group: 6 users converted (60% conversion rate), average cart size $67
For Conversion Rate (Rate Metric - Bernoulli Distribution):
  1. Group rates: Control = 0.5, Treatment = 0.6
  2. Variance calculation: Control = 0.5 × (1-0.5) = 0.25, Treatment = 0.6 × (1-0.6) = 0.24
  3. Standard error: Combined SE = √((0.25/10) + (0.24/10)) = 0.221
  4. Z-score: (0.6 - 0.5) / 0.221 = 0.45
  5. P-value: ~0.65 (not statistically significant)
This example shows why larger sample sizes are crucial—with only 10 users per group, even a 10-point difference in conversion rate isn’t statistically significant.
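The arithmetic above can be reproduced in a few lines of Python using a plain two-proportion z-test; this is a sketch of the general approach, not Mixpanel's exact internals:

```python
from math import sqrt, erf

def two_proportion_z(p_control, p_treatment, n_control, n_treatment):
    """Two-proportion z-test for a rate metric (Bernoulli)."""
    var_c = p_control * (1 - p_control)        # 0.5 * 0.5 = 0.25
    var_t = p_treatment * (1 - p_treatment)    # 0.6 * 0.4 = 0.24
    se = sqrt(var_c / n_control + var_t / n_treatment)   # ~0.221
    z = (p_treatment - p_control) / se                   # ~0.45
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))
    return z, p_value

z, p = two_proportion_z(0.5, 0.6, 10, 10)
print(round(z, 2), round(p, 2))  # 0.45 0.65
```

Re-running the same function with 1,000 users per group instead of 10 drives the p-value well below 0.05 for the same 10-point difference, which is the sample-size effect described above.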

Understanding Lift

Lift is the percentage difference between the control and variant metrics:

Lift = (variant group rate − control group rate) / control group rate

Lift Calculation by Metric Type

Count Metrics (Total Events, Sessions):
  • Group Rate: Total count ÷ Number of users exposed
  • Example: If the treatment group has 150 total purchases from 100 exposed users, the group rate = 1.5 purchases per user
Rate Metrics (Conversion, Retention):
  • Group Rate: The actual rate (already normalized)
  • Example: If 25 out of 100 users convert, group rate = 0.25 (25% conversion rate)
Value Metrics (Averages, Sums):
  • Group Rate: Sum of property values ÷ Number of users exposed
  • Example: If the treatment group spent $5,000 total across 100 exposed users, the group rate = $50 average per exposed user
Normalizing by exposed users (not just converters) helps you understand the impact on your entire user base. A feature that increases average order value among buyers but reduces conversion rate may decrease overall revenue per user.
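Putting those rules together in code (the $40-per-user control figure below is a hypothetical value added only to complete the lift division):

```python
def lift(control_rate, variant_rate):
    """Lift = (variant group rate - control group rate) / control group rate."""
    return (variant_rate - control_rate) / control_rate

# Count metric: 150 purchases across 100 exposed users -> 1.5 per user
count_rate = 150 / 100
# Rate metric: 25 converters out of 100 exposed users -> 0.25
conv_rate = 25 / 100
# Value metric: $5,000 spent across 100 exposed users -> $50 per user
value_rate = 5000 / 100

# Hypothetical control of $40 per exposed user gives a 25% lift:
print(lift(40, value_rate))  # 0.25
```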

Make Your Decision

Once the experiment is ready to review, you can choose to ‘End Analysis’. Use these guidelines:

When to Ship a Variant

  • Statistical significance achieved AND practical significance met (lift meets your minimum threshold)
  • Guardrail metrics remain stable (no significant negative impacts)
  • Sample size is adequate for your confidence requirements
  • Results align with your hypothesis and business objectives

When to Ship None

  • No statistical significance achieved after adequate test duration
  • Statistically significant but practically insignificant (lift too small to matter)
  • Negative impact on guardrail metrics outweighs primary metric gains
  • Results contradict your hypothesis significantly

When to Rerun or Iterate

  • Inconclusive results due to insufficient sample size
  • Mixed signals across different user segments
  • External factors contaminated the test period
  • Technical issues affected data collection

What to Watch Post-Rollout

  • Monitor guardrail metrics for 2-4 weeks after full rollout
  • Track long-term effects beyond your experiment window
  • Watch for novelty effects that may wear off
  • Document learnings for future experiments

Experiment Model Types

  • Sequential: Lets you detect lift and conclude experiments quickly, but may fail to reach significance for very small lifts. When to use? For large changes (~10%+ lift) when you want to stop early once significance is reached.
  • Frequentist: Capable of detecting smaller lifts, but requires running the experiment for its full planned duration. When to use? For very small changes (~1% lift) when precision matters.

Experiment Metric Types

  • Primary Metrics: Main goals you’re trying to improve. These determine if the experiment succeeded. Examples: revenue, conversion rates, ARPU.
  • Guardrail Metrics: Important metrics that you want to ensure haven’t been negatively affected while focusing on the primary metrics. Examples: CSAT, churn rate.
  • Secondary Metrics: Provide a deeper understanding of how users are interacting with your changes. Examples: time spent, number of pages visited, or specific user actions.

Frequently Asked Questions

If a user switches variants mid-experiment, how do we calculate the impact on metrics?
We break the user and their associated behavior into fractional parts for analysis. Behavior before the switch counts toward the first variant; once the variant changes, the remaining behavior counts toward the new variant.

If a user is part of multiple experiments, how do we calculate the impact of a single experiment?
We consider the user’s complete behavior for every experiment they are part of. This still gives accurate results for a particular experiment, because users have been randomly allocated.

For what time duration do we associate an exposed user’s behavior with an experiment’s metrics?
After experiment exposure, we consider a user’s behavior ‘exposed’ to an experiment for a maximum of 90 days.
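The fractional-attribution answer can be sketched as follows; this is a simplified illustration with a single switch time, and `attribute_events` is an assumed name, not Mixpanel's implementation:

```python
def attribute_events(events, switch_time):
    """Split a user's (timestamp, value) events across variants:
    behavior before the switch counts toward the first variant,
    behavior at or after the switch toward the new variant."""
    before = sum(value for ts, value in events if ts < switch_time)
    after = sum(value for ts, value in events if ts >= switch_time)
    return {'first_variant': before, 'new_variant': after}

# Three purchases at t=1, 5, 9; the user switched variants at t=6:
print(attribute_events([(1, 10), (5, 20), (9, 30)], switch_time=6))
# {'first_variant': 30, 'new_variant': 30}
```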

Pricing FAQ

The Experiment Report is a separately priced product offered to organizations on the Enterprise Plan. Please contact us for more details.

Pricing Unit

Experimentation is priced based on MEUs - Monthly Experiment Users. Only users exposed to an experiment in a given month count toward this tally.

How are MEUs different from MTUs?
MTUs count any user who has tracked an event to the project in the calendar month. MEUs are a subset of MTUs: only users who have tracked an exposure event ($experiment_started) in the calendar month.

Does it matter how many experiments a user is exposed to within the month?
An MEU is accounted for up to 30 experiment exposures per month. If the average number of exposure events per MEU exceeds 30, MEUs are instead calculated as the total number of exposure events divided by 30.

What happens if I go over my purchased MEU bucket?
You can continue using the Mixpanel Experiment Report, but overages are charged at a higher rate.
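The 30-exposure rule reads as follows in code; this is a sketch of the billing rule as stated above, and the function name is illustrative:

```python
def monthly_experiment_users(exposure_events, distinct_exposed_users):
    """MEUs for a month: distinct exposed users, unless the average
    exposures per user exceeds 30, in which case total events / 30."""
    if distinct_exposed_users == 0:
        return 0
    avg_exposures = exposure_events / distinct_exposed_users
    if avg_exposures > 30:
        return exposure_events / 30
    return distinct_exposed_users

print(monthly_experiment_users(2_000, 100))  # 100  (20 exposures/user, under the cap)
print(monthly_experiment_users(6_000, 100))  # 200  (60 exposures/user -> 6000 / 30)
```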
