Experiments
Experimentation helps you make data-driven product decisions by measuring the real impact of changes on user behavior. Mixpanel is an ideal place to run experiments because all your product analytics data is already here.
Why Experiment?
Mixpanel provides comprehensive insights into how changes affect your users’ journey.
Prerequisites
Before getting started with experiments:
- Exposure Event Tracking: Implement your experimentation events (see Implementation below)
- Baseline Metrics: Ensure that Mixpanel is already tracking your key metrics
Experiment Process
The experiment workflow follows these stages: Plan → Setup & Launch → Monitor → Interpret Results → Make Decisions.
- Plan: Define hypothesis, success metrics, and test parameters
- Setup & Launch: Configure experiment settings and begin exposure
- Monitor: Track experiment progress and data collection
- Interpret Results: Analyze statistical significance and lift
- Make Decisions: Choose whether to ship, iterate, or abandon changes
Setup & Launch Your Experiment
Select an Experiment
Click ‘New Experiment’ from the Experiment report menu and select your experiment. Any experiment started in the last 30 days will automatically be detected and populated in the dropdown.
Choose the Control Variant
Select the ‘Variant’ that represents your control. All other variants are compared against the control to show how much better (or worse) they perform.
Choose Success Metrics
Choose the primary success metrics for the experiment. You can select saved Mixpanel metrics or create a new metric using the query panel. You can also add secondary and guardrail metrics as needed.
Select the Test Duration
Enter either the sample size (the number of users to be exposed to the experiment) or the minimum number of days you want the experiment to run. This will determine the test duration.
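Mixpanel derives the duration for you, but it can help to see how a minimum sample size per group is typically estimated for a rate metric. The sketch below uses the standard two-sample proportion power calculation (not necessarily Mixpanel's exact internal formula), with conventional z-values for a two-sided 5% significance level and 80% power; the 50% → 55% conversion rates are hypothetical inputs:

```python
import math

def sample_size_per_group(p_base, p_target, z_alpha=1.959964, z_beta=0.841621):
    """Standard two-sample proportion sample-size estimate, per group.

    p_base: baseline conversion rate; p_target: rate you want to detect.
    Default z-values correspond to two-sided alpha = 0.05 and 80% power.
    """
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    effect = (p_target - p_base) ** 2
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect)

# Hypothetical example: detecting a lift from 50% to 55% conversion
n = sample_size_per_group(0.50, 0.55)
print(n)
```

Smaller minimum detectable lifts grow the required sample size quadratically, which is why small-lift experiments need long durations.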
Implementation for Experimentation
Mixpanel experiment analysis is based on exposure events. To use the Experiment report, you must send your exposure events in the following format:
Event Name: $experiment_started
Event Properties:
- Experiment name: the name of the experiment to which the user was exposed
- Variant name: the name of the variant into which the user was bucketed
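As a minimal Python sketch of this format (the event and property names come from the spec above; the distinct_id, token, and experiment/variant values are hypothetical, and in practice you would send the event through a Mixpanel SDK's track call rather than building the payload by hand):

```python
import json

def build_exposure_event(distinct_id, experiment_name, variant_name, project_token):
    """Assemble an exposure event payload in the format described above."""
    return {
        "event": "$experiment_started",
        "properties": {
            "token": project_token,        # your Mixpanel project token
            "distinct_id": distinct_id,    # the exposed user
            "Experiment name": experiment_name,
            "Variant name": variant_name,
        },
    }

# Hypothetical user bucketed into the "treatment" variant of "new_checkout_ui"
payload = build_exposure_event("user_123", "new_checkout_ui", "treatment", "YOUR_TOKEN")
print(json.dumps(payload, indent=2))
```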
When to Track Exposure Events
- An exposure event only needs to be sent the first time a user is exposed to an experiment, provided the user remains in their initially bucketed variant
- If a user is part of multiple experiments, send a corresponding exposure event for each experiment
- Send the exposure event when the user actually experiences the variant, not when the variant is merely assigned
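The first rule above implies client-side deduplication per user and experiment. A minimal sketch (the `track` callback stands in for your Mixpanel SDK call; the class and names are illustrative, not part of any SDK):

```python
class ExposureTracker:
    """Fire $experiment_started only on a user's first exposure to each
    experiment, assuming the user stays in their initially bucketed variant."""

    def __init__(self, track):
        self._track = track
        self._seen = set()  # (distinct_id, experiment) pairs already exposed

    def expose(self, distinct_id, experiment, variant):
        key = (distinct_id, experiment)
        if key in self._seen:
            return False  # already exposed to this experiment; do not re-send
        self._seen.add(key)
        self._track("$experiment_started", {
            "Experiment name": experiment,
            "Variant name": variant,
        })
        return True

sent = []
tracker = ExposureTracker(lambda event, props: sent.append((event, props)))
tracker.expose("user_1", "new_checkout_ui", "treatment")
tracker.expose("user_1", "new_checkout_ui", "treatment")  # deduped, not re-sent
tracker.expose("user_1", "pricing_page_test", "control")  # separate experiment
print(len(sent))  # 2
```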
Monitor Your Experiment
Once your experiment is running, track these key indicators:
- Sample Size Progress: Track how many users have been exposed
- Data Quality: Ensure exposure events are being tracked correctly
- Guardrail Metrics: Watch for any negative impacts on important metrics
- External Factors: Note any external events that might affect results
Interpret Your Results
The Experiments report identifies significant differences between the control and variant groups. Every metric has two key attributes:
- p-value: Indicates whether the variant’s delta vs. the control is statistically significant
- Lift: The variant’s delta impact on the metric vs. the control
Results are color-coded:
- Green: Positive differences, where the variant value is higher than the control
- Red: Negative differences, where the variant value is lower than the control
- Gray: Statistically insignificant results
Understanding Statistical Significance
Statistical significance (p-value) helps you determine whether your experiment results are likely to hold true for the full rollout.
Metric Types and Distributions
Mixpanel categorizes metrics into three types, each using a different statistical distribution:
- Count Metrics (Total Events, Total Sessions): Use a Poisson distribution
  - Examples: Total purchases, total page views, session count
- Rate Metrics (Conversion rates, Retention rates): Use a Bernoulli distribution
  - Examples: Signup conversion rate, checkout completion rate, 7-day retention
- Value Metrics (Averages, Sums of properties): Use a normal distribution approximation
  - Examples: Average order value, total revenue, average session duration
Example: E-commerce Checkout Experiment
Scenario: Testing a new checkout UI on an e-commerce site with 20 users (10 control, 10 treatment).
Results:
- Control group: 5 users converted (50% conversion rate), average cart size $60
- Treatment group: 6 users converted (60% conversion rate), average cart size $67
- Group rates: Control = 0.5, Treatment = 0.6
- Variance calculation: Control = 0.5 × (1-0.5) = 0.25, Treatment = 0.6 × (1-0.6) = 0.24
- Standard error: Combined SE = √((0.25/10) + (0.24/10)) = 0.221
- Z-score: (0.6 - 0.5) / 0.221 = 0.45
- P-value: ~0.65 (not statistically significant)
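The calculation above can be reproduced as a two-sided z-test for proportions; this sketch uses the same group rates and Bernoulli variances, with the normal CDF built from `math.erf`:

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n_control, n_treatment = 10, 10
p_control, p_treatment = 5 / 10, 6 / 10  # 50% vs. 60% conversion

# Bernoulli variance p(1-p) per group, combined into the standard error
se = math.sqrt(p_control * (1 - p_control) / n_control
               + p_treatment * (1 - p_treatment) / n_treatment)
z = (p_treatment - p_control) / se
p_value = 2 * (1 - normal_cdf(z))  # two-sided test

print(round(se, 3), round(z, 2), round(p_value, 2))  # 0.221 0.45 0.65
```

With a p-value around 0.65, a sample this small cannot distinguish the observed 10-point difference from noise, which is why adequate sample size matters.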
Understanding Lift
Lift is the percentage difference between the variant’s metric and the control’s metric.
Lift Calculation by Metric Type
Count Metrics (Total Events, Sessions):
- Group Rate: Total count ÷ Number of users exposed
- Example: If the treatment group has 150 total purchases from 100 exposed users, the group rate = 1.5 purchases per user
Rate Metrics (Conversion rates, Retention rates):
- Group Rate: The actual rate (already normalized)
- Example: If 25 out of 100 users convert, group rate = 0.25 (25% conversion rate)
Value Metrics (Averages, Sums of properties):
- Group Rate: Sum of property values ÷ Number of users exposed
- Example: If the treatment group spent $5,000 in total across 100 exposed users, group rate = $50 average per exposed user
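Once each group rate is computed, the same lift formula applies to every metric type. A short sketch (the control-group figures of 1.2 purchases per user and a 20% conversion rate are assumed for illustration; the $60 vs. $67 cart sizes come from the checkout example above):

```python
def lift_pct(control_rate, variant_rate):
    """Lift = (variant - control) / control, expressed as a percentage."""
    return (variant_rate - control_rate) / control_rate * 100

# Count metric: 150 purchases / 100 exposed users vs. an assumed 120 / 100 in control
print(round(lift_pct(120 / 100, 150 / 100), 1))  # 25.0

# Rate metric: 25% vs. an assumed 20% conversion
print(round(lift_pct(0.20, 0.25), 1))  # 25.0

# Value metric: $67 vs. $60 average cart size
print(round(lift_pct(60, 67), 1))  # 11.7
```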
Make Your Decision
Once the experiment is ready to review, you can choose to ‘End Analysis’. Use these guidelines:
When to Ship a Variant
- Statistical significance achieved AND practical significance met (lift meets your minimum threshold)
- Guardrail metrics remain stable (no significant negative impacts)
- Sample size is adequate for your confidence requirements
- Results align with your hypothesis and business objectives
When to Ship None
- No statistical significance achieved after adequate test duration
- Statistically significant but practically insignificant (lift too small to matter)
- Negative impact on guardrail metrics outweighs primary metric gains
- Results contradict your hypothesis significantly
When to Rerun or Iterate
- Inconclusive results due to insufficient sample size
- Mixed signals across different user segments
- External factors contaminated the test period
- Technical issues affected data collection
What to Watch Post-Rollout
- Monitor guardrail metrics for 2-4 weeks after full rollout
- Track long-term effects beyond your experiment window
- Watch for novelty effects that may wear off
- Document learnings for future experiments
Experiment Model Types
- Sequential: Allows you to detect lift and conclude experiments quickly, but may fail to reach significance for very small lifts. When to use? For large changes (~10%+ lift) when you want to stop early once significance is reached.
- Frequentist: Capable of detecting smaller lifts, but requires running experiments for their full planned duration. When to use? For very small changes (~1% lift) when precision matters.
Experiment Metric Types
- Primary Metrics: Main goals you’re trying to improve. These determine if the experiment succeeded. Examples: revenue, conversion rates, ARPU.
- Guardrail Metrics: Important metrics that you want to ensure haven’t been negatively affected while focusing on the primary metrics. Examples: CSAT, churn rate.
- Secondary Metrics: Provide a deeper understanding of how users are interacting with your changes. Examples: time spent, number of pages visited, or specific user actions.
Frequently Asked Questions
If a user switches variants mid-experiment, how do we calculate the impact on metrics?
We split the user and their associated behavior into fractional parts for analysis. Behavior before the switch counts toward the first variant; once the variant changes, the rest of the behavior counts toward the new variant.
If a user is part of multiple experiments, how do we calculate the impact of a single experiment?
We consider the user’s complete behavior for every experiment they are part of. This still gives accurate results for a particular experiment because users are randomly allocated.
For what time duration do we associate an exposed user’s behavior with an experiment’s metrics?
After exposure, we consider a user’s behavior ‘exposed’ to an experiment for a maximum of 90 days.
Pricing FAQ
Pricing Unit
Experimentation is priced based on MEUs (Monthly Experiment Users). Only users exposed to an experiment in a given month count toward this tally.
How are MEUs different from MTUs?
MTUs count any user who has tracked an event in the project during the calendar month. MEUs are a subset of MTUs: only users who have tracked an experiment exposure event ($experiment_started) in the calendar month.
Does it matter how many experiments a user is exposed to within the month?
We’ve accounted for an MEU to be exposed to up to 30 experiments per month. If the average number of experiment exposure events per MEU is over 30, then the MEUs will be calculated as the total number of exposure events divided by 30.
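The 30-exposures-per-MEU rule can be sketched as follows (the rounding up of the event-based count is an assumption; the documentation above does not specify how fractional results are handled):

```python
import math

def billed_meus(unique_exposed_users, total_exposure_events):
    """If the average exposures per MEU exceeds 30, billing switches to
    total exposure events / 30; otherwise it is the unique-user count."""
    if unique_exposed_users and total_exposure_events / unique_exposed_users > 30:
        return math.ceil(total_exposure_events / 30)  # rounding is assumed
    return unique_exposed_users

print(billed_meus(1000, 24000))  # avg 24 exposures/user -> billed as 1000 MEUs
print(billed_meus(1000, 45000))  # avg 45 exposures/user -> billed as 1500 MEUs
```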
What happens if I go over my purchased MEU bucket?
You can continue using the Mixpanel Experiment report, but you will be charged a higher rate for the overage.