diff --git a/cspell.json b/cspell.json
index ee5ad50384..c009f69096 100644
--- a/cspell.json
+++ b/cspell.json
@@ -167,6 +167,7 @@
"mpmetrics",
"mputils",
"msclkid",
+ "mSPRT",
"multischema",
"mxpnl",
"MYAPP",
@@ -228,6 +229,7 @@
"Signups",
"skus",
"splitlines",
+ "SPRT",
"Stackdriver",
"stddev",
"Steph",
diff --git a/pages/changelogs/2025-08-11-experimentation-reporting.mdx b/pages/changelogs/2025-08-11-experimentation-reporting.mdx
index 5f714e7b56..0531b676dd 100644
--- a/pages/changelogs/2025-08-11-experimentation-reporting.mdx
+++ b/pages/changelogs/2025-08-11-experimentation-reporting.mdx
@@ -25,6 +25,6 @@ With Mixpanel’s new Experimentation Reports, you can monitor every experiment
- **One place for all your experiments:** View and filter all active and past experiments, their results, and decisions made.
- **See the complete picture:** Link results with user behavior, cohorts, and session replays for the complete picture.
-**Test, learn, and innovate faster** — all in one solution for turning data into confident decisions. [Learn More →](https://docs.mixpanel.com/docs/reports/experiments)
+**Test, learn, and innovate faster** — all in one solution for turning data into confident decisions. [Learn More →](https://docs.mixpanel.com/docs/experiments)
NOTE: Experimentation Reporting is available as a paid add-on for customers on the Enterprise plan. Reach out to your account team to learn more.
diff --git a/pages/docs/_meta.tsx b/pages/docs/_meta.tsx
index 4bd993e209..d4de939167 100644
--- a/pages/docs/_meta.tsx
+++ b/pages/docs/_meta.tsx
@@ -43,6 +43,7 @@ export default {
},
reports: "Reports",
boards: "Boards",
+ experiments: "Experiments",
metric_tree: "Metric Trees",
users: "Users",
"session-replay": "Session Replay",
diff --git a/pages/docs/reports/experiments.mdx b/pages/docs/experiments.mdx
similarity index 55%
rename from pages/docs/reports/experiments.mdx
rename to pages/docs/experiments.mdx
index 5f6d2c981b..d179de381b 100644
--- a/pages/docs/reports/experiments.mdx
+++ b/pages/docs/experiments.mdx
@@ -3,23 +3,53 @@ import { Callout } from 'nextra/components'
# Experiments: Measure the impact of a/b testing
- The Experiment Report is a separately priced product. It is currently only offered to those on the Enterprise Plan. See our [pricing page](https://mixpanel.com/pricing/) for more details.
+ The Experiment Report is a separately priced add-on, currently offered only to customers on the Enterprise Plan. See our [pricing page](https://mixpanel.com/pricing/) for more details.
-## Overview
+## Why Experiment?
+
+Experimentation helps you make data-driven product decisions by measuring the real impact of changes on user behavior. Mixpanel is an ideal place to run experiments because all your product analytics data is already here, giving you comprehensive insights into how changes affect your entire user journey.
+
+## Prerequisites
+
+Before getting started with experiments:
+
+- **Exposure Event Tracking**: [Implement](#implementation-for-experimentation) exposure events for your experiments
+- **Baseline Metrics**: Have your key success metrics already measured in Mixpanel
+
+## Overview & Workflow

-The Experiment report analyzes how one variant impacts your metrics versus other variant(s), helping you decide which variant should be rolled out more broadly. To access Experiments, click on the **Experiments** tab in the navigation panel, or **Create New > Experiment**.
+The Experiment report analyzes how one variant impacts your metrics versus other variant(s), helping you decide which variant should be rolled out more broadly. To access Experiments, click on the **Experiments** tab in the navigation panel, or **Create New > Experiment**.
+
+### Experiment Process
+
+**Plan** → **Setup & Launch** → **Monitor** → **Interpret Results** → **Make Decisions**
+
+1. **Plan**: Define hypothesis, success metrics, and test parameters
+2. **Setup & Launch**: Configure experiment settings and begin exposure
+3. **Monitor**: Track experiment progress and data collection
+4. **Interpret Results**: Analyze statistical significance and lift
+5. **Make Decisions**: Choose whether to ship, iterate, or abandon changes
+
+## Plan Your Experiment
+
+Before creating an experiment report, ensure you have:
-## Building an Experiment Report
+- A clear hypothesis about what change will improve which metric
+- Defined primary success metrics (and secondary/guardrail metrics)
+- Estimated sample size and test duration requirements
+- Proper exposure event tracking implemented
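+
+For the sample size and duration estimate, a rough two-proportion sizing sketch can help during planning. This is illustrative only: the baseline rate, minimum detectable lift, significance level, and power below are assumed inputs, not Mixpanel defaults.
+
+```python
+import math
+
+def users_per_group(baseline_rate, min_relative_lift, z_alpha=1.96, z_power=0.84):
+    """Approximate users needed per group for a two-proportion test
+    (z_alpha ~ two-sided 95% confidence, z_power ~ 80% power)."""
+    p1 = baseline_rate
+    p2 = baseline_rate * (1 + min_relative_lift)
+    variance = p1 * (1 - p1) + p2 * (1 - p2)
+    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)
+
+# e.g. detecting a 5% relative lift on a 25% baseline conversion rate
+print(users_per_group(0.25, 0.05))  # roughly 19,000 users per group
+```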
+
+## Setup & Launch Your Experiment
### Step 1: Select an Experiment
Click 'New Experiment' from the Experiment report menu and select your experiment. Any experiment started in the last 30 days will automatically be detected and populated in the dropdown. To analyze experiments that began before 30 days, please hard-code the experiment name
-Only experiments tracked via exposure events, i.e, $experiment_started`, can be analyzed in the experiment report. Read more on how to track experiments [here](/docs/reports/experiments#adding-experiments-to-an-implementation).
+Only experiments tracked via exposure events, i.e., `$experiment_started`, can be analyzed in the experiment report. Read more on how to track experiments [here](#implementation-for-experimentation).
### Step 2: Choose the ‘Control’ Variant
@@ -41,7 +71,16 @@ Mixpanel has set default automatic configurations, seen below. If required, plea
2. **Confidence Threshold**: 95%
3. **Experiment Start Date**: Date of the first user exposed to the experiment
-## Reading Experiment Report Results
+## Monitor Your Experiment
+
+Once your experiment is running, you can track its progress in the Experiments dashboard. Monitor key indicators:
+
+- **Sample Size Progress**: Track how many users have been exposed
+- **Data Quality**: Ensure exposure events are being tracked correctly
+- **Guardrail Metrics**: Watch for any negative impacts on important metrics
+- **External Factors**: Note any external events that might affect results
+
+## Interpret Your Results
The Experiments report identifies significant differences between the Control and Variant groups. Every metric has two key attributes:
@@ -56,11 +95,51 @@ Metric rows in the table are highlighted when any difference is calculated with
### How do you read statistical significance?
-The main reason you look at statistical significance (p-value) is to get confidence on what it means for the larger rollout.
+Statistical significance (p-value) helps you determine whether your experiment results are likely to hold true for the full rollout, giving you confidence in your decisions.

-Max Significance Level (p-value) = [1-CI]/2 where CI = Confidence Interval
+#### Statistical Significance Calculation
+
+Mixpanel uses Frequentist statistical methods to compute p-values and confidence intervals. The specific approach depends on your metric type and experiment model.
+
+**Metric Types and Their Distributions:**
+
+Mixpanel categorizes metrics into three types, each using different statistical distributions:
+
+1. **Count Metrics** (Total Events, Total Sessions): Use **Poisson distribution**
+ - Examples: Total purchases, total page views, session count
+ - Variance equals the mean (characteristic of Poisson distributions)
+
+2. **Rate Metrics** (Conversion rates, Retention rates): Use **Bernoulli distribution**
+ - Examples: Signup conversion rate, checkout completion rate, 7-day retention
+ - Models binary outcomes (did/didn't convert) across your user base
+
+3. **Value Metrics** (Averages, Sums of properties): Use **normal distribution approximation**
+ - Examples: Average order value, total revenue, average session duration
+ - Calculates variance using sample statistics
+
+**Statistical Calculation Process:**
+
+For all metric types, we follow the same general process:
+
+1. **Calculate group rates** for control and treatment
+2. **Estimate variance** using the appropriate distribution
+3. **Compute standard error** from variance and sample size
+4. **Calculate Z-score** measuring how many standard errors apart the groups are
+5. **Derive p-value** from Z-score using normal distribution
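+
+As a minimal sketch of these five steps (illustrative code, not Mixpanel's internal implementation), assuming the caller supplies group rates and variances estimated with the appropriate distribution above:
+
+```python
+import math
+
+def z_test(rate_control, rate_treatment, var_control, var_treatment,
+           n_control, n_treatment):
+    # Steps 1-2: group rates and variances come from the metric type
+    # (Poisson: variance = mean; Bernoulli: p * (1 - p); Value: sample variance).
+    # Step 3: standard error from variance and sample size
+    se = math.sqrt(var_control / n_control + var_treatment / n_treatment)
+    # Step 4: Z-score, i.e. how many standard errors apart the groups are
+    z = (rate_treatment - rate_control) / se
+    # Step 5: two-sided p-value from the normal distribution
+    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
+    return z, p_value
+```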
+
+**Statistical Foundation:**
+Our calculations assume normal distributions for the sampling distributions of our metrics. While individual data points may not be normally distributed, the Central Limit Theorem tells us that with sufficient sample sizes, the sampling distributions of means and proportions will approximate normal distributions, making our statistical methods valid.
+
+**For Sequential Testing:**
+- Uses continuous monitoring with significance thresholds adjusted via the [mSPRT method](https://arxiv.org/pdf/1905.10493)
+- Allows for early stopping when significance is reached
+- More conservative calculations to account for multiple testing
+
+**For Frequentist Testing:**
+- Uses traditional hypothesis testing with fixed sample sizes
+- Formula: Max Significance Level (p-value) = (1 - CI) / 2, where CI is the confidence level (e.g., 0.95)
In the above image for example, max p=0.025 [(1-0.95)/2]
@@ -69,28 +148,60 @@ So, if an experiment's results show
- p ≤ 0.025: results are statistically significant for this metric, i.e, you can be 95% confident in the lift seen if the change is rolled out to all users.
- p > 0.025: results are not statistically significant for this metric, i.e, you cannot be very confident in the results if the change is rolled out broadly.
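+
+For the sequential model described above, a minimal sketch of a common normal-mixture mSPRT statistic and its always-valid p-value is shown below. This is an assumption-laden illustration: the mixture variance `tau_sq` and the exact form Mixpanel uses are not documented here.
+
+```python
+import math
+
+def msprt_p_value(diff, var_control, var_treatment, n_control, n_treatment,
+                  tau_sq=1.0, prev_p=1.0):
+    """One monitoring step: update the always-valid p-value for the observed
+    difference in group rates, testing against a null difference of zero."""
+    v_n = var_control / n_control + var_treatment / n_treatment  # variance of the estimate
+    likelihood_ratio = math.sqrt(v_n / (v_n + tau_sq)) * math.exp(
+        tau_sq * diff ** 2 / (2 * v_n * (v_n + tau_sq)))
+    return min(prev_p, 1.0 / likelihood_ratio)  # p-values only shrink as data accumulates
+```
+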
+#### Example: E-commerce Checkout Experiment
+
+To illustrate how these calculations work in practice, let's walk through a concrete example.
+
+**Scenario:** Testing a new checkout UI on an e-commerce site with 20 users (10 control, 10 treatment).
+
+**Results:**
+- **Control group:** 5 users converted (50% conversion rate), average cart size $60
+- **Treatment group:** 6 users converted (60% conversion rate), average cart size $67
+
+**For Conversion Rate (Rate Metric - Bernoulli Distribution):**
+
+1. **Group rates:** Control = 0.5, Treatment = 0.6
+2. **Variance calculation:** Control = 0.5 × (1-0.5) = 0.25, Treatment = 0.6 × (1-0.6) = 0.24
+3. **Standard error:** Combined SE = √((0.25/10) + (0.24/10)) = 0.221
+4. **Z-score:** (0.6 - 0.5) / 0.221 = 0.45
+5. **P-value:** ~0.65 (not statistically significant)
+
+**For Average Cart Size (Value Metric - Normal Distribution):**
+
+1. **Group means:** Control = $60, Treatment = $67
+2. **Variance calculation:** Uses sample variance of cart values in each group
+3. **Standard error:** Calculated from combined variance and sample sizes
+4. **Z-score and p-value:** Computed using the same Z-test framework
+
+This example shows why larger sample sizes are crucial: with only 10 users per group, even a 10-percentage-point difference in conversion rate isn't statistically significant.
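+
+As a quick check, the conversion-rate numbers above can be reproduced with a few lines of Python (a sketch of the same two-proportion z-test, not Mixpanel's internal code):
+
+```python
+import math
+
+n = 10                                    # users per group
+p_control, p_treatment = 0.5, 0.6         # 5/10 and 6/10 conversions
+var_control = p_control * (1 - p_control)            # 0.25
+var_treatment = p_treatment * (1 - p_treatment)      # 0.24
+se = math.sqrt(var_control / n + var_treatment / n)  # ~0.221
+z = (p_treatment - p_control) / se                   # ~0.45
+p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # ~0.65
+```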
+
### How do you read lift?
Lift is the percentage difference between the control and variant(s) metrics.
$Lift= { (variant \,group\,rate - control \,group\,rate) \over (control \,group\,rate)}$
-Lift, mean, and variance are calculated differently based on the type of metric being analyzed. We categorize metrics into 3 types:
+Lift, mean, and variance are calculated differently based on the type of metric being analyzed:
-- **Numeric** - any metrics that involve numeric property math (sum, average, etc)
-- **Binomial** - any metric that has a true or false outcome (unique users, funnel conversions, retention)
-- **Rate** - any metric that can be conceptualized as a rate (funnel conversion rate, total events/experiment, etc)
+**Count Metrics (Total Events, Sessions):**
+- **Group Rate:** Total count ÷ Number of users exposed
+- **Variance:** Equal to the mean (Poisson distribution property)
+- **Example:** If treatment group has 150 total purchases from 100 exposed users, group rate = 1.5 purchases per user
-The ‘group rate’ is calculated differently depending on the type of metric.
+**Rate Metrics (Conversion, Retention):**
+- **Group Rate:** The actual rate (already normalized)
+- **Variance:** Calculated using Bernoulli distribution: p × (1-p)
+- **Example:** If 25 out of 100 users convert, group rate = 0.25 (25% conversion rate)
-- For numeric & binomial metrics:
+**Value Metrics (Averages, Sums):**
+- **Group Rate:** Sum of property values ÷ Number of users exposed
+- **Variance:** Calculated from the distribution of individual property values
+- **Example:** If treatment group spent $5,000 total from 100 users, group rate = $50 average per exposed user
- $Group\,Rate= { (\# \,Metric\,absolute\,value) \over (\# of\,users\,exposed)}$
-
- NOTE: Normalizing the rate based on the number of users exposed helps understand the possible impact on every single user exposed to the experiment
-
-- For rate metrics: the group rate is the same as the metric for the users in the group. Example: if calculating a funnel conversion rate, then the group rate is the overall conversion rate of the funnel for users in the group.
-
- NOTE: Conversion rates are normalized as is, hence no further normalization is done
+**Why This Matters:**
+Normalizing by exposed users (not just converters) helps you understand the impact on your entire user base. A feature that increases average order value among buyers but reduces conversion rate might actually decrease overall revenue per user.
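+
+To make the group-rate and lift definitions concrete, here is a short sketch for the count-metric example above. The treatment figures come from the example; the control figures are made up for illustration:
+
+```python
+treatment_purchases, treatment_users = 150, 100
+control_purchases, control_users = 120, 100   # hypothetical control group
+
+treatment_rate = treatment_purchases / treatment_users  # 1.5 purchases per exposed user
+control_rate = control_purchases / control_users        # 1.2 purchases per exposed user
+
+lift = (treatment_rate - control_rate) / control_rate   # 0.25, i.e. a 25% lift
+```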
+
+**Custom Formula Metrics:**
+For complex metrics using formulas like `Revenue per User = Total Revenue ÷ Unique Users`, Mixpanel uses propagation of uncertainty to estimate variance. This combines the variances of the component metrics (Total Revenue and Unique Users) to calculate the overall metric's statistical significance. The system assumes metrics in formulas are uncorrelated for these calculations.
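+
+As a sketch of first-order propagation of uncertainty for an uncorrelated ratio metric such as `Revenue per User` (illustrative numbers, not Mixpanel's exact formula):
+
+```python
+def ratio_variance(numerator, var_numerator, denominator, var_denominator):
+    """First-order variance of numerator / denominator, assuming the two are uncorrelated."""
+    ratio = numerator / denominator
+    return ratio ** 2 * (var_numerator / numerator ** 2 +
+                         var_denominator / denominator ** 2)
+
+# e.g. Total Revenue = $5,000 and Unique Users = 100, with assumed variances
+var_revenue_per_user = ratio_variance(5000.0, 400.0 ** 2, 100.0, 5.0 ** 2)
+```
+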
### When do we say the Experiment is ready to review?
Once the ‘Test Duration’ setup during configuration is complete, we show a banner that says “Experiment is ready to review”.
@@ -100,7 +211,7 @@ Once the ‘Test Duration’ setup during configuration is complete, we show a b
1. Sample size to be exposed
2. Number of days you’d like to run the experiment
-NOTE: If you are using a ‘sequential’ testing experiment model type, you can always peek at the results sooner. Learn more about what sequential testing is [here](/docs/reports/experiments#experiment-model-types)
+NOTE: If you are using a 'sequential' testing experiment model type, you can always peek at the results sooner. Learn more about what sequential testing is [here](#experiment-model-types)
### Diagnosing experiments further in regular Mixpanel reports
Click 'Analyze' on a metric to dive deeper into the results. This will open a normal Mixpanel insights report for the time range being analyzed with the experiment breakdown applied. This allows you to view users, view replays, or apply additional breakdowns to further analyze the results.
@@ -116,22 +227,8 @@ The Experiment report behavior is powered by [borrowed properties](/docs/feature
For every user event, we identify if the event is performed after being exposed to an experiment. If it were, then we would borrow the variant details from the tracked `$experiment_started` to attribute the event to the proper variant.
-### FAQs
-1. If a user switches variants mid-experiment, how do we calculate the impact on metrics?
-
- We break a user and their associated behavior into fractional parts for analysis. We consider the initial behavior part of the first variant, then once the variant changes, we consider the rest of the behavior for analysis towards the new variant.
-
-2. If a user is part of multiple experiments, how do we calculate the impact of a single experiment?
-
- We consider the complete user’s behavior for every experiment that they are a part of.
-
- We believe this will still give accurate results for a particular experiment, as the users have been randomly allocated. So there should be enough similar users, ie, part of multiple experiments, across both control and variants for a particular experiment.
-
-3. For what time duration do we associate the user being exposed to an experiment to impact metrics?
-
- Post experiment exposure, we consider a user’s behavior as ‘exposed’ to an experiment for a max of 90 days.
-## Adding Experiments to an Implementation
+### Implementation for Experimentation
Mixpanel experiment analysis work based on exposure events. To use the experiment report, you must send your Exposure events in the following format:
@@ -161,15 +258,28 @@ You can specify the event and property that should be used as the exposure event
For example, you begin an experiment on 1st Aug, and 1M users are ‘assigned’ to the control and variant. You do not want to send an ‘exposure’ event for all these users right away, as they have only been assigned to the experiment. It’s possible that some user gets exposed on 4th Aug and some on 8th Aug. You would want to track $experiment_started at the exposure for accurate analysis.
-## Experiment Pricing
+### FAQs
+1. If a user switches variants mid-experiment, how do we calculate the impact on metrics?
+
+ We break a user and their associated behavior into fractional parts for analysis. We consider the initial behavior part of the first variant, then once the variant changes, we consider the rest of the behavior for analysis towards the new variant.
+
+2. If a user is part of multiple experiments, how do we calculate the impact of a single experiment?
+
+ We consider the complete user’s behavior for every experiment that they are a part of.
+
+   We believe this will still give accurate results for a particular experiment, as the users have been randomly allocated. So there should be enough similar users, i.e., users who are part of multiple experiments, across both the control and variants for a particular experiment.
+
+3. For what time duration do we associate the user being exposed to an experiment to impact metrics?
+
+   After experiment exposure, we consider a user’s behavior as ‘exposed’ to that experiment for a maximum of 90 days.
+
+## Experimentation Pricing FAQ
The Experiment Report is a separately priced product offered to organizations on the Enterprise Plan. Please [contact us](https://mixpanel.com/contact-us/sales/) for more details.
### Pricing Unit
-Experimentation is priced based on MEUs - Monthly Experiment Users. Only users exposed to an experiment in a month are counted towards this tally.
-
-### FAQ
+Experimentation is priced based on MEUs (Monthly Experiment Users). Only users exposed to an experiment in a month are counted toward this tally.
#### How are MEUs different than MTUs (Monthly Tracked Users)?
MTUs count any user who has tracked an event to the project in the calendar month. MEU is a subset of MTU; it’s only users who have tracked an exposure experiment event (ie, `$experiment_started`) in the calendar month.
@@ -212,12 +322,39 @@ You can see your experiment MEU usage by going to Organization settings > Plan D
- **Guardrail Metrics:** These are other important metrics that you want to ensure haven’t been negatively affected while focusing on the primary metrics. Examples: CSAT, churn rate.
- **Secondary Metrics:** These provide a deeper understanding of how users are interacting with your changes, i.e, help to understand the "why" behind changes in the primary metric. Examples: time spent, number of pages visited, or specific user actions.
-### Post Experiment Analysis Decision
-Once the experiment is ready to review, you can choose to 'End Analysis'. Once complete, you can log a decision, visible to all users, based on the experiment outcome:
+## Make Your Decision
+
+Once the experiment is ready to review, you can choose to 'End Analysis'. Use these guidelines to make informed decisions:
+
+### When to Ship a Variant
+- **Statistical significance achieved** AND **practical significance met** (lift meets your minimum threshold)
+- **Guardrail metrics remain stable** (no significant negative impacts)
+- **Sample size is adequate** for your confidence requirements
+- **Results align with your hypothesis** and business objectives
+
+### When to Ship None
+- **No statistical significance** achieved after adequate test duration
+- **Statistically significant but practically insignificant** (lift too small to matter)
+- **Negative impact on guardrail metrics** outweighs primary metric gains
+- **Results contradict** your hypothesis significantly
+
+### When to Rerun or Iterate
+- **Inconclusive results** due to insufficient sample size
+- **Mixed signals** across different user segments
+- **External factors** contaminated the test period
+- **Technical issues** affected data collection
+
+### What to Watch Post-Rollout
+- **Monitor guardrail metrics** for 2-4 weeks after full rollout
+- **Track long-term effects** beyond your experiment window
+- **Watch for novelty effects** that may wear off
+- **Document learnings** for future experiments
+
+### Decision Options in Mixpanel
-- Ship Variant (any of the variants): You had a statistically significant result. You have made a decision to ship a variant to all users. NOTE: Shipping variant here is just a log; it does not actually trigger rolling out the feature flag unless you are using Mixpanel feature flags **(in beta today)**.
-- Ship None: You may not have had any statistically significant results, or even if you have statistically significant results, the lift is not sufficient to warrant a change in user experience. You decide not to ship the change.
-- Defer Decision: You may have a direction you want to go, but need to sync with other stakeholders before confirming the decision. This is an example where you might defer decision, and come back at a later date and log the final decision.
+- **Ship Variant (any of the variants)**: You had a statistically significant result and have decided to ship a variant to all users. NOTE: Shipping a variant here only logs the decision; it does not actually roll out the feature flag unless you are using Mixpanel feature flags **(in beta today)**.
+- **Ship None**: You did not get statistically significant results, or the lift is not large enough to warrant a change in user experience, so you decide not to ship the change.
+- **Defer Decision**: You may have a direction you want to go, but need to sync with other stakeholders before confirming the decision. In that case, defer and come back at a later date to log the final decision.
### Experiment Management
You can manage all your experiments via the Experiments Home tab. You can customize which columns you’d like to see.
diff --git a/pages/docs/reports/_meta.ts b/pages/docs/reports/_meta.ts
index a48941a2fa..764ba31495 100644
--- a/pages/docs/reports/_meta.ts
+++ b/pages/docs/reports/_meta.ts
@@ -3,6 +3,5 @@ export default {
"funnels": "Funnels",
"retention": "Retention",
"flows": "Flows",
- "experiments": "Experiments",
"apps": "Apps"
}
diff --git a/pages/docs/reports/apps/experiments.mdx b/pages/docs/reports/apps/experiments.mdx
index d961920e53..51d6b5fd46 100644
--- a/pages/docs/reports/apps/experiments.mdx
+++ b/pages/docs/reports/apps/experiments.mdx
@@ -3,7 +3,7 @@ import { Callout } from 'nextra/components'
# Experiments (Deprecating Soon)
- This app will be deprecated on Nov 1, 2025. To analyze Experiments in Mixpanel, please use our new [Experiments Report](/docs/reports/experiments)
+ This app will be deprecated on Nov 1, 2025. To analyze Experiments in Mixpanel, please use our new [Experiments Report](/docs/experiments)
## Overview
diff --git a/redirects/local.txt b/redirects/local.txt
index 8597aa4d63..62aedcc290 100644
--- a/redirects/local.txt
+++ b/redirects/local.txt
@@ -22,7 +22,8 @@
/docs/analysis/advanced/custom-events /docs/features/custom-events
/docs/analysis/advanced/custom-properties /docs/features/custom-properties
/docs/analysis/advanced/embeds /docs/features/embeds
-/docs/analysis/advanced/experiments /docs/reports/apps/experiments
+/docs/analysis/advanced/experiments /docs/experiments
+/docs/reports/experiments /docs/experiments
/docs/analysis/advanced/group-analytics /docs/data-structure/advanced/group-analytics
/docs/analysis/advanced/impact /docs/reports/apps/impact
/docs/analysis/advanced/other-advanced-features /docs/features/advanced