From a8fd9a33f46d672739009cee7355df59807316f0 Mon Sep 17 00:00:00 2001
From: ishamehramixpanel <117322225+ishamehramixpanel@users.noreply.github.com>
Date: Wed, 24 Sep 2025 13:28:12 -0400
Subject: [PATCH 01/12] update experimentation docs
- brought analysis up to a high level, and added separate pages for guidance
- added best practices section
- added more info on stat significance
- added prerequisites to get started
---
.../2025-08-11-experimentation-reporting.mdx | 2 +-
pages/docs/_meta.tsx | 1 +
pages/docs/{reports => }/experiments.mdx | 108 +++++++++++++--
pages/docs/experiments/_meta.ts | 3 +
pages/docs/experiments/best-practices.mdx | 128 ++++++++++++++++++
pages/docs/reports/_meta.ts | 1 -
pages/docs/reports/apps/experiments.mdx | 2 +-
redirects/local.txt | 3 +-
8 files changed, 229 insertions(+), 19 deletions(-)
rename pages/docs/{reports => }/experiments.mdx (72%)
create mode 100644 pages/docs/experiments/_meta.ts
create mode 100644 pages/docs/experiments/best-practices.mdx
diff --git a/pages/changelogs/2025-08-11-experimentation-reporting.mdx b/pages/changelogs/2025-08-11-experimentation-reporting.mdx
index 5f714e7b56..0531b676dd 100644
--- a/pages/changelogs/2025-08-11-experimentation-reporting.mdx
+++ b/pages/changelogs/2025-08-11-experimentation-reporting.mdx
@@ -25,6 +25,6 @@ With Mixpanel’s new Experimentation Reports, you can monitor every experiment
- **One place for all your experiments:** View and filter all active and past experiments, their results, and decisions made.
- **See the complete picture:** Link results with user behavior, cohorts, and session replays for the complete picture.
-**Test, learn, and innovate faster** — all in one solution for turning data into confident decisions. [Learn More →](https://docs.mixpanel.com/docs/reports/experiments)
+**Test, learn, and innovate faster** — all in one solution for turning data into confident decisions. [Learn More →](https://docs.mixpanel.com/docs/experiments)
NOTE: Experimentation Reporting is available as a paid add-on for customers on the Enterprise plan. Reach out to your account team to learn more.
diff --git a/pages/docs/_meta.tsx b/pages/docs/_meta.tsx
index 4bd993e209..d4de939167 100644
--- a/pages/docs/_meta.tsx
+++ b/pages/docs/_meta.tsx
@@ -43,6 +43,7 @@ export default {
},
reports: "Reports",
boards: "Boards",
+ experiments: "Experiments",
metric_tree: "Metric Trees",
users: "Users",
"session-replay": "Session Replay",
diff --git a/pages/docs/reports/experiments.mdx b/pages/docs/experiments.mdx
similarity index 72%
rename from pages/docs/reports/experiments.mdx
rename to pages/docs/experiments.mdx
index 5f6d2c981b..8451af5d8f 100644
--- a/pages/docs/reports/experiments.mdx
+++ b/pages/docs/experiments.mdx
@@ -6,20 +6,52 @@ import { Callout } from 'nextra/components'
The Experiment Report is a separately priced product. It is currently only offered to those on the Enterprise Plan. See our [pricing page](https://mixpanel.com/pricing/) for more details.
+## Why Experiment?
+
+Experimentation helps you make data-driven product decisions by measuring the real impact of changes on user behavior. Mixpanel is an ideal place to run experiments because all your product analytics data is already here, giving you comprehensive insights into how changes affect your entire user journey.
+
+## Prerequisites
+
+Before getting started with experiments:
+
+- **Enterprise Plan**: Experimentation is only available on Enterprise plans
+- **Exposure Event Tracking**: You must implement `$experiment_started` event tracking
+- **Baseline Metrics**: Have your key success metrics already defined in Mixpanel
+- **Permissions**: Ensure you have the appropriate project permissions
+
+## Experiment Workflow
+
+**Plan** → **Setup & Launch** → **Monitor** → **Interpret Results** → **Make Decisions**
+
+1. **Plan**: Define hypothesis, success metrics, and test parameters
+2. **Setup & Launch**: Configure experiment settings and begin exposure
+3. **Monitor**: Track experiment progress and data collection
+4. **Interpret Results**: Analyze statistical significance and lift
+5. **Make Decisions**: Choose whether to ship, iterate, or abandon changes
+
## Overview

-The Experiment report analyzes how one variant impacts your metrics versus other variant(s), helping you decide which variant should be rolled out more broadly. To access Experiments, click on the **Experiments** tab in the navigation panel, or **Create New > Experiment**.
+The Experiment report analyzes how one variant impacts your metrics versus other variant(s), helping you decide which variant should be rolled out more broadly. To access Experiments, click on the **Experiments** tab in the navigation panel, or **Create New > Experiment**.
+
+## Plan Your Experiment
-## Building an Experiment Report
+Before creating an experiment report, ensure you have:
+
+- A clear hypothesis about what change will improve which metric
+- Defined primary success metrics (and secondary/guardrail metrics)
+- Estimated sample size and test duration requirements
+- Proper exposure event tracking implemented
+
+## Setup & Launch Your Experiment
### Step 1: Select an Experiment
Click 'New Experiment' from the Experiment report menu and select your experiment. Any experiment started in the last 30 days will automatically be detected and populated in the dropdown. To analyze experiments that began before 30 days, please hard-code the experiment name
-Only experiments tracked via exposure events, i.e, $experiment_started`, can be analyzed in the experiment report. Read more on how to track experiments [here](/docs/reports/experiments#adding-experiments-to-an-implementation).
+Only experiments tracked via exposure events, i.e., `$experiment_started`, can be analyzed in the experiment report. Read more on how to track experiments [here](#adding-experiments-to-an-implementation).
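+
+For reference, here is a minimal sketch of tracking the exposure event from a web client. It assumes the `mixpanel-browser` SDK; the experiment and variant property names shown are placeholders, so align them with the exposure event format described in the implementation section later on this page.
+
+```typescript
+import mixpanel from "mixpanel-browser";
+
+mixpanel.init("YOUR_PROJECT_TOKEN");
+
+// Track the exposure at the moment the user actually sees the experiment,
+// not when they are merely assigned to it.
+// Property names here are placeholders; match your exposure event configuration.
+mixpanel.track("$experiment_started", {
+  "Experiment name": "new-checkout-ui", // placeholder experiment name
+  "Variant name": "treatment",          // placeholder variant name
+});
+```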
### Step 2: Choose the ‘Control’ Variant
@@ -41,7 +73,16 @@ Mixpanel has set default automatic configurations, seen below. If required, plea
2. **Confidence Threshold**: 95%
3. **Experiment Start Date**: Date of the first user exposed to the experiment
-## Reading Experiment Report Results
+## Monitor Your Experiment
+
+Once your experiment is running, you can track its progress in the Experiments dashboard. Monitor key indicators:
+
+- **Sample Size Progress**: Track how many users have been exposed
+- **Data Quality**: Ensure exposure events are being tracked correctly
+- **Guardrail Metrics**: Watch for any negative impacts on important metrics
+- **External Factors**: Note any external events that might affect results
+
+## Interpret Your Results
The Experiments report identifies significant differences between the Control and Variant groups. Every metric has two key attributes:
@@ -60,7 +101,19 @@ The main reason you look at statistical significance (p-value) is to get confide

-Max Significance Level (p-value) = [1-CI]/2 where CI = Confidence Interval
+#### Statistical Significance Calculation
+
+Mixpanel calculates statistical significance using different methods based on your experiment model:
+
+**For Sequential Testing:**
+- Uses continuous monitoring with adjusted significance thresholds
+- Allows for early stopping when significance is reached
+- Accounts for multiple testing through sequential boundaries
+
+**For Frequentist Testing:**
+- Uses traditional hypothesis testing with fixed sample sizes
+- Calculates p-values using standard statistical tests (t-tests for continuous metrics, chi-square for categorical)
+- Formula: Max Significance Level (p-value) = [1-CI]/2 where CI = Confidence Interval
In the above image for example, max p=0.025 [(1-0.95)/2]
@@ -100,7 +153,7 @@ Once the ‘Test Duration’ setup during configuration is complete, we show a b
1. Sample size to be exposed
2. Number of days you’d like to run the experiment
-NOTE: If you are using a ‘sequential’ testing experiment model type, you can always peek at the results sooner. Learn more about what sequential testing is [here](/docs/reports/experiments#experiment-model-types)
+NOTE: If you are using a 'sequential' testing experiment model type, you can always peek at the results sooner. Learn more about what sequential testing is [here](#experiment-model-types)
### Diagnosing experiments further in regular Mixpanel reports
Click 'Analyze' on a metric to dive deeper into the results. This will open a normal Mixpanel insights report for the time range being analyzed with the experiment breakdown applied. This allows you to view users, view replays, or apply additional breakdowns to further analyze the results.
@@ -161,15 +214,13 @@ You can specify the event and property that should be used as the exposure event
For example, you begin an experiment on 1st Aug, and 1M users are ‘assigned’ to the control and variant. You do not want to send an ‘exposure’ event for all these users right away, as they have only been assigned to the experiment. It’s possible that some user gets exposed on 4th Aug and some on 8th Aug. You would want to track $experiment_started at the exposure for accurate analysis.
-## Experiment Pricing
+## Billing FAQ
The Experiment Report is a separately priced product offered to organizations on the Enterprise Plan. Please [contact us](https://mixpanel.com/contact-us/sales/) for more details.
### Pricing Unit
-Experimentation is priced based on MEUs - Monthly Experiment Users. Only users exposed to an experiment in a month are counted towards this tally.
-
-### FAQ
+Experimentation is priced based on MEUs - Monthly Experiment Users. Only users exposed to an experiment in a month are counted towards this tally.
#### How are MEUs different than MTUs (Monthly Tracked Users)?
MTUs count any user who has tracked an event to the project in the calendar month. MEU is a subset of MTU; it’s only users who have tracked an exposure experiment event (ie, `$experiment_started`) in the calendar month.
@@ -212,12 +263,39 @@ You can see your experiment MEU usage by going to Organization settings > Plan D
- **Guardrail Metrics:** These are other important metrics that you want to ensure haven’t been negatively affected while focusing on the primary metrics. Examples: CSAT, churn rate.
- **Secondary Metrics:** These provide a deeper understanding of how users are interacting with your changes, i.e, help to understand the "why" behind changes in the primary metric. Examples: time spent, number of pages visited, or specific user actions.
-### Post Experiment Analysis Decision
-Once the experiment is ready to review, you can choose to 'End Analysis'. Once complete, you can log a decision, visible to all users, based on the experiment outcome:
+## Make Your Decision
+
+Once the experiment is ready to review, you can choose to 'End Analysis'. Use these guidelines to make informed decisions:
+
+### When to Ship a Variant
+- **Statistical significance achieved** AND **practical significance met** (lift meets your minimum threshold)
+- **Guardrail metrics remain stable** (no significant negative impacts)
+- **Sample size is adequate** for your confidence requirements
+- **Results align with your hypothesis** and business objectives
+
+### When to Ship None
+- **No statistical significance** achieved after adequate test duration
+- **Statistically significant but practically insignificant** (lift too small to matter)
+- **Negative impact on guardrail metrics** outweighs primary metric gains
+- **Results contradict** your hypothesis significantly
+
+### When to Rerun or Iterate
+- **Inconclusive results** due to insufficient sample size
+- **Mixed signals** across different user segments
+- **External factors** contaminated the test period
+- **Technical issues** affected data collection
+
+### What to Watch Post-Rollout
+- **Monitor guardrail metrics** for 2-4 weeks after full rollout
+- **Track long-term effects** beyond your experiment window
+- **Watch for novelty effects** that may wear off
+- **Document learnings** for future experiments
+
+### Decision Options in Mixpanel
-- Ship Variant (any of the variants): You had a statistically significant result. You have made a decision to ship a variant to all users. NOTE: Shipping variant here is just a log; it does not actually trigger rolling out the feature flag unless you are using Mixpanel feature flags **(in beta today)**.
-- Ship None: You may not have had any statistically significant results, or even if you have statistically significant results, the lift is not sufficient to warrant a change in user experience. You decide not to ship the change.
-- Defer Decision: You may have a direction you want to go, but need to sync with other stakeholders before confirming the decision. This is an example where you might defer decision, and come back at a later date and log the final decision.
+- **Ship Variant (any of the variants)**: You had a statistically significant result. You have made a decision to ship a variant to all users. NOTE: Shipping variant here is just a log; it does not actually trigger rolling out the feature flag unless you are using Mixpanel feature flags **(in beta today)**.
+- **Ship None**: You may not have had any statistically significant results, or even if you have statistically significant results, the lift is not sufficient to warrant a change in user experience. You decide not to ship the change.
+- **Defer Decision**: You may have a direction you want to go, but need to sync with other stakeholders before confirming the decision. This is an example where you might defer the decision and come back at a later date to log the final decision.
### Experiment Management
You can manage all your experiments via the Experiments Home tab. You can customize which columns you’d like to see.
diff --git a/pages/docs/experiments/_meta.ts b/pages/docs/experiments/_meta.ts
new file mode 100644
index 0000000000..7639fc6e4a
--- /dev/null
+++ b/pages/docs/experiments/_meta.ts
@@ -0,0 +1,3 @@
+export default {
+ "best-practices": "Best Practices"
+}
diff --git a/pages/docs/experiments/best-practices.mdx b/pages/docs/experiments/best-practices.mdx
new file mode 100644
index 0000000000..7b3ff1f0e3
--- /dev/null
+++ b/pages/docs/experiments/best-practices.mdx
@@ -0,0 +1,128 @@
+import { Callout } from 'nextra/components'
+
+# Experimentation Best Practices
+
+## Hypothesis Development
+
+### Craft Strong Hypotheses
+- **Structure**: "If [change], then [outcome], because [reasoning]"
+- **Specific**: Define exactly what you're changing and what you expect to happen
+- **Measurable**: Tie to concrete metrics you can track
+- **Time-bound**: Set clear expectations for when effects should appear
+
+### Example Good Hypothesis
+"If we add social proof badges to product pages, then conversion rate will increase by 5%, because users trust products that others have purchased."
+
+## Sample Size Planning
+
+### Determine Adequate Sample Size
+- **Use power analysis**: Calculate required sample size before starting
+- **Consider your baseline**: Lower baseline rates need larger samples
+- **Factor in expected lift**: Smaller expected changes need more users
+- **Account for segments**: Plan for subgroup analysis needs
+
+### General Guidelines
+- **Minimum Detectable Effect**: Aim for changes of at least 2-5%
+- **Statistical Power**: Target 80% power (ability to detect true effects)
+- **Significance Level**: Typically use 95% confidence (α = 0.05)
+
+## Metric Selection
+
+### Primary Metrics
+- **Choose 1-2 primary metrics maximum** to avoid multiple testing issues
+- **Select metrics that directly measure your hypothesis**
+- **Ensure metrics are sensitive to your changes** (will move within test timeframe)
+
+### Guardrail Metrics
+- **Monitor key business metrics** (revenue, retention, satisfaction)
+- **Track user experience indicators** (page load time, error rates)
+- **Watch for unintended consequences** in related product areas
+
+### Secondary Metrics
+- **Help explain the "why" behind primary metric changes**
+- **Provide additional context for decision making**
+- **Explore user behavior patterns**
+
+## Experiment Design
+
+### Randomization Best Practices
+- **Use proper randomization units** (typically users, not sessions)
+- **Ensure random assignment is consistent** across user sessions
+- **Account for network effects** when users can influence each other
+- **Consider stratification** for important user segments
+
+### Control Group Management
+- **Always include a proper control group** (status quo)
+- **Keep control groups large enough** for reliable comparisons
+- **Avoid making changes to control** during the experiment
+
+## Avoiding Common Pitfalls
+
+### Statistical Issues
+- **Don't peek at results repeatedly** without adjusting significance levels
+- **Avoid stopping experiments early** unless using sequential testing
+- **Be aware of multiple testing problems** when analyzing many metrics
+- **Don't cherry-pick time periods** for analysis
+
+### Implementation Issues
+- **Validate exposure event tracking** before launching
+- **Test your experiment setup** with a small percentage first
+- **Monitor for technical issues** that could bias results
+- **Ensure consistent user experience** across variant groups
+
+### Business Context
+- **Consider external factors** (holidays, marketing campaigns, seasonality)
+- **Account for learning effects** (users adapting to changes over time)
+- **Plan for network effects** in social or marketplace products
+- **Think about long-term vs. short-term impacts**
+
+## Running Experiments at Scale
+
+### Experiment Pipeline
+- **Maintain a roadmap** of planned experiments
+- **Prioritize based on potential impact** and ease of implementation
+- **Allow adequate time** between related experiments
+- **Document learnings** for organizational knowledge
+
+### Resource Management
+- **Plan engineering resources** for implementation and monitoring
+- **Coordinate with marketing teams** to avoid conflicting campaigns
+- **Consider user fatigue** from too many simultaneous experiments
+- **Balance learning goals** with product development velocity
+
+## Statistical Considerations
+
+### Sequential vs. Frequentist Testing
+- **Sequential Testing**: Good for detecting large effects quickly, allows early stopping
+- **Frequentist Testing**: Better for small effect detection, requires full test duration
+- **Choose based on your goals**: Quick decisions vs. precise measurements
+
+### Handling Multiple Variants
+- **Limit the number of variants** to maintain statistical power
+- **Consider pairwise comparisons** vs. overall ANOVA
+- **Adjust significance levels** when making multiple comparisons
+- **Plan your analysis approach** before starting the test
+
+## Advanced Topics
+
+### Segmentation Analysis
+- **Plan key segments in advance** (new vs. returning users, etc.)
+- **Use interaction effects** to understand segment differences
+- **Be cautious about post-hoc segmentation** (can lead to false discoveries)
+- **Consider segment size requirements** for reliable results
+
+### Long-term Effects
+- **Plan for post-experiment monitoring** to catch delayed effects
+- **Consider novelty effects** that may wear off over time
+- **Think about user learning curves** for complex features
+- **Monitor competitive responses** that might influence results
+
+
+Remember that experimentation is both an art and a science. While these guidelines provide a strong foundation, always consider your specific product context and user base when designing experiments.
+
+
+## Additional Resources
+
+- [Sample Size Calculators](https://mixpanel.com/tools/sample-size-calculator)
+- [A/B Testing Significance Calculator](https://mixpanel.com/tools/ab-test-calculator)
+- [Mixpanel's Guide to Product Analytics](https://mixpanel.com/content/guide-to-product-analytics/)
diff --git a/pages/docs/reports/_meta.ts b/pages/docs/reports/_meta.ts
index a48941a2fa..764ba31495 100644
--- a/pages/docs/reports/_meta.ts
+++ b/pages/docs/reports/_meta.ts
@@ -3,6 +3,5 @@ export default {
"funnels": "Funnels",
"retention": "Retention",
"flows": "Flows",
- "experiments": "Experiments",
"apps": "Apps"
}
diff --git a/pages/docs/reports/apps/experiments.mdx b/pages/docs/reports/apps/experiments.mdx
index d961920e53..51d6b5fd46 100644
--- a/pages/docs/reports/apps/experiments.mdx
+++ b/pages/docs/reports/apps/experiments.mdx
@@ -3,7 +3,7 @@ import { Callout } from 'nextra/components'
# Experiments (Deprecating Soon)
- This app will be deprecated on Nov 1, 2025. To analyze Experiments in Mixpanel, please use our new [Experiments Report](/docs/reports/experiments)
+ This app will be deprecated on Nov 1, 2025. To analyze Experiments in Mixpanel, please use our new [Experiments Report](/docs/experiments)
## Overview
diff --git a/redirects/local.txt b/redirects/local.txt
index 8597aa4d63..62aedcc290 100644
--- a/redirects/local.txt
+++ b/redirects/local.txt
@@ -22,7 +22,8 @@
/docs/analysis/advanced/custom-events /docs/features/custom-events
/docs/analysis/advanced/custom-properties /docs/features/custom-properties
/docs/analysis/advanced/embeds /docs/features/embeds
-/docs/analysis/advanced/experiments /docs/reports/apps/experiments
+/docs/analysis/advanced/experiments /docs/experiments
+/docs/reports/experiments /docs/experiments
/docs/analysis/advanced/group-analytics /docs/data-structure/advanced/group-analytics
/docs/analysis/advanced/impact /docs/reports/apps/impact
/docs/analysis/advanced/other-advanced-features /docs/features/advanced
From aa0482e4e991cbdc2b7b712b5514786fffaeee60 Mon Sep 17 00:00:00 2001
From: ishamehramixpanel <117322225+ishamehramixpanel@users.noreply.github.com>
Date: Wed, 24 Sep 2025 13:38:51 -0400
Subject: [PATCH 02/12] update experiments.mdx
---
pages/docs/experiments.mdx | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/pages/docs/experiments.mdx b/pages/docs/experiments.mdx
index 8451af5d8f..3f414feede 100644
--- a/pages/docs/experiments.mdx
+++ b/pages/docs/experiments.mdx
@@ -214,7 +214,7 @@ You can specify the event and property that should be used as the exposure event
For example, you begin an experiment on 1st Aug, and 1M users are ‘assigned’ to the control and variant. You do not want to send an ‘exposure’ event for all these users right away, as they have only been assigned to the experiment. It’s possible that some user gets exposed on 4th Aug and some on 8th Aug. You would want to track $experiment_started at the exposure for accurate analysis.
-## Billing FAQ
+## Experimentation Pricing FAQ
The Experiment Report is a separately priced product offered to organizations on the Enterprise Plan. Please [contact us](https://mixpanel.com/contact-us/sales/) for more details.
From 084b9e8fbcc0d2d30488bcca64e3475f338c58f4 Mon Sep 17 00:00:00 2001
From: ishamehramixpanel <117322225+ishamehramixpanel@users.noreply.github.com>
Date: Wed, 24 Sep 2025 13:43:29 -0400
Subject: [PATCH 03/12] Update best-practices.mdx
---
pages/docs/experiments/best-practices.mdx | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/pages/docs/experiments/best-practices.mdx b/pages/docs/experiments/best-practices.mdx
index 7b3ff1f0e3..93924652d3 100644
--- a/pages/docs/experiments/best-practices.mdx
+++ b/pages/docs/experiments/best-practices.mdx
@@ -99,7 +99,6 @@ import { Callout } from 'nextra/components'
### Handling Multiple Variants
- **Limit the number of variants** to maintain statistical power
-- **Consider pairwise comparisons** vs. overall ANOVA
- **Adjust significance levels** when making multiple comparisons
- **Plan your analysis approach** before starting the test
@@ -123,6 +122,5 @@ Remember that experimentation is both an art and a science. While these guidelin
## Additional Resources
-- [Sample Size Calculators](https://mixpanel.com/tools/sample-size-calculator)
-- [A/B Testing Significance Calculator](https://mixpanel.com/tools/ab-test-calculator)
+- [What is Product Experimentation?](https://mixpanel.com/blog/product-experimentation/)
- [Mixpanel's Guide to Product Analytics](https://mixpanel.com/content/guide-to-product-analytics/)
From 773518aab03e203374888f0f2c8453aa960eb476 Mon Sep 17 00:00:00 2001
From: ishamehramixpanel <117322225+ishamehramixpanel@users.noreply.github.com>
Date: Wed, 24 Sep 2025 14:09:48 -0400
Subject: [PATCH 04/12] combine workflow and overview
---
pages/docs/experiments.mdx | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/pages/docs/experiments.mdx b/pages/docs/experiments.mdx
index 3f414feede..8abdfe511c 100644
--- a/pages/docs/experiments.mdx
+++ b/pages/docs/experiments.mdx
@@ -19,7 +19,13 @@ Before getting started with experiments:
- **Baseline Metrics**: Have your key success metrics already defined in Mixpanel
- **Permissions**: Ensure you have the appropriate project permissions
-## Experiment Workflow
+## Overview & Workflow
+
+
+
+The Experiment report analyzes how one variant impacts your metrics versus other variant(s), helping you decide which variant should be rolled out more broadly. To access Experiments, click on the **Experiments** tab in the navigation panel, or **Create New > Experiment**.
+
+### Experiment Process
**Plan** → **Setup & Launch** → **Monitor** → **Interpret Results** → **Make Decisions**
@@ -29,12 +35,6 @@ Before getting started with experiments:
4. **Interpret Results**: Analyze statistical significance and lift
5. **Make Decisions**: Choose whether to ship, iterate, or abandon changes
-## Overview
-
-
-
-The Experiment report analyzes how one variant impacts your metrics versus other variant(s), helping you decide which variant should be rolled out more broadly. To access Experiments, click on the **Experiments** tab in the navigation panel, or **Create New > Experiment**.
-
## Plan Your Experiment
Before creating an experiment report, ensure you have:
From 808f0c0d7704235f6f73b392ffb2568fcf97ec52 Mon Sep 17 00:00:00 2001
From: ishamehramixpanel <117322225+ishamehramixpanel@users.noreply.github.com>
Date: Wed, 24 Sep 2025 14:10:59 -0400
Subject: [PATCH 05/12] update best practices
---
pages/docs/experiments/best-practices.mdx | 166 +++++++---------------
1 file changed, 50 insertions(+), 116 deletions(-)
diff --git a/pages/docs/experiments/best-practices.mdx b/pages/docs/experiments/best-practices.mdx
index 93924652d3..857e841720 100644
--- a/pages/docs/experiments/best-practices.mdx
+++ b/pages/docs/experiments/best-practices.mdx
@@ -2,125 +2,59 @@ import { Callout } from 'nextra/components'
# Experimentation Best Practices
-## Hypothesis Development
-
-### Craft Strong Hypotheses
-- **Structure**: "If [change], then [outcome], because [reasoning]"
-- **Specific**: Define exactly what you're changing and what you expect to happen
-- **Measurable**: Tie to concrete metrics you can track
-- **Time-bound**: Set clear expectations for when effects should appear
-
-### Example Good Hypothesis
-"If we add social proof badges to product pages, then conversion rate will increase by 5%, because users trust products that others have purchased."
-
-## Sample Size Planning
-
-### Determine Adequate Sample Size
-- **Use power analysis**: Calculate required sample size before starting
-- **Consider your baseline**: Lower baseline rates need larger samples
-- **Factor in expected lift**: Smaller expected changes need more users
-- **Account for segments**: Plan for subgroup analysis needs
-
-### General Guidelines
-- **Minimum Detectable Effect**: Aim for changes of at least 2-5%
-- **Statistical Power**: Target 80% power (ability to detect true effects)
-- **Significance Level**: Typically use 95% confidence (α = 0.05)
-
-## Metric Selection
-
-### Primary Metrics
-- **Choose 1-2 primary metrics maximum** to avoid multiple testing issues
-- **Select metrics that directly measure your hypothesis**
-- **Ensure metrics are sensitive to your changes** (will move within test timeframe)
-
-### Guardrail Metrics
-- **Monitor key business metrics** (revenue, retention, satisfaction)
-- **Track user experience indicators** (page load time, error rates)
-- **Watch for unintended consequences** in related product areas
-
-### Secondary Metrics
-- **Help explain the "why" behind primary metric changes**
-- **Provide additional context for decision making**
-- **Explore user behavior patterns**
-
-## Experiment Design
-
-### Randomization Best Practices
-- **Use proper randomization units** (typically users, not sessions)
-- **Ensure random assignment is consistent** across user sessions
-- **Account for network effects** when users can influence each other
-- **Consider stratification** for important user segments
-
-### Control Group Management
-- **Always include a proper control group** (status quo)
-- **Keep control groups large enough** for reliable comparisons
-- **Avoid making changes to control** during the experiment
-
-## Avoiding Common Pitfalls
-
-### Statistical Issues
-- **Don't peek at results repeatedly** without adjusting significance levels
-- **Avoid stopping experiments early** unless using sequential testing
-- **Be aware of multiple testing problems** when analyzing many metrics
-- **Don't cherry-pick time periods** for analysis
-
-### Implementation Issues
-- **Validate exposure event tracking** before launching
-- **Test your experiment setup** with a small percentage first
-- **Monitor for technical issues** that could bias results
-- **Ensure consistent user experience** across variant groups
-
-### Business Context
-- **Consider external factors** (holidays, marketing campaigns, seasonality)
-- **Account for learning effects** (users adapting to changes over time)
-- **Plan for network effects** in social or marketplace products
-- **Think about long-term vs. short-term impacts**
-
-## Running Experiments at Scale
-
-### Experiment Pipeline
-- **Maintain a roadmap** of planned experiments
-- **Prioritize based on potential impact** and ease of implementation
-- **Allow adequate time** between related experiments
-- **Document learnings** for organizational knowledge
-
-### Resource Management
-- **Plan engineering resources** for implementation and monitoring
-- **Coordinate with marketing teams** to avoid conflicting campaigns
-- **Consider user fatigue** from too many simultaneous experiments
-- **Balance learning goals** with product development velocity
-
-## Statistical Considerations
-
-### Sequential vs. Frequentist Testing
-- **Sequential Testing**: Good for detecting large effects quickly, allows early stopping
-- **Frequentist Testing**: Better for small effect detection, requires full test duration
-- **Choose based on your goals**: Quick decisions vs. precise measurements
-
-### Handling Multiple Variants
-- **Limit the number of variants** to maintain statistical power
-- **Adjust significance levels** when making multiple comparisons
-- **Plan your analysis approach** before starting the test
-
-## Advanced Topics
-
-### Segmentation Analysis
-- **Plan key segments in advance** (new vs. returning users, etc.)
-- **Use interaction effects** to understand segment differences
-- **Be cautious about post-hoc segmentation** (can lead to false discoveries)
-- **Consider segment size requirements** for reliable results
-
-### Long-term Effects
-- **Plan for post-experiment monitoring** to catch delayed effects
-- **Consider novelty effects** that may wear off over time
-- **Think about user learning curves** for complex features
-- **Monitor competitive responses** that might influence results
+Running experiments isn't just about flipping switches and hoping for the best. The difference between experiments that generate real insights and those that waste time often comes down to a few key practices. Here's what we've learned from working with thousands of teams on their experimentation programs.
+
+## Start with a Strong Hypothesis
+
+Your experiment is only as good as the hypothesis behind it. A weak hypothesis leads to inconclusive results, even when everything else is done perfectly.
+
+The best hypotheses follow a simple structure: "If [change], then [outcome], because [reasoning]." For example: "If we add social proof badges to product pages, then conversion rate will increase by 5%, because users trust products that others have purchased."
+
+Why does this work? It forces you to think through not just *what* you're testing, but *why* you expect it to work. That reasoning becomes crucial when you're interpreting results later.
+
+## Get Your Sample Size Right
+
+Nothing kills an experiment faster than realizing afterward that you didn't have enough users to detect a meaningful difference. This is especially painful when you've spent weeks collecting data.
+
+Before you start, calculate your required sample size using a power analysis. Consider your baseline conversion rate—lower rates need larger samples to detect the same percentage change. If you're looking for a 5% improvement on a 2% baseline, you'll need far more users than improving from 20% to 25%.
+
+As a general rule, aim to detect changes of at least 2-5%. Smaller improvements are often not worth the implementation effort, and detecting them requires massive sample sizes.
+
+## Choose Metrics Carefully
+
+Here's where many experiments go wrong: trying to measure everything instead of focusing on what matters.
+
+Pick 1-2 primary metrics. These should directly measure what your hypothesis predicts will change. If you're testing a new checkout flow, your primary metric should be conversion rate, not page views.
+
+But don't stop there. Set up guardrail metrics to catch unintended consequences. If your new checkout flow increases conversions but doubles customer service complaints, you need to know that. Monitor key business metrics like revenue, retention, and satisfaction alongside your primary metrics.
+
+Secondary metrics help you understand the "why" behind your results. If conversion goes up, are users spending more time on the page? Are they more likely to return? These insights guide your next experiments.
+
+## Avoid the Peeking Problem
+
+It's tempting to check results daily, especially when you're excited about a test. But repeatedly checking for significance without adjusting your analysis can lead to false positives.
+
+If you must peek (and we all do), either use sequential testing methods that account for multiple looks, or commit to your original test duration. The worst outcome is stopping an experiment early because you saw a promising spike that later disappears.
+
+## Think Beyond the Test Window
+
+The best experiments teach you something about your users, not just about a specific feature. When results come in, dig deeper than just "variant A beat variant B."
+
+Look at different user segments. Did new users respond differently than returning ones? Were there differences by device or geography? These patterns often reveal insights that apply to future experiments.
+
+Consider long-term effects too. Some changes show immediate benefits that fade over time as novelty wears off. Others take weeks to show their true impact as users adapt to new workflows.
+
+## Scale Your Learning
+
+As your experimentation program matures, the real value comes from the cumulative learning across all your tests. Document what you learn from each experiment, even the "failed" ones. Failed experiments that contradict your intuition are often the most valuable—they reveal gaps in your understanding of user behavior.
+
+Build a roadmap of related experiments. If testing a new headline increases signups, what happens when you test the entire landing page? Each experiment should inform the next, creating a virtuous cycle of learning.
-Remember that experimentation is both an art and a science. While these guidelines provide a strong foundation, always consider your specific product context and user base when designing experiments.
+Remember that experimentation is both an art and a science. These practices provide a foundation, but every product and user base is different. Use your judgment and adapt these guidelines to your specific context.
## Additional Resources
-- [What is Product Experimentation?](https://mixpanel.com/blog/product-experimentation/)
-- [Mixpanel's Guide to Product Analytics](https://mixpanel.com/content/guide-to-product-analytics/)
+- [What is Product Experimentation?](https://mixpanel.com/blog/product-experimentation/)
+- [Mixpanel's Guide to Product Analytics](https://mixpanel.com/content/guide-to-product-analytics/intro/)
From 2ecf5624b9a5367a210e8ae114e04dae35984a14 Mon Sep 17 00:00:00 2001
From: ishamehramixpanel <117322225+ishamehramixpanel@users.noreply.github.com>
Date: Wed, 24 Sep 2025 16:21:54 -0400
Subject: [PATCH 06/12] Update experiments.mdx
- statistical significance
- implementation
---
pages/docs/experiments.mdx | 130 ++++++++++++++++++++++++++-----------
1 file changed, 93 insertions(+), 37 deletions(-)
diff --git a/pages/docs/experiments.mdx b/pages/docs/experiments.mdx
index 8abdfe511c..d2e0f1b773 100644
--- a/pages/docs/experiments.mdx
+++ b/pages/docs/experiments.mdx
@@ -3,7 +3,7 @@ import { Callout } from 'nextra/components'
# Experiments: Measure the impact of a/b testing
- The Experiment Report is a separately priced product. It is currently only offered to those on the Enterprise Plan. See our [pricing page](https://mixpanel.com/pricing/) for more details.
+ The Experiment Report is a separately priced product add-on. It is currently only offered to those on the Enterprise Plan. See our [pricing page](https://mixpanel.com/pricing/) for more details.
## Why Experiment?
@@ -14,10 +14,8 @@ Experimentation helps you make data-driven product decisions by measuring the re
Before getting started with experiments:
-- **Enterprise Plan**: Experimentation is only available on Enterprise plans
-- **Exposure Event Tracking**: You must implement `$experiment_started` event tracking
-- **Baseline Metrics**: Have your key success metrics already defined in Mixpanel
-- **Permissions**: Ensure you have the appropriate project permissions
+- **Exposure Event Tracking**:[Implement experimentation events](#implementation-for-experimentation)
+- **Baseline Metrics**: Have your key success metrics already measured in Mixpanel
## Overview & Workflow
@@ -103,16 +101,41 @@ The main reason you look at statistical significance (p-value) is to get confide
#### Statistical Significance Calculation
-Mixpanel calculates statistical significance using different methods based on your experiment model:
+Mixpanel uses Frequentist statistical methods to compute p-values and confidence intervals. The specific approach depends on your metric type and experiment model.
+
+**Metric Types and Their Distributions:**
+
+Mixpanel categorizes metrics into three types, each using different statistical distributions:
+
+1. **Count Metrics** (Total Events, Total Sessions): Use **Poisson distribution**
+ - Examples: Total purchases, total page views, session count
+ - Variance equals the mean (characteristic of Poisson distributions)
+
+2. **Rate Metrics** (Conversion rates, Retention rates): Use **Bernoulli distribution**
+ - Examples: Signup conversion rate, checkout completion rate, 7-day retention
+ - Models binary outcomes (did/didn't convert) across your user base
+
+3. **Value Metrics** (Averages, Sums of properties): Use **normal distribution approximation**
+ - Examples: Average order value, total revenue, average session duration
+ - Calculates variance using sample statistics and Central Limit Theorem
+
+**Statistical Calculation Process:**
+
+For all metric types, we follow the same general process:
+
+1. **Calculate group rates** for control and treatment
+2. **Estimate variance** using the appropriate distribution
+3. **Compute standard error** from variance and sample size
+4. **Calculate Z-score** measuring how many standard errors apart the groups are
+5. **Derive p-value** from Z-score using normal distribution (via Central Limit Theorem)
**For Sequential Testing:**
-- Uses continuous monitoring with adjusted significance thresholds
+- Uses continuous monitoring with adjusted significance thresholds (mSPRT method)
- Allows for early stopping when significance is reached
-- Accounts for multiple testing through sequential boundaries
+- More conservative calculations to account for multiple testing
**For Frequentist Testing:**
- Uses traditional hypothesis testing with fixed sample sizes
-- Calculates p-values using standard statistical tests (t-tests for continuous metrics, chi-square for categorical)
- Formula: Max Significance Level (p-value) = [1-CI]/2 where CI = Confidence Interval
In the above image for example, max p=0.025 [(1-0.95)/2]
@@ -122,28 +145,60 @@ So, if an experiment's results show
- p ≤ 0.025: results are statistically significant for this metric, i.e, you can be 95% confident in the lift seen if the change is rolled out to all users.
- p > 0.025: results are not statistically significant for this metric, i.e, you cannot be very confident in the results if the change is rolled out broadly.
+#### Example: E-commerce Checkout Experiment
+
+To illustrate how these calculations work in practice, let's walk through a concrete example.
+
+**Scenario:** Testing a new checkout UI on an e-commerce site with 20 users (10 control, 10 treatment).
+
+**Results:**
+- **Control group:** 5 users converted (50% conversion rate), average cart size $60
+- **Treatment group:** 6 users converted (60% conversion rate), average cart size $67
+
+**For Conversion Rate (Rate Metric - Bernoulli Distribution):**
+
+1. **Group rates:** Control = 0.5, Treatment = 0.6
+2. **Variance calculation:** Control = 0.5 × (1-0.5) = 0.25, Treatment = 0.6 × (1-0.6) = 0.24
+3. **Standard error:** Combined SE = √((0.25/10) + (0.24/10)) = 0.221
+4. **Z-score:** (0.6 - 0.5) / 0.221 = 0.45
+5. **P-value:** ~0.65 (not statistically significant)
+
+**For Average Cart Size (Value Metric - Normal Distribution):**
+
+1. **Group means:** Control = $60, Treatment = $67
+2. **Variance calculation:** Uses sample variance of cart values in each group
+3. **Standard error:** Calculated from combined variance and sample sizes
+4. **Z-score and p-value:** Computed using the same Z-test framework
+
+This example shows why larger sample sizes are crucial—with only 10 users per group, even a 10 percentage point difference in conversion rate isn't statistically significant.
+
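+The snippet below is a minimal sketch of the same five-step calculation applied to the conversion-rate numbers above. It uses a standard approximation of the normal CDF and is intended for intuition only, not as Mixpanel's internal implementation.
+
+```typescript
+// Two-sample z-test for the checkout example (rate metric, Bernoulli variance).
+// Illustrative sketch only; not Mixpanel's production code.
+
+// Standard normal CDF via the Abramowitz–Stegun erf approximation (~1e-7 accuracy).
+function normalCdf(z: number): number {
+  const x = Math.abs(z) / Math.SQRT2;
+  const t = 1 / (1 + 0.3275911 * x);
+  const poly =
+    t *
+    (0.254829592 +
+      t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
+  const erf = 1 - poly * Math.exp(-x * x);
+  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
+}
+
+const nControl = 10;
+const nTreatment = 10;
+
+// Step 1: group rates.
+const controlRate = 5 / nControl;     // 0.50
+const treatmentRate = 6 / nTreatment; // 0.60
+
+// Step 2: Bernoulli variance p * (1 - p) for each group.
+const varControl = controlRate * (1 - controlRate);       // 0.25
+const varTreatment = treatmentRate * (1 - treatmentRate); // 0.24
+
+// Step 3: standard error of the difference in rates.
+const se = Math.sqrt(varControl / nControl + varTreatment / nTreatment); // ≈ 0.221
+
+// Step 4: z-score for the observed difference.
+const z = (treatmentRate - controlRate) / se; // ≈ 0.45
+
+// Step 5: two-sided p-value from the normal distribution.
+const pValue = 2 * (1 - normalCdf(Math.abs(z))); // ≈ 0.65, not statistically significant
+
+console.log({ se, z, pValue });
+```
+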
### How do you read lift?
Lift is the percentage difference between the control and variant(s) metrics.
$Lift= { (variant \,group\,rate - control \,group\,rate) \over (control \,group\,rate)}$
-Lift, mean, and variance are calculated differently based on the type of metric being analyzed. We categorize metrics into 3 types:
+Lift, mean, and variance are calculated differently based on the type of metric being analyzed:
-- **Numeric** - any metrics that involve numeric property math (sum, average, etc)
-- **Binomial** - any metric that has a true or false outcome (unique users, funnel conversions, retention)
-- **Rate** - any metric that can be conceptualized as a rate (funnel conversion rate, total events/experiment, etc)
+**Count Metrics (Total Events, Sessions):**
+- **Group Rate:** Total count ÷ Number of users exposed
+- **Variance:** Equal to the mean (Poisson distribution property)
+- **Example:** If treatment group has 150 total purchases from 100 exposed users, group rate = 1.5 purchases per user
-The ‘group rate’ is calculated differently depending on the type of metric.
+**Rate Metrics (Conversion, Retention):**
+- **Group Rate:** The actual rate (already normalized)
+- **Variance:** Calculated using Bernoulli distribution: p × (1-p)
+- **Example:** If 25 out of 100 users convert, group rate = 0.25 (25% conversion rate)
-- For numeric & binomial metrics:
+**Value Metrics (Averages, Sums):**
+- **Group Rate:** Sum of property values ÷ Number of users exposed
+- **Variance:** Calculated from the distribution of individual property values
+- **Example:** If treatment group spent $5,000 total from 100 users, group rate = $50 average per exposed user
- $Group\,Rate= { (\# \,Metric\,absolute\,value) \over (\# of\,users\,exposed)}$
-
- NOTE: Normalizing the rate based on the number of users exposed helps understand the possible impact on every single user exposed to the experiment
-
-- For rate metrics: the group rate is the same as the metric for the users in the group. Example: if calculating a funnel conversion rate, then the group rate is the overall conversion rate of the funnel for users in the group.
-
- NOTE: Conversion rates are normalized as is, hence no further normalization is done
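+For example, applying the lift formula to the checkout experiment above (control conversion rate 0.50, treatment 0.60):
+
+$Lift = {(0.6 - 0.5) \over 0.5} = 0.20 = 20\%$
+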
+**Why This Matters:**
+Normalizing by exposed users (not just converters) helps you understand the impact on your entire user base. A feature that increases average order value among buyers but reduces conversion rate might actually decrease overall revenue per user.
+
+**Custom Formula Metrics:**
+For complex metrics using formulas like `Revenue per User = Total Revenue ÷ Unique Users`, Mixpanel uses propagation of uncertainty to estimate variance. This combines the variances of the component metrics (Total Revenue and Unique Users) to calculate the overall metric's statistical significance. The system assumes metrics in formulas are uncorrelated for these calculations.
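+
+For intuition, a common first-order form of this approximation for a ratio of two uncorrelated metrics, $R$ (Total Revenue) and $U$ (Unique Users), is shown below; it is illustrative rather than necessarily the exact expression Mixpanel computes:
+
+$Var\left({R \over U}\right) \approx \left({\mu_R \over \mu_U}\right)^2 \left({\sigma_R^2 \over \mu_R^2} + {\sigma_U^2 \over \mu_U^2}\right)$
+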
### When do we say the Experiment is ready to review?
Once the ‘Test Duration’ setup during configuration is complete, we show a banner that says “Experiment is ready to review”.
@@ -169,22 +224,8 @@ The Experiment report behavior is powered by [borrowed properties](/docs/feature
For every user event, we identify if the event is performed after being exposed to an experiment. If it were, then we would borrow the variant details from the tracked `$experiment_started` to attribute the event to the proper variant.
-### FAQs
-1. If a user switches variants mid-experiment, how do we calculate the impact on metrics?
-
- We break a user and their associated behavior into fractional parts for analysis. We consider the initial behavior part of the first variant, then once the variant changes, we consider the rest of the behavior for analysis towards the new variant.
-
-2. If a user is part of multiple experiments, how do we calculate the impact of a single experiment?
-
- We consider the complete user’s behavior for every experiment that they are a part of.
-
- We believe this will still give accurate results for a particular experiment, as the users have been randomly allocated. So there should be enough similar users, ie, part of multiple experiments, across both control and variants for a particular experiment.
-
-3. For what time duration do we associate the user being exposed to an experiment to impact metrics?
-
- Post experiment exposure, we consider a user’s behavior as ‘exposed’ to an experiment for a max of 90 days.
-## Adding Experiments to an Implementation
+### Implementation for Experimentation
Mixpanel experiment analysis work based on exposure events. To use the experiment report, you must send your Exposure events in the following format:
@@ -214,6 +255,21 @@ You can specify the event and property that should be used as the exposure event
For example, you begin an experiment on 1st Aug, and 1M users are ‘assigned’ to the control and variant. You do not want to send an ‘exposure’ event for all these users right away, as they have only been assigned to the experiment. It’s possible that some user gets exposed on 4th Aug and some on 8th Aug. You would want to track $experiment_started at the exposure for accurate analysis.
+### FAQs
+1. If a user switches variants mid-experiment, how do we calculate the impact on metrics?
+
+ We break a user and their associated behavior into fractional parts for analysis. We consider the initial behavior part of the first variant, then once the variant changes, we consider the rest of the behavior for analysis towards the new variant.
+
+2. If a user is part of multiple experiments, how do we calculate the impact of a single experiment?
+
+ We consider the complete user’s behavior for every experiment that they are a part of.
+
+   We believe this will still give accurate results for a particular experiment, as the users have been randomly allocated. So there should be enough similar users, i.e., part of multiple experiments, across both control and variants for a particular experiment.
+
+3. For what time duration do we associate the user being exposed to an experiment to impact metrics?
+
+ Post experiment exposure, we consider a user’s behavior as ‘exposed’ to an experiment for a max of 90 days.
+
## Experimentation Pricing FAQ
The Experiment Report is a separately priced product offered to organizations on the Enterprise Plan. Please [contact us](https://mixpanel.com/contact-us/sales/) for more details.
From 9ac1a4ab4f490ada7be8e872c3c473fdce9887d9 Mon Sep 17 00:00:00 2001
From: ishamehramixpanel <117322225+ishamehramixpanel@users.noreply.github.com>
Date: Wed, 24 Sep 2025 16:39:51 -0400
Subject: [PATCH 07/12] Update experiments.mdx
---
pages/docs/experiments.mdx | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/pages/docs/experiments.mdx b/pages/docs/experiments.mdx
index d2e0f1b773..fbf19edebe 100644
--- a/pages/docs/experiments.mdx
+++ b/pages/docs/experiments.mdx
@@ -130,7 +130,7 @@ For all metric types, we follow the same general process:
5. **Derive p-value** from Z-score using normal distribution (via Central Limit Theorem)
**For Sequential Testing:**
-- Uses continuous monitoring with adjusted significance thresholds (mSPRT method)
+- Uses continuous monitoring with adjusted significance thresholds based on the [mSPRT method](https://arxiv.org/pdf/1905.10493)
- Allows for early stopping when significance is reached
- More conservative calculations to account for multiple testing
From d04fcc3a30986ff7d1e0c01c01e8ccd17ee7a950 Mon Sep 17 00:00:00 2001
From: ishamehramixpanel <117322225+ishamehramixpanel@users.noreply.github.com>
Date: Wed, 24 Sep 2025 16:45:42 -0400
Subject: [PATCH 08/12] Update cspell.json
---
cspell.json | 2 ++
1 file changed, 2 insertions(+)
diff --git a/cspell.json b/cspell.json
index ee5ad50384..c009f69096 100644
--- a/cspell.json
+++ b/cspell.json
@@ -167,6 +167,7 @@
"mpmetrics",
"mputils",
"msclkid",
+ "mSPRT",
"multischema",
"mxpnl",
"MYAPP",
@@ -228,6 +229,7 @@
"Signups",
"skus",
"splitlines",
+ "SPRT",
"Stackdriver",
"stddev",
"Steph",
From 2fe4ae0861233136a9cee42a4a88265b4edd143f Mon Sep 17 00:00:00 2001
From: ishamehramixpanel <117322225+ishamehramixpanel@users.noreply.github.com>
Date: Wed, 24 Sep 2025 16:48:19 -0400
Subject: [PATCH 09/12] remove best practices + directory
---
pages/docs/experiments/_meta.ts | 3 --
pages/docs/experiments/best-practices.mdx | 60 -----------------------
2 files changed, 63 deletions(-)
delete mode 100644 pages/docs/experiments/_meta.ts
delete mode 100644 pages/docs/experiments/best-practices.mdx
diff --git a/pages/docs/experiments/_meta.ts b/pages/docs/experiments/_meta.ts
deleted file mode 100644
index 7639fc6e4a..0000000000
--- a/pages/docs/experiments/_meta.ts
+++ /dev/null
@@ -1,3 +0,0 @@
-export default {
- "best-practices": "Best Practices"
-}
diff --git a/pages/docs/experiments/best-practices.mdx b/pages/docs/experiments/best-practices.mdx
deleted file mode 100644
index 857e841720..0000000000
--- a/pages/docs/experiments/best-practices.mdx
+++ /dev/null
@@ -1,60 +0,0 @@
-import { Callout } from 'nextra/components'
-
-# Experimentation Best Practices
-
-Running experiments isn't just about flipping switches and hoping for the best. The difference between experiments that generate real insights and those that waste time often comes down to a few key practices. Here's what we've learned from working with thousands of teams on their experimentation programs.
-
-## Start with a Strong Hypothesis
-
-Your experiment is only as good as the hypothesis behind it. A weak hypothesis leads to inconclusive results, even when everything else is done perfectly.
-
-The best hypotheses follow a simple structure: "If [change], then [outcome], because [reasoning]." For example: "If we add social proof badges to product pages, then conversion rate will increase by 5%, because users trust products that others have purchased."
-
-Why does this work? It forces you to think through not just *what* you're testing, but *why* you expect it to work. That reasoning becomes crucial when you're interpreting results later.
-
-## Get Your Sample Size Right
-
-Nothing kills an experiment faster than realizing afterward that you didn't have enough users to detect a meaningful difference. This is especially painful when you've spent weeks collecting data.
-
-Before you start, calculate your required sample size using a power analysis. Consider your baseline conversion rate—lower rates need larger samples to detect the same percentage change. If you're looking for a 5% improvement on a 2% baseline, you'll need far more users than improving from 20% to 25%.
-
-As a general rule, aim to detect changes of at least 2-5%. Smaller improvements are often not worth the implementation effort, and detecting them requires massive sample sizes.
-
-## Choose Metrics Carefully
-
-Here's where many experiments go wrong: trying to measure everything instead of focusing on what matters.
-
-Pick 1-2 primary metrics. These should directly measure what your hypothesis predicts will change. If you're testing a new checkout flow, your primary metric should be conversion rate, not page views.
-
-But don't stop there. Set up guardrail metrics to catch unintended consequences. If your new checkout flow increases conversions but doubles customer service complaints, you need to know that. Monitor key business metrics like revenue, retention, and satisfaction alongside your primary metrics.
-
-Secondary metrics help you understand the "why" behind your results. If conversion goes up, are users spending more time on the page? Are they more likely to return? These insights guide your next experiments.
-
-## Avoid the Peeking Problem
-
-It's tempting to check results daily, especially when you're excited about a test. But repeatedly checking for significance without adjusting your analysis can lead to false positives.
-
-If you must peek (and we all do), either use sequential testing methods that account for multiple looks, or commit to your original test duration. The worst outcome is stopping an experiment early because you saw a promising spike that later disappears.
-
-## Think Beyond the Test Window
-
-The best experiments teach you something about your users, not just about a specific feature. When results come in, dig deeper than just "variant A beat variant B."
-
-Look at different user segments. Did new users respond differently than returning ones? Were there differences by device or geography? These patterns often reveal insights that apply to future experiments.
-
-Consider long-term effects too. Some changes show immediate benefits that fade over time as novelty wears off. Others take weeks to show their true impact as users adapt to new workflows.
-
-## Scale Your Learning
-
-As your experimentation program matures, the real value comes from the cumulative learning across all your tests. Document what you learn from each experiment, even the "failed" ones. Failed experiments that contradict your intuition are often the most valuable—they reveal gaps in your understanding of user behavior.
-
-Build a roadmap of related experiments. If testing a new headline increases signups, what happens when you test the entire landing page? Each experiment should inform the next, creating a virtuous cycle of learning.
-
-
-Remember that experimentation is both an art and a science. These practices provide a foundation, but every product and user base is different. Use your judgment and adapt these guidelines to your specific context.
-
-
-## Additional Resources
-
-- [What is Product Experimentation?](https://mixpanel.com/blog/product-experimentation/)
-- [Mixpanel's Guide to Product Analytics](https://mixpanel.com/content/guide-to-product-analytics/intro/)
From 3d25dcda8a6ed5133cc38d4e916b640246ddb395 Mon Sep 17 00:00:00 2001
From: ishamehramixpanel <117322225+ishamehramixpanel@users.noreply.github.com>
Date: Wed, 24 Sep 2025 16:57:12 -0400
Subject: [PATCH 10/12] Update experiments.mdx
---
pages/docs/experiments.mdx | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/pages/docs/experiments.mdx b/pages/docs/experiments.mdx
index fbf19edebe..f431045108 100644
--- a/pages/docs/experiments.mdx
+++ b/pages/docs/experiments.mdx
@@ -14,7 +14,7 @@ Experimentation helps you make data-driven product decisions by measuring the re
Before getting started with experiments:
-- **Exposure Event Tracking**:[Implement experimentation events](#implementation-for-experimentation)
+- **Exposure Event Tracking**: [Implement](#implementation-for-experimentation) your experimentation events
- **Baseline Metrics**: Have your key success metrics already measured in Mixpanel
## Overview & Workflow
From 1e824a3298e0cee99001b009fab48b86ec4684d4 Mon Sep 17 00:00:00 2001
From: ishamehramixpanel <117322225+ishamehramixpanel@users.noreply.github.com>
Date: Wed, 24 Sep 2025 17:41:20 -0400
Subject: [PATCH 11/12] adjust central limit theorem explanation
---
pages/docs/experiments.mdx | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/pages/docs/experiments.mdx b/pages/docs/experiments.mdx
index f431045108..59003ed389 100644
--- a/pages/docs/experiments.mdx
+++ b/pages/docs/experiments.mdx
@@ -117,7 +117,7 @@ Mixpanel categorizes metrics into three types, each using different statistical
3. **Value Metrics** (Averages, Sums of properties): Use **normal distribution approximation**
- Examples: Average order value, total revenue, average session duration
- - Calculates variance using sample statistics and Central Limit Theorem
+ - Calculates variance using sample statistics
**Statistical Calculation Process:**
@@ -127,7 +127,10 @@ For all metric types, we follow the same general process:
2. **Estimate variance** using the appropriate distribution
3. **Compute standard error** from variance and sample size
4. **Calculate Z-score** measuring how many standard errors apart the groups are
-5. **Derive p-value** from Z-score using normal distribution (via Central Limit Theorem)
+5. **Derive p-value** from Z-score using normal distribution
+
+**Statistical Foundation:**
+Our calculations assume normal distributions for the sampling distributions of our metrics. While individual data points may not be normally distributed, the Central Limit Theorem tells us that with sufficient sample sizes, the sampling distributions of means and proportions will approximate normal distributions, making our statistical methods valid.
**For Sequential Testing:**
- Uses continuous monitoring with adjusted significance thresholds with [mSPRT method](https://arxiv.org/pdf/1905.10493)
From d43613cb19b0aa4d1dd19f464c7a8b20ca1c5a83 Mon Sep 17 00:00:00 2001
From: ishamehramixpanel <117322225+ishamehramixpanel@users.noreply.github.com>
Date: Wed, 24 Sep 2025 18:13:34 -0400
Subject: [PATCH 12/12] Update experiments.mdx
---
pages/docs/experiments.mdx | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/pages/docs/experiments.mdx b/pages/docs/experiments.mdx
index 59003ed389..d179de381b 100644
--- a/pages/docs/experiments.mdx
+++ b/pages/docs/experiments.mdx
@@ -95,7 +95,7 @@ Metric rows in the table are highlighted when any difference is calculated with
### How do you read statistical significance?
-The main reason you look at statistical significance (p-value) is to get confidence on what it means for the larger rollout.
+Statistical significance (p-value) helps you determine whether your experiment results are likely to hold true for the full rollout, giving you confidence in your decisions.
