How does Uber use the p-value during A/B testing?

The p-value calculation is central to the statistics engine of Uber's experimentation platform (XP): it directly determines whether the XP reports a result as significant. In a typical A/B test, the p-value is compared with the desired false positive rate (Type-I error rate), commonly 0.05, and the result is reported as significant when the p-value falls below that threshold. The XP uses several procedures for p-value calculation, including:

• Welch’s t-test, the default test used for continuous metrics, e.g., completed trips.
• The Mann-Whitney U test, a nonparametric rank-sum test used when severe skewness is detected in the data. It requires weaker assumptions than the t-test and performs better with skewed data.
• The Chi-squared test, used for proportion metrics, e.g., rider retention rate.
• The Delta method (Deng et al. 2011) and bootstrap methods, used for standard error estimation whenever suitable to generate robust results for experiments with ratio metrics or with small sample sizes, e.g., the ratio of trips cancelled by riders.
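
As a rough illustration of the procedures listed above, here is a minimal Python sketch that computes each kind of p-value with SciPy, plus a delta-method standard error for a ratio metric. It is not Uber's implementation: the function names, the `ALPHA` constant, and the choice of SciPy and NumPy are assumptions made for this example, and the delta-method formula assumes independent experimental units with one numerator and one denominator value each.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # desired false positive rate (Type-I error), as in the answer


def welch_p(a, b):
    """Two-sided Welch's t-test p-value (unequal variances allowed)."""
    return stats.ttest_ind(a, b, equal_var=False).pvalue


def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney U (rank-sum) p-value for skewed data."""
    return stats.mannwhitneyu(a, b, alternative="two-sided").pvalue


def chi2_p(success_a, n_a, success_b, n_b):
    """Chi-squared p-value for a proportion metric (2x2 table)."""
    table = np.array([
        [success_a, n_a - success_a],
        [success_b, n_b - success_b],
    ])
    # Index 1 of the result is the p-value; no continuity correction here.
    return stats.chi2_contingency(table, correction=False)[1]


def delta_ratio_se(numerator, denominator):
    """Delta-method standard error of mean(numerator) / mean(denominator)."""
    num = np.asarray(numerator, dtype=float)
    den = np.asarray(denominator, dtype=float)
    n = len(num)
    mu_y, mu_x = num.mean(), den.mean()
    var_y, var_x = num.var(ddof=1), den.var(ddof=1)
    cov_xy = np.cov(num, den, ddof=1)[0, 1]
    # First-order Taylor expansion of the ratio of two sample means.
    var_ratio = (
        var_y / mu_x**2
        - 2.0 * mu_y * cov_xy / mu_x**3
        + mu_y**2 * var_x / mu_x**4
    ) / n
    return float(np.sqrt(max(var_ratio, 0.0)))  # guard against rounding below 0


def is_significant(p_value, alpha=ALPHA):
    """A result is reported as significant when p falls below alpha."""
    return p_value < alpha
```

For example, `is_significant(chi2_p(400, 1000, 600, 1000))` compares a 40% rate with a 60% rate on 1,000 users each; which test to apply to which metric would be a platform decision that this sketch leaves to the caller.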

On top of these calculations, Uber applies a multiple comparison correction (the Benjamini-Hochberg procedure) to control the overall false discovery rate (FDR) when there are two or more treatment groups (e.g., in an A/B/C test or an A/B/N test).
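
The Benjamini-Hochberg procedure itself is short enough to sketch in plain Python. The version below is an illustrative implementation of the standard step-up rule, not code from Uber's platform; the function name `benjamini_hochberg` and the `fdr` parameter are assumptions of this example. Given m p-values sorted in ascending order, it finds the largest rank k with p(k) <= (k / m) * fdr and rejects the k hypotheses with the smallest p-values.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank whose p-value is under its threshold.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank

    # Reject every hypothesis ranked at or below max_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# One p-value per treatment-vs-control comparison (made-up numbers).
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
# prints [True, False, False, False]
```

Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger one passes: with `[0.03, 0.02, 0.04, 0.045]` only the largest p-value meets its threshold (0.045 <= 0.05), yet all four comparisons are rejected.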

by Bronze Status (6,041 points)
