MathsGee Answers is Zero-Rated (You do not need data to access) on: Telkom | Dimension Data | Rain | MWEB

What methodologies does Uber employ to perform sequential testing for metrics monitoring purposes?
in Data Science by Bronze Status (6,041 points) | 281 views

1 Answer

Best answer

The Uber experimentation team uses two main methodologies for sequential testing in metrics monitoring: the mixture sequential probability ratio test (mSPRT), and variance estimation with false discovery rate (FDR) control.

Mixture Sequential Probability Ratio Test

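The most common method they use for monitoring is mSPRT. This test builds on the likelihood ratio test by additionally specifying a mixing distribution $H$ over the possible values of the parameter under the alternative. Suppose we are testing a metric difference $\theta$ with null hypothesis $\theta = \theta_0$, given observations $x_1, \ldots, x_n$ with density $f_\theta$. Then the test statistic can be written as:

$$\Lambda_n^{H,\theta_0} = \int_{\Theta} \prod_{i=1}^{n} \frac{f_\theta(x_i)}{f_{\theta_0}(x_i)} \, dH(\theta)$$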

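Since our sample sizes are large and the central limit theorem applies in most cases, the metric is approximately normal, so we use a normal distribution as the mixing distribution, $H \sim N(0, \tau^2)$. Because a normal mixing distribution is conjugate to a normal likelihood, this makes computation easy and yields a closed-form expression for the test statistic $\Lambda_n^{H,\theta_0}$.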
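
For intuition, here is a minimal Python sketch of that closed form. It assumes i.i.d. observations $x_i \sim N(\theta, \sigma^2)$ with known variance $\sigma^2$ and a mixing distribution centered at the null value, $H = N(\theta_0, \tau^2)$ (which is $N(0, \tau^2)$ when $\theta_0 = 0$). Integrating the likelihood ratio against $H$ gives

$$\Lambda_n^{H,\theta_0} = \sqrt{\frac{\sigma^2}{\sigma^2 + n\tau^2}} \exp\left(\frac{n^2 \tau^2 (\bar{x}_n - \theta_0)^2}{2\sigma^2(\sigma^2 + n\tau^2)}\right),$$

where $\bar{x}_n$ is the sample mean. The function names are hypothetical and not part of any Uber library; the brute-force function only exists to check the formula by integrating numerically over a grid.

```python
import math
from typing import Sequence


def msprt_statistic(xbar: float, n: int, sigma2: float, tau2: float, theta0: float = 0.0) -> float:
    """Closed-form mixture likelihood ratio Lambda_n^{H, theta0}.

    Assumes n i.i.d. N(theta, sigma2) observations with known sigma2 and a
    normal mixing distribution H = N(theta0, tau2).
    """
    v = sigma2 + n * tau2
    exponent = n ** 2 * tau2 * (xbar - theta0) ** 2 / (2 * sigma2 * v)
    return math.sqrt(sigma2 / v) * math.exp(exponent)


def msprt_statistic_numeric(
    xs: Sequence[float], sigma2: float, tau2: float, theta0: float = 0.0, grid: int = 20001
) -> float:
    """Brute-force check: integrate prod_i f_theta(x_i) / f_theta0(x_i) against H on a grid."""
    half_width = 10 * math.sqrt(tau2)  # H has negligible mass beyond 10 standard deviations
    step = 2 * half_width / (grid - 1)
    total = 0.0
    for k in range(grid):
        theta = theta0 - half_width + k * step
        # Log likelihood ratio of N(theta, sigma2) vs N(theta0, sigma2) over all observations.
        log_lr = sum(((x - theta0) ** 2 - (x - theta) ** 2) / (2 * sigma2) for x in xs)
        h_density = math.exp(-((theta - theta0) ** 2) / (2 * tau2)) / math.sqrt(2 * math.pi * tau2)
        total += math.exp(log_lr) * h_density * step
    return total
```

On small samples the two functions agree to several decimal places; only the closed form is cheap enough to recompute every time the metric is checked.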

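Another useful property of this method is that, under the null hypothesis, the test statistic $\Lambda_n^{H,\theta_0}$ is a martingale:

$$E\left[\Lambda_{n+1}^{H,\theta_0} \mid x_1, \ldots, x_n\right] = \Lambda_n^{H,\theta_0}.$$

Because $\Lambda_0^{H,\theta_0} = 1$ and the statistic is non-negative, Ville's inequality gives $P\left(\sup_n \Lambda_n^{H,\theta_0} \ge 1/\alpha\right) \le \alpha$, so rejecting the null as soon as $\Lambda_n^{H,\theta_0} \ge 1/\alpha$ keeps the type I error at most $\alpha$ no matter how often we peek. Following this, we can construct a $(1 - \alpha)$ confidence interval from all values $\theta_0$ for which $\Lambda_n^{H,\theta_0} < 1/\alpha$.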
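
The decision rule and confidence interval can be sketched in Python under the same assumptions as the closed form (i.i.d. normal data, known $\sigma^2$, $H = N(\theta_0, \tau^2)$). Solving $\Lambda_n^{H,\theta_0} < 1/\alpha$ for $\theta_0$ gives the interval

$$\bar{x}_n \pm \sqrt{\frac{\sigma^2(\sigma^2 + n\tau^2)}{n^2\tau^2}\left[2\log\frac{1}{\alpha} + \log\frac{\sigma^2 + n\tau^2}{\sigma^2}\right]},$$

and the hypothetical `monitor` function checks the statistic after every observation, stopping the first time $\Lambda_n^{H,\theta_0} \ge 1/\alpha$. This is the standard mSPRT stopping rule, not Uber's actual implementation, and all function names are illustrative.

```python
import math
import random
from typing import Iterable, Optional, Tuple


def log_msprt_statistic(xbar: float, n: int, sigma2: float, tau2: float, theta0: float = 0.0) -> float:
    """log Lambda_n^{H, theta0} for i.i.d. N(theta, sigma2) data and H = N(theta0, tau2)."""
    v = sigma2 + n * tau2
    return 0.5 * math.log(sigma2 / v) + n ** 2 * tau2 * (xbar - theta0) ** 2 / (2 * sigma2 * v)


def always_valid_ci(xbar: float, n: int, sigma2: float, tau2: float, alpha: float) -> Tuple[float, float]:
    """All theta0 with Lambda_n^{H, theta0} < 1/alpha, solved in closed form."""
    v = sigma2 + n * tau2
    half_width = math.sqrt(sigma2 * v / (n ** 2 * tau2) * (2 * math.log(1 / alpha) + math.log(v / sigma2)))
    return xbar - half_width, xbar + half_width


def monitor(
    stream: Iterable[float], sigma2: float, tau2: float, alpha: float, theta0: float = 0.0
) -> Optional[Tuple[int, Tuple[float, float]]]:
    """Peek after every observation; stop the first time Lambda_n >= 1/alpha.

    Returns (n, confidence interval) at the stopping time, or None if the
    threshold is never crossed. Works in log space to avoid overflow.
    """
    threshold = math.log(1 / alpha)
    total = 0.0
    for n, x in enumerate(stream, start=1):
        total += x
        xbar = total / n
        if log_msprt_statistic(xbar, n, sigma2, tau2, theta0) >= threshold:
            return n, always_valid_ci(xbar, n, sigma2, tau2, alpha)
    return None


if __name__ == "__main__":
    random.seed(0)
    # Simulated per-observation metric differences with a true effect of 0.3.
    stream = (random.gauss(0.3, 1.0) for _ in range(10_000))
    print(monitor(stream, sigma2=1.0, tau2=1.0, alpha=0.05))
```

The interval is always valid in the same sense as the test: it can be reported at whatever time monitoring stops, because it is exactly the set of null values the test has not yet rejected.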


Variance estimation with FDR control

To apply sequential testing correctly, we need to estimate the variance as accurately as possible. Because we monitor the cumulative difference between the control and treatment groups on a daily basis, repeated observations from the same user are correlated, which violates the mSPRT's assumption of independent observations. For example, if we are monitoring click-through rate, a user's click-through rate on one day is likely correlated with their rate on other days. To handle this, we use delete-a-group jackknife variance estimation or block bootstrap methods, which delete or resample blocks of correlated observations (such as all of one user's data) together, to generalize the mSPRT to correlated data.
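
A user-level delete-a-group jackknife can be sketched as follows. The sketch assumes observations arrive as hypothetical `(user_id, arm, value)` tuples, one per user per day, and that the metric is the treatment-minus-control difference in means. Users are split into $G$ groups round-robin for determinism (a production system would more likely assign groups at random), each replicate $\hat\theta_{(g)}$ recomputes the metric with group $g$ removed, and the variance estimate is $\hat V = \frac{G-1}{G}\sum_{g=1}^{G}\left(\hat\theta_{(g)} - \bar\theta\right)^2$, where $\bar\theta$ is the mean of the replicates. This illustrates the general technique, not Uber's code, and the block bootstrap is not shown.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Observation = Tuple[str, str, float]  # (user_id, arm, value); arm is "control" or "treatment"


def diff_in_means(obs: List[Observation]) -> float:
    """Treatment mean minus control mean over all user-day observations."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for _, arm, value in obs:
        sums[arm] += value
        counts[arm] += 1
    return sums["treatment"] / counts["treatment"] - sums["control"] / counts["control"]


def delete_a_group_jackknife_variance(
    obs: List[Observation],
    statistic: Callable[[List[Observation]], float],
    num_groups: int,
) -> float:
    """Variance of `statistic`, deleting whole groups of users at a time.

    All observations of a user go to the same group, so each replicate drops
    a user's entire history and within-user correlation is preserved.
    Assumes every replicate still has at least one user in each arm.
    """
    users = sorted({user for user, _, _ in obs})
    group_of = {user: i % num_groups for i, user in enumerate(users)}  # round-robin assignment
    replicates = []
    for g in range(num_groups):
        kept = [o for o in obs if group_of[o[0]] != g]
        replicates.append(statistic(kept))
    mean_rep = sum(replicates) / num_groups
    return (num_groups - 1) / num_groups * sum((r - mean_rep) ** 2 for r in replicates)
```

If every user's data were duplicated across two identical days, a per-observation variance formula would roughly halve, while this estimate stays the same, which is the behavior we want for correlated daily data.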

Because the monitoring system evaluates the overall health of an ongoing experiment, we monitor many business metrics at the same time, which increases the chance of false alarms. In principle, either the Bonferroni or the Benjamini-Hochberg (BH) correction could be applied here. Bonferroni controls the family-wise error rate and is very conservative, so it would miss more real degradations; BH controls the false discovery rate and retains more power. Since the cost of missing a business degradation can be substantial, we apply the BH correction, and we also tune parameters such as the minimum detectable effect (MDE), power, and tolerance for practical significance for metrics with different levels of importance and sensitivity.
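
To make the Bonferroni versus BH trade-off concrete, here is a small Python sketch of both corrections applied to one p-value per monitored metric. How Uber derives per-metric p-values from the sequential test is not described in the answer, so the inputs here are assumed. Bonferroni rejects metric $i$ only when $p_i \le \alpha/m$, while BH rejects the $k$ smallest p-values, where $k$ is the largest rank with $p_{(k)} \le k\alpha/m$.

```python
from typing import List, Set


def bonferroni(p_values: List[float], alpha: float) -> Set[int]:
    """Family-wise error control: reject metric i when p_i <= alpha / m."""
    m = len(p_values)
    return {i for i, p in enumerate(p_values) if p <= alpha / m}


def benjamini_hochberg(p_values: List[float], alpha: float) -> Set[int]:
    """False discovery rate control at level alpha (BH step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            cutoff = rank
    # Reject every metric whose p-value is among the k smallest.
    return set(order[:cutoff])
```

For example, with p-values `[0.008, 0.015, 0.025, 0.035, 0.045]` at $\alpha = 0.05$, BH flags all five metrics while Bonferroni flags only the first, which illustrates why BH is preferred when missed degradations are costly.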

by Bronze Status (6,041 points)

MathsGee Answers is a global, STEM-focused Q&A platform where you can ask people from all over the world educational questions for improved outcomes.
