Randomized A/B or A/B/N tests are considered the gold standard in many quantitative scientific fields for evaluating treatment effects. Uber applies this technique to make objective, data-driven, and scientifically rigorous product and business decisions. In essence, classic A/B testing randomly splits users into control and treatment groups, then compares the decision metrics between these groups to estimate the experiment’s treatment effects.

A common use case for this methodology is feature release experiments.

Suppose a product manager wants to evaluate whether a new feature increases user satisfaction with Uber’s platform. The product manager could use Uber’s experimentation platform (XP) to obtain the following results: the average value of the metric in both the treatment and control groups, the lift (treatment effect), whether the lift is statistically significant, and whether the sample sizes are large enough to provide high statistical power.
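To make these four outputs concrete, here is a minimal sketch of how they could be computed for a single metric. It is not Uber's implementation: the function name `summarize_ab`, the `min_effect` parameter, and the choice of a two-sided z-test under a normal approximation are all assumptions made for illustration.

```python
import math
from statistics import NormalDist, mean, variance


def summarize_ab(control, treatment, min_effect, alpha=0.05):
    """Summarize a two-group experiment (hypothetical helper, not an XP API).

    Assumes samples are large enough for a normal approximation and that
    at least one group has non-zero variance.
    """
    mean_c, mean_t = mean(control), mean(treatment)
    lift = mean_t - mean_c  # absolute lift: treatment minus control

    # Standard error of the difference in means (variances may differ).
    se = math.sqrt(
        variance(control) / len(control) + variance(treatment) / len(treatment)
    )

    std_normal = NormalDist()
    z = lift / se
    p_value = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided test

    # Approximate power to detect a true effect of size `min_effect`
    # at the current sample sizes and significance level.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = abs(min_effect) / se
    power = (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

    return {
        "mean_control": mean_c,
        "mean_treatment": mean_t,
        "lift": lift,
        "p_value": p_value,
        "significant": p_value < alpha,
        "power": power,
    }


# Illustrative, made-up satisfaction scores on a 1-6 scale.
control_scores = [1, 2, 3, 4, 5] * 20
treatment_scores = [2, 3, 4, 5, 6] * 20
report = summarize_ab(control_scores, treatment_scores, min_effect=1.0)
```

Power is computed here against a pre-chosen minimum effect rather than the observed lift, since power based on the observed effect adds no information beyond the p-value. A common convention is to treat a power of about 0.8 or more as adequate, though the threshold is a design choice.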