In regression estimation, we are usually given an i.i.d. sample \(\left(\mathbf{X}_1, Y_1\right), \ldots\). \(\left(\mathbf{X}_n, Y_n\right)\) from a joint distribution \(P_{\mathbf{X}, Y}\). Our aim is to predict the target \(Y\) from the covariates or predictors \(\mathbf{X}\). In least squares regression, for example, we are looking for a function \(\hat{f}\) such that
\[
\hat{f}=\underset{f \in \mathcal{F}}{\operatorname{argmin}} \sum_{i=1}^n\left(Y_i-f\left(\mathbf{X}_i\right)\right)^2 .
\]