We are trying to learn regression parameters for a dataset which we know was generated from a polynomial of a certain degree, but we do not know what this degree is. Assume the data was actually generated from a polynomial of degree 5 with some added Gaussian noise (that is, $y = w_0+w_1x+w_2x^2+w_3x^3+w_4x^4+w_5x^5+\epsilon$, where $\epsilon \sim N(0,1)$).

For training we have 100 $\{x,y\}$ pairs, and for testing we use an additional set of 100 $\{x,y\}$ pairs. Since we do not know the degree of the polynomial, we learn two models from the data. Model A learns parameters for a polynomial of degree 4 and model B learns parameters for a polynomial of degree 6. Which of these two models is likely to fit the test data better?

Model B, the degree 6 polynomial. Since the data was generated by a degree 5 polynomial and we have enough training data, the degree 6 fit will likely learn a very small coefficient for $x^6$. Thus, even though it is a degree 6 polynomial, it will behave much like a degree 5 polynomial, which is the correct model, and so it will fit the test data better. Model A, by contrast, has no $x^5$ term, so it cannot represent the true function and will underfit no matter how much data it sees (unless $w_5$ happens to be negligible).
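
The argument can be checked with a small simulation. The sketch below is only illustrative: the true coefficients in `true_w`, the sampling range $[-2, 2]$ for $x$, and the random seed are all assumptions, since the question does not specify them. It fits both models by least squares with NumPy and compares their mean squared error on the test set.

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

# Hypothetical true coefficients w_0..w_5 (lowest degree first);
# the question does not give actual values.
true_w = np.array([1.0, -2.0, 0.5, 3.0, -1.0, 2.0])

def sample(n):
    """Draw n (x, y) pairs from the degree 5 polynomial plus N(0, 1) noise."""
    x = rng.uniform(-2.0, 2.0, size=n)  # assumed input range
    y = P.polyval(x, true_w) + rng.normal(0.0, 1.0, size=n)
    return x, y

x_train, y_train = sample(100)
x_test, y_test = sample(100)

def fit_and_score(degree):
    """Least-squares fit on the training set; return (test MSE, coefficients)."""
    coefs = P.polyfit(x_train, y_train, degree)
    pred = P.polyval(x_test, coefs)
    return float(np.mean((y_test - pred) ** 2)), coefs

mse_a, coefs_a = fit_and_score(4)  # model A: degree 4
mse_b, coefs_b = fit_and_score(6)  # model B: degree 6

print("Model A (degree 4) test MSE:", mse_a)
print("Model B (degree 6) test MSE:", mse_b)
print("Model B coefficient on x^6:", coefs_b[6])
```

Under these assumptions, model B's test error should sit close to the noise variance of 1 and its learned $x^6$ coefficient should be near zero, while model A's error is inflated by the missing $x^5$ term. How large the gap is depends on the assumed $w_5$ and on the range of $x$.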