You have a data set with p (number of variables) greater than n (number of observations). Why is OLS a bad option to work with? Which techniques would be best to use, and why?
in Data Science by Diamond (47.9k points)

1 Answer

In such high-dimensional data sets, classical regression techniques break down because their assumptions fail. When p > n, the design matrix cannot have full column rank, so X'X is singular and there is no unique least squares coefficient estimate: infinitely many coefficient vectors fit the training data perfectly. Even when p is close to (but below) n, the variance of the OLS estimates becomes very large, so OLS cannot be used in this regime.
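A quick sketch of the rank problem, using a random Gaussian design (dimensions chosen here purely for illustration): when p > n, the p-by-p Gram matrix X'X has rank at most n, so the normal equations cannot be solved uniquely.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 50                     # more variables than observations
X = rng.standard_normal((n, p))

# X'X is p x p, but its rank is at most n < p,
# so it is singular and the normal equations X'X b = X'y
# have infinitely many solutions.
gram = X.T @ X
rank = np.linalg.matrix_rank(gram)
print(rank, "<", p)               # rank is at most n = 20, far below p = 50
```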

To combat this situation, we can use penalized regression methods such as ridge and the lasso (often fitted with the LARS algorithm), which shrink the coefficients toward zero, trading a little bias for a large reduction in variance. Ridge regression works best in situations where the least squares estimates have high variance; the lasso additionally performs variable selection by setting some coefficients exactly to zero.

Other options include best subset selection and forward stepwise regression, which restrict the model to a small subset of the p variables.
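To illustrate why shrinkage fixes the problem above, here is a minimal numpy-only sketch of ridge regression via its closed form (the data-generating setup and the penalty value alpha are illustrative assumptions, not from the question): adding alpha·I to X'X makes the system invertible even when p > n.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 50
X = rng.standard_normal((n, p))

# Hypothetical sparse truth: only 3 of the 50 variables matter.
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Ridge closed form: (X'X + alpha I) b = X'y.
# For any alpha > 0 the matrix X'X + alpha I is positive definite,
# hence invertible, even though X'X alone is singular when p > n.
alpha = 1.0
beta_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)
```

The same system with alpha = 0 is the singular normal-equations system OLS would need to solve; the penalty is exactly what restores a unique, finite-variance estimate.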
by Wooden (3.0k points)
