Regularization Methods and High Dimensional Data: A Comparative Study Based on Frequentist and Bayesian Methods

University essay from Lunds universitet/Statistiska institutionen

Abstract: As high-dimensional data become increasingly accessible and common, the need for reliable methods to combat problems such as overfitting and multicollinearity grows. Models must be able to handle large data sets in which the predictor variables often outnumber the observations. In this study, the frequentist and Bayesian frameworks are tested against each other in three simulated situations: one where the number of predictor variables greatly exceeds the number of observations, one where the simulated variables are highly correlated, and one where the coefficients to be estimated are known beforehand, which enables comparisons between true and estimated values. Three approaches are used from each statistical framework. The frequentist models are Ridge regression, least absolute shrinkage and selection operator (LASSO) regression, and the combined model, Elastic net regression. The Bayesian models are three regressions with different prior distributions for the coefficients: the Normal, the Cauchy and the Horseshoe. To compare the frameworks, several evaluation criteria are used, such as predictive performance on new data, the amount of explained variance and the number of unnecessary predictor variables that the model successfully regularizes. The results show that the Bayesian Horseshoe model has the best overall performance in terms of predictive performance, variable selection and parameter estimation. The LASSO regression performs better variable selection on highly correlated data than all of the other models. The frequentist models are also easier to compute when computational power or time is limited; otherwise the Horseshoe model is preferable.
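To illustrate how the three frequentist penalties differ, here is a minimal sketch that is not taken from the essay. It assumes an orthonormal design, in which case each coefficient has a closed-form penalized estimate given its least-squares value. The function names, the example coefficients, the penalty strength `lam` and the mixing weight `alpha` are all hypothetical choices made for illustration.

```python
def soft_threshold(z, lam):
    """LASSO estimate of one coefficient under an orthonormal design:
    minimizes 0.5*(z - b)**2 + lam*abs(b)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def ridge_shrink(z, lam):
    """Ridge estimate under the same assumption:
    minimizes 0.5*(z - b)**2 + 0.5*lam*b**2."""
    return z / (1.0 + lam)


def elastic_net_shrink(z, lam, alpha):
    """Elastic net estimate: minimizes
    0.5*(z - b)**2 + lam*(alpha*abs(b) + 0.5*(1 - alpha)*b**2).
    alpha=1 reduces to LASSO, alpha=0 to Ridge."""
    return soft_threshold(z, lam * alpha) / (1.0 + lam * (1.0 - alpha))


# Hypothetical least-squares estimates: two strong signals, the rest weak or null.
ols = [3.0, -1.5, 0.4, -0.2, 0.0]
lam = 0.5  # hypothetical penalty strength

ridge = [ridge_shrink(z, lam) for z in ols]
lasso = [soft_threshold(z, lam) for z in ols]
enet = [elastic_net_shrink(z, lam, 0.5) for z in ols]

for name, est in (("ridge", ridge), ("lasso", lasso), ("elastic net", enet)):
    n_zero = sum(1 for b in est if b == 0.0)
    print(f"{name:12s} {est}  exact zeros: {n_zero}")
```

Under this simplifying assumption, Ridge scales every coefficient toward zero without removing any, whereas LASSO and Elastic net set small coefficients exactly to zero, which is the behavior behind the variable selection discussed in the abstract. Real high-dimensional or correlated designs are not orthonormal, so this shows the mechanism only and does not reproduce the essay's simulations.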
