BAGGED PREDICTION ACCURACY IN LINEAR REGRESSION

University essay from Uppsala universitet/Statistiska institutionen

Author: Daniel Kimby; [2022]


Abstract: Bootstrap aggregation, or bagging, is a prominent method in statistical modelling that is suggested to improve predictive performance. It is useful to confirm the efficacy of such improvements and to expand upon them. This thesis investigates whether the results of Leo Breiman's (1996) paper "Bagging predictors", in which bagging is shown to lower prediction error, can be replicated. Additionally, the predictive performance of weighted bagging is investigated, where the weights are a function of the residual variance. The data are simulated and consist of a numerical outcome variable and 30 independent variables. Linear regression is run with forward step selection, selecting models with the lowest SSE, and predictions are saved for all 30 models. Separately, forward step selection is run by selecting on the significance of the p-value of each added coefficient, saving only one final model. Prediction error is measured as mean squared error. The results suggest that, under lowest-SSE selection, both bagged methods lower prediction error, with unweighted bagging performing best; this is congruent with Breiman's (1996) results, with minor differences. Under p-value selection, weighted bagging performs best. Further research should be conducted on real data to verify these results, in particular with reference to weighted bagging.
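The core procedure the abstract describes, averaging predictions from models refit on bootstrap resamples, can be sketched as follows. Note that this is a minimal illustration, not the thesis's method: the simulation design, sample sizes, noise level, and the use of plain OLS (rather than forward step selection over 30 variables, or residual-variance weighting) are all assumptions made here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated data: 30 predictors with a linear signal plus noise
# (the thesis's exact simulation design is not reproduced here).
n, p, B = 200, 30, 50
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
y = X @ beta + rng.normal(scale=2.0, size=n)
X_test = rng.normal(size=(1000, p))
y_test = X_test @ beta + rng.normal(scale=2.0, size=1000)

def ols_predict(X_tr, y_tr, X_new):
    # Fit ordinary least squares (with intercept) and predict on new data.
    coef, *_ = np.linalg.lstsq(np.c_[np.ones(len(X_tr)), X_tr], y_tr, rcond=None)
    return np.c_[np.ones(len(X_new)), X_new] @ coef

# Single (unbagged) fit on the full training sample.
mse_single = np.mean((y_test - ols_predict(X, y, X_test)) ** 2)

# Bagging: refit on B bootstrap resamples and average the predictions.
preds = np.zeros(len(X_test))
for _ in range(B):
    idx = rng.integers(0, n, size=n)  # resample rows with replacement
    preds += ols_predict(X[idx], y[idx], X_test)
mse_bagged = np.mean((y_test - preds / B) ** 2)

print(f"single-model MSE: {mse_single:.3f}, bagged MSE: {mse_bagged:.3f}")
```

Bagging mainly helps unstable procedures such as subset selection; for a stable fit like full OLS the two errors will typically be close, which is consistent with the thesis's focus on model-selection pipelines rather than a fixed model.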
