Variable selection for the Cox proportional hazards model: A simulation study comparing the stepwise, lasso and bootstrap approach
Abstract: In a regression setting with a number of measured covariates, not all of them may be relevant to the response. By reducing the number of covariates included in the final model, we can improve its prediction accuracy and make it easier to interpret. In survival analysis, the study of time-to-event data, the most common form of regression is the semi-parametric Cox proportional hazards (PH) model. In this thesis we compare three ways to perform variable selection in the Cox PH model: stepwise regression, lasso and bootstrap. By simulating survival data we could control which covariates were significant for the response. By fitting the Cox PH model to these data with each of the three variable selection methods, we could evaluate how well each method performs in finding the correct model. We found that while bootstrap could in some cases improve on the stepwise approach, its performance is strongly affected by the choice of inclusion frequency. Lasso performed as well as or slightly better than the stepwise method for data with weak effects. However, when the data instead contain strong effects, the stepwise method performs considerably better than lasso.
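To illustrate the kind of simulation the abstract describes, the following is a minimal sketch of generating right-censored survival data in which only some covariates affect the hazard. It is not the thesis's actual setup: the function name `simulate_cox_data`, the constant baseline hazard, the standard normal covariates, the exponential censoring times and the coefficient values are all assumptions chosen for illustration.

```python
import math
import random


def simulate_cox_data(n, beta, baseline_hazard=0.1, censor_rate=0.05, seed=0):
    """Simulate right-censored data from a Cox PH model.

    Assumes (for illustration only) a constant baseline hazard, so that
    event times are exponential with rate baseline_hazard * exp(x . beta).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Independent standard normal covariates (an assumption).
        x = [rng.gauss(0.0, 1.0) for _ in beta]
        linear_predictor = sum(b * xi for b, xi in zip(beta, x))
        hazard = baseline_hazard * math.exp(linear_predictor)
        event_time = rng.expovariate(hazard)
        # Independent exponential censoring (an assumption).
        censor_time = rng.expovariate(censor_rate)
        observed_time = min(event_time, censor_time)
        event = 1 if event_time <= censor_time else 0
        rows.append((observed_time, event, x))
    return rows


# Hypothetical true model: two covariates with effects, three pure noise.
beta = [0.8, -0.5, 0.0, 0.0, 0.0]
data = simulate_cox_data(200, beta)
```

Because the nonzero entries of `beta` are known, the set of covariates a selection method keeps can be compared directly against the true model.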