Exploring the Importance of Women's Educational Attainment in HIV Risk Prediction: A Comparative Study of Logistic Regression, Random Forest and XGBoost

University essay from Uppsala universitet/Statistiska institutionen

Author: Towe Wallkulle; [2023]


Abstract: Due to extensive HIV testing and treatment programs, the rate of new HIV infections has declined in recent years. However, young women south of the Sahara continue to be disproportionately burdened by the epidemic. The aim of this thesis is to explore the complex association between women's educational attainment and HIV prevalence. To this end, data from the most recent Demographic and Health Survey in Zambia are used. Recent literature has highlighted the potential use of statistical machine learning algorithms in HIV risk prediction. This thesis investigates how a classical statistical method, logistic regression, compares to two tree-based ensemble prediction methods, random forest and XGBoost. The results suggest that the tree-based methods outperform logistic regression in terms of classification accuracy. In line with previous results, the logistic regression analysis shows that higher education is negatively associated with HIV prevalence when an interaction term is included in the model specification. In contrast, results from the machine learning models do not provide sufficient evidence that women's education is a relatively important predictor of HIV prevalence in Zambia. Results from feature selection suggest that future research could be conducted with less extensive data collection, as the tree-based methods are found to perform well on a smaller subset of variables.
