Encoding Temporal Healthcare Data for Machine Learning
Abstract: This thesis contains a review of previous work in the fields of encoding sequential healthcare data and predicting graft- versus- host disease, a medical condition, based on patient history using machine learning. A new encoding of such data is proposed for machine learning purposes. The proposed encoding, called bag of binned weighted events, is a combination of two strategies proposed in previous work, called bag of binned events and bag of weighted events. An empirical experiment is designed to evaluate the predictive performance of the proposed encoding over various binning windows to that of the previous encodings, based on the area under the receiver operating characteristic curve (AUC) metric. The experiment is carried out on real- world healthcare data obtained from Swedish registries, using the random forest and the logistic regression algorithms. After filtering the data, solving quality issues and tuning hyperparameters of the models, final results are obtained. These results indicate that the proposed encoding strategy performs on par, or slightly better than the bag of weighted events, and outperforms the bag of binned events in most cases. However, differences in metrics show small differences. It is also observed that the proposed encoding usually performs better with longer binning windows which may be attributed to data noise. Future work is proposed in the form of repeating the experiment with different datasets and models, as well as changing the binning window length of the baseline algorithms.
AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)