Detecting Hospital Acquired Infections using Machine Learning

University essay from KTH/Skolan för datavetenskap och kommunikation (CSC)

Author: Markus Näsman; [2013]

Abstract: Every year a large number of patients contract infections as a result of their hospital stay. These infections are a major hazard to patient safety, causing increased mortality and morbidity in affected patients. Manual detection and reporting of these infections adds to the workload of the medical staff, which makes it infeasible to do on a continuous basis. The goal of this thesis is to automate detection using machine learning methods, specifically supervised learning on data available in electronic patient records. As most of the available data is unstructured free text, the emphasis is on how to turn this text into features that capture the patterns associated with hospital-acquired infections. Three different data representations are explored: bag of words, complex symbolic sequences, and simple parameters obtained by information extraction. The classifiers used are support vector machines and gradient tree boosting. The dataset consists of 300 hospitalizations from Karolinska University Hospital, Sweden, from 2011 and 2012. Medical experts have marked each hospitalization as containing a hospital-acquired infection or not; the class distribution is 53% with a hospital-acquired infection and 47% without. Support vector machines and gradient tree boosting perform similarly on the task, but the focus is on gradient tree boosting due to its visualization capabilities. The best results, evaluated using 5-fold cross-validation, are obtained by gradient tree boosting, which gives an F1-score in the range 0.82-0.83, recall in the range 0.88-0.89, and a precision of 0.78 for all three data representations. Future research will have to focus on how to incorporate more parameters into the information-extraction-based representation, how to capture patterns common only in minority subclasses, and how well the three data representations work with larger datasets.
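
The reported F1-score can be checked against the reported precision and recall, since F1 is the harmonic mean of the two. The short sketch below is not taken from the thesis; the function name `f1_from_precision_recall` is a hypothetical helper introduced only for illustration, and the inputs are the rounded figures quoted in the abstract.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1-score, i.e. the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Convention: F1 is defined as 0 when both precision and recall are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded figures quoted in the abstract (assumed here to be exact inputs).
precision = 0.78
for recall in (0.88, 0.89):
    f1 = f1_from_precision_recall(precision, recall)
    print(f"precision={precision:.2f}, recall={recall:.2f}, F1={f1:.3f}")
```

With these rounded inputs the computed F1 falls at roughly 0.83, which agrees with the upper end of the 0.82-0.83 range stated in the abstract; small differences are expected because the abstract reports rounded values.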
