Automation of Medical Underwriting by Appliance of Machine Learning

University essay from Umeå universitet/Institutionen för matematik och matematisk statistik

Abstract: One of the most important areas of growth and development for most organizations today is digitalization, or digital transformation. Technological solutions that enhance existing processes or products, or create new ones, are emerging. It is therefore of great importance that organizations continuously evaluate the potential of applying new technical solutions to their existing processes. For example, a well-implemented AI solution that automates an existing process is likely to contribute considerable business value.

Medical underwriting for individual insurance, the process considered in this project, is about assessing risk based on the individual's medical record. Such a task appears well suited to automation by a machine-learning-based application, which would thereby contribute substantial business value. However, to properly replace a manual decision-making process, no important information may be excluded, which is challenging because a considerable fraction of the information in the medical records consists of unstructured text data. In addition, the underwriting process is extremely sensitive to mistakes in which insurance is wrongly approved for applicants with an elevated risk of future claims.

Three algorithms, Logistic Regression, XGBoost and a Deep Learning model, were evaluated on training data consisting of the structured data from the medical records (categorical and numerical answers), the text data as TF-IDF observation vectors, and a combination of both subsets of features. XGBoost was the best-performing classifier according to the key metric, a partial AUC (pAUC) over false positive rates (FPR) from 0 to 0.03.

There is no question that it is important not to disregard any type of information from the medical records when developing machine learning classifiers to predict medical underwriting outcomes.
Under a very risk-conservative and performance-pessimistic approach, and considering only the group of youngest children (50% of the sample), the best-performing classifier recalled close to 50% of all standard-risk applications at a false positive rate of 2% when both structured and text data were used. Even though the structured data accounts for most of the explanatory power, it is clear that including the text data as TF-IDF observation vectors makes the difference needed for an implementation of the model to potentially generate a positive net present value.
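
The two figures quoted in the abstract, a pAUC over FPR from 0 to 0.03 and the recall at a 2% false positive rate, can be illustrated with a minimal, dependency-free Python sketch. It is not the essay's implementation. The function names are hypothetical, the positive class is assumed to be "standard risk" (so a false positive is a risky application scored as standard), and the area is left unnormalised because the abstract does not say whether the pAUC was rescaled.

```python
def roc_points(y_true, scores):
    """Return ROC points (fpr, tpr), one per distinct score threshold.

    y_true holds 1 for the positive class (assumed here: standard risk)
    and 0 otherwise; higher scores mean "more likely positive".
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    # Sort by descending score so that lowering the threshold admits
    # one group of equally scored applications at a time.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def partial_auc(y_true, scores, max_fpr=0.03):
    """Unnormalised trapezoidal area under the ROC curve for FPR in [0, max_fpr]."""
    pts = roc_points(y_true, scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Cut the segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def recall_at_fpr(y_true, scores, max_fpr=0.02):
    """Highest recall (TPR) reachable at any threshold with FPR <= max_fpr."""
    return max(tpr for fpr, tpr in roc_points(y_true, scores) if fpr <= max_fpr)
```

With these definitions a perfect ranking reaches a pAUC equal to `max_fpr` itself (0.03 here), which gives a sense of scale for the raw values. scikit-learn's `roc_auc_score` also accepts a `max_fpr` argument, but it returns a standardised partial AUC rather than the raw area computed in this sketch.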
