Predicting Risk Level in Life Insurance Application : Comparing Accuracy of Logistic Regression, DecisionTree, Random Forest and Linear Support VectorClassifiers

University essay from Blekinge Tekniska Högskola/Institutionen för datavetenskap

Abstract: Background: Over the last decade, there has been a significant rise in the life insurance industry. Every life insurance application is associated with some level ofrisk, which determines the premium they charge. The process of evaluating this levelof risk for a life insurance application is time-consuming. In the present scenario, it is hard for the insurance industry to process millions of life insurance applications.One potential approach is to involve machine learning to establish a framework forevaluating the level of risk associated with a life insurance application. Objectives: The aim of this thesis is to perform two comparison studies. The firststudy aims to compare the accuracy of the logistic regression classifier, decision tree classifier, random forest classifier and linear support vector classifier for evaluatingthe level of risk associated with a life insurance application. The second study aimsto identify the impact of changes in the dataset over the accuracy of these selected classification models. Methods: The chosen approach was an experimentation methodology to attain theaim of the thesis and address its research questions. The experimentation involvedcomparing four ML algorithms, namely the LRC, DTC, RFC and Linear SVC. These algorithms were trained, validated and tested on two datasets. A new dataset wascreated by replacing the "BMI" variable with the "Life Expectancy" variable. Thefour selected ML algorithms were compared based on their performance metrics,which included accuracy, precision, recall and f1-score. Results: Among the four selected machine learning algorithms, random forest classifier attained higher accuracy with 53.79% and 52.80% on unmodified and modifieddatasets respectively. Hence, it was the most accurate algorithm for predicting risklevel in life insurance application. The second best algorithm was decision tree classifier with 51.12% and 50.79% on unmodified and modified datasets. The selectedmodels attained higher accuracies when they are trained, validated and tested withunmodified dataset. Conclusions: The random forest classifier scored high accuracy among the fourselected algorithms on both unmodified dataset and modified datasets. The selected models attained higher accuracies when they are trained, validated and tested with unmodified compared to modified dataset. Therefore, the unmodified dataset is more suitable for predicting risk level in life insurance application.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)