Machine learning and its applications within insurance hit rates and credit risk modelling

University essay from Lunds universitet/Matematisk statistik

Abstract: This thesis aims to shed light on several machine learning methods. As a reference, a more common statistical prediction method, the generalized linear model, is applied and its results are compared with those of the machine learning methods. Six different machine learning methods are investigated. These methods are explained in detail and used to predict hit rates among insurance customers. To further explore the data sets and the methods, the data sets are rebalanced to deal with skewness of the target class. The insurance data set used contains 86 features, including the target feature; so many features can be troublesome in some cases, and a feature reduction analysis is therefore performed. Further, the positives and negatives of the different methods, and how to put machine learning into practice, are discussed. Lastly, a new data set is introduced and the machine learning methods are used to assess the risk of default among credit customers. The results show that the random forest performs best across the different data sets and is fairly easy to interpret. The k-nn, naïve Bayes and decision tree do not perform as well as the random forest but are easier to use and require much less computing time to tune and train. These less computationally complex methods can work well when much data is available but are inferior to regression methods when it is not. The support vector machine and the neural network are complex but have great potential. Further investigation into the different models used is needed, especially the support vector machine and the neural network.
