Predicting Default Probability in Credit Risk using Machine Learning Algorithms

University essay from KTH/Matematisk statistik

Abstract: This thesis has explored the field of internally developed models for measuring the probability of default (PD) in credit risk. As regulators put restrictions on modelling practices and inhibit the advance of risk measurement, the fields of data science and machine learning are advancing. The tradeoff between stricter regulation on internally developed models and the advancement of data analytics was investigated by comparing model performance of the benchmark method Logistic Regression for estimating PD with the machine learning methods Decision Trees, Random Forest, Gradient Boosting and Artificial Neural Networks (ANN). The data was supplied by SEB and contained 45 variables and 24 635 samples. As the machine learning techniques become increasingly complex to favour enhanced performance, it is often at the expense of the interpretability of the model. An exploratory analysis was therefore made with the objective of measuring variable importance in the machine learning techniques. The findings from the exploratory analysis will be compared to the results from benchmark methods that exist for measuring variable importance. The results of this study shows that logistic regression outperformed the machine learning techniques based on the model performance measure AUC with a score of 0.906. The findings from the exploratory analysis did increase the interpretability of the machine learning techniques and were validated by the results from the benchmark methods.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)