Peeking Through the Leaves: Improving Default Estimation with Machine Learning: A transparent approach using tree-based models

University essay from Umeå University (Umeå universitet), Department of Mathematics and Mathematical Statistics (Institutionen för matematik och matematisk statistik)

Abstract: In recent years, the development and implementation of AI and machine learning models have increased dramatically, with the availability of high-quality data paving the way for sophisticated AI models. Financial institutions use many models in their daily operations. However, they are heavily regulated and must follow the regulations set by central banks, auditing standards, and the financial supervisory authorities. One of these standards is IFRS 9, which governs the disclosure of expected credit losses in banks' financial statements. Banks must measure expected credit shortfalls in line with regulations set by the EBA and the FSA. In this master's thesis, we collaborate with a Swedish bank to evaluate machine learning models that predict defaults in an unsecured credit portfolio. The probability of default is a key variable in the expected credit loss equation. The goal is not only to develop a valid model for predicting these defaults but also to build and evaluate different models based on their performance and transparency. Given the regulatory challenges surrounding AI, introducing transparency into models is part of the process. When banks use models, they are required to ensure transparency, which refers to how easily a model can be understood in terms of its architecture, calculations, feature importance, and the logic behind its decision-making process. We compare the industry-standard logistic regression model with three machine learning models (decision tree, random forest, and XGBoost) to show how they differ in performance and transparency. We also introduce a transparency evaluation tool, the transparency matrix, to shed light on the different transparency requirements of machine learning models. The results show that all of the tree-based machine learning models estimate defaults better than traditional logistic regression. This is reflected in both the AUC score and the R² metric. We also show a performance-transparency trade-off as model complexity increases: the more complex a model becomes, the better its predictions, at the cost of transparency.
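The AUC used to compare the models measures how well a model ranks borrowers who default above borrowers who do not. As a minimal, dependency-free sketch (not the thesis's own evaluation code, which is not shown here), AUC can be computed from its pairwise definition: the share of (defaulter, non-defaulter) pairs in which the defaulter receives the higher score, with ties counting as one half. The function name `roc_auc`, the 0/1 label convention, and both score lists are hypothetical and invented for illustration; they are not results from any model in the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUC: the share of (defaulter, non-defaulter)
    pairs in which the defaulter gets the higher score; ties count as 1/2.
    Labels are assumed to be 1 = default, 0 = no default."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    defaulters = [s for y, s in zip(labels, scores) if y == 1]
    non_defaulters = [s for y, s in zip(labels, scores) if y == 0]
    if not defaulters or not non_defaulters:
        raise ValueError("AUC needs at least one defaulter and one non-defaulter")

    wins = 0.0
    for d in defaulters:
        for n in non_defaulters:
            if d > n:
                wins += 1.0  # pair ranked correctly
            elif d == n:
                wins += 0.5  # tie: half credit
    return wins / (len(defaulters) * len(non_defaulters))


if __name__ == "__main__":
    # Hypothetical portfolio of five borrowers (1 = default).
    labels = [1, 0, 1, 0, 0]
    scores_a = [0.9, 0.3, 0.6, 0.4, 0.2]  # ranks every defaulter on top
    scores_b = [0.7, 0.5, 0.4, 0.6, 0.1]  # misorders two of the six pairs
    print(roc_auc(labels, scores_a))  # 1.0
    print(roc_auc(labels, scores_b))  # 0.6666666666666666
```

The double loop costs O(defaulters × non-defaulters), which is fine for illustration; on a real portfolio a rank-based formula, or scikit-learn's `roc_auc_score`, computes the same quantity far more efficiently.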

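The abstract calls the default probability a key variable in the expected credit loss equation but does not state that equation. A widely used decomposition is ECL = PD × LGD × EAD, where LGD is the loss given default and EAD the exposure at default. A lifetime estimate sums this product over future periods using marginal (per-period) default probabilities and discounts each term. IFRS 9 additionally requires the estimate to be probability-weighted across forward-looking scenarios; this sketch handles only discounting. The thesis's exact formulation is not given here, so the function names, parameters, and numbers below are hypothetical.

```python
from typing import Sequence


def expected_credit_loss(prob_default: float, lgd: float, ead: float) -> float:
    """Single-period ECL = PD * LGD * EAD (no discounting, no scenarios)."""
    if not 0.0 <= prob_default <= 1.0:
        raise ValueError("prob_default must lie in [0, 1]")
    if not 0.0 <= lgd <= 1.0:
        raise ValueError("lgd must lie in [0, 1]")
    if ead < 0.0:
        raise ValueError("ead must be non-negative")
    return prob_default * lgd * ead


def lifetime_ecl(
    marginal_pds: Sequence[float],
    lgds: Sequence[float],
    eads: Sequence[float],
    discount_rate: float,
) -> float:
    """Sum over periods t = 1..T of PD_t * LGD_t * EAD_t / (1 + r)^t,
    where PD_t is the (hypothetical) marginal probability of defaulting in period t."""
    if not (len(marginal_pds) == len(lgds) == len(eads)):
        raise ValueError("per-period inputs must have the same length")
    total = 0.0
    for t, (pd_t, lgd_t, ead_t) in enumerate(zip(marginal_pds, lgds, eads), start=1):
        total += expected_credit_loss(pd_t, lgd_t, ead_t) / (1.0 + discount_rate) ** t
    return total


if __name__ == "__main__":
    # Hypothetical loan: 2% PD, 45% LGD, exposure of 10,000.
    print(round(expected_credit_loss(0.02, 0.45, 10_000.0), 2))  # 90.0
    # Hypothetical two-year profile, discounted at 5% per year.
    print(round(lifetime_ecl([0.02, 0.03], [0.45, 0.45], [10_000.0, 8_000.0], 0.05), 2))  # 183.67
```

Because ECL is linear in PD, any improvement in the default model, such as the tree-based models' higher AUC, feeds directly into the provision estimate.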