Prediction of Short-term Default Probability of Credit Card Invoices Using Behavioural Data

University essay from KTH/Matematisk statistik

Abstract: Probability of Default (PD) is a standard metric for modelling and monitoring credit risk, a major risk facing financial institutions. Traditional PD models are used to forecast risk levels in the long term, while short-term PD predictions are rarer, although they can support management decisions at an operational level. This thesis investigates the potential use of short-term PD for credit card invoices within the invoice-to-cash process, which involves cash-collection activities such as reminders and calls to customers. A model of this sort enables customized cash-collection efforts that are adapted to different credit card holders. Specifically, the main objectives of this thesis are to examine the usability of machine learning techniques in predicting the short-term default probability of credit card invoices and to investigate which features of credit card holders are important for default prediction. The data set was collected from SEB Kort Bank AB, a payment card company operating in the Nordics, and it consists of overdue credit card invoices together with the associated customer behavioural data. The behavioural data include historical purchase patterns, customer information and event variables, among others. The data are severely imbalanced, with far fewer default invoices than non-default invoices. The features were selected using filter methods and correlation analysis. Several machine learning algorithms, including logistic regression, decision trees, random forest, CatBoost and XGBoost, were tested along with various resampling techniques, such as undersampling and SMOTE, to address the class imbalance. The results were primarily evaluated using Precision-Recall AUC and F-score. The two best-performing models had a Precision-Recall AUC and an F-score of 0.304 and 0.332, respectively. The ROC-AUC was roughly 0.89 for both models. Both models were trained using CatBoost.
The results suggest fair performance on the default class (though superior to a baseline model) and high performance on the non-default class. Moreover, it was shown that the cut-off probability threshold is a key aspect of classifying an invoice as default or non-default, and that it should be adjusted according to preference, based on a precision-recall trade-off. Furthermore, feature importance was evaluated using two metrics: how much, on average, a prediction changes when the feature changes, and how much the loss value changes when the feature is included or excluded. The main finding in terms of feature importance is that event variables are not critical. The observed important predictive features include credit card balances, card activities, credit utilization and the number of historical invoice payments. Further research is recommended to draw definite conclusions in this regard.
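
To illustrate the precision-recall trade-off behind the choice of cut-off threshold, the following is a minimal sketch in plain Python. It is not the thesis's implementation: the function names (`precision_recall_f1`, `sweep_thresholds`) are hypothetical, and the labels and scores are made-up toy values, not data or results from the thesis. Label 1 is assumed to denote a default invoice and each score a predicted default probability.

```python
from typing import List, Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive (default) class.

    An invoice is classified as default when its score is >= threshold.
    """
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted_default = score >= threshold
        if predicted_default and label == 1:
            tp += 1
        elif predicted_default and label == 0:
            fp += 1
        elif not predicted_default and label == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def sweep_thresholds(
    y_true: Sequence[int], scores: Sequence[float], thresholds: Sequence[float]
) -> List[Tuple[float, float, float, float]]:
    """Return (threshold, precision, recall, f1) for each candidate cut-off."""
    return [(t, *precision_recall_f1(y_true, scores, t)) for t in thresholds]


# Toy, imbalanced example (illustrative values only): 2 defaults in 10 invoices.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.45, 0.40, 0.60, 0.80]

for t, p, r, f in sweep_thresholds(y_true, scores, [0.4, 0.5, 0.7]):
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}  F1={f:.2f}")
```

In this toy example, lowering the threshold catches every default at the cost of more false alarms, while raising it removes the false alarms but misses one default. Which end is preferable depends on the relative cost of unnecessary collection efforts versus missed defaults.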
