Machine Learning in Banking: Exploring the feasibility of using consumer-level bank transaction data for credit risk evaluation
Abstract: The financial industry is changing rapidly as a result of the increasing digitization of financial and economic resources and services. With a continuous increase in online payments and a decrease in the use of physical currency, a new data source of fine-grained payment activities describing consumer behaviour has emerged. In the banking industry, this information source has not yet been utilized to its full extent. Converting this data into meaningful information could improve credit risk modelling and loan application screening. This work explores the feasibility of using transaction data for credit risk assessment by evaluating the correlation between financial behaviour derived from account statistics and credit risk classification, using the XGBoost machine learning algorithm. The XGBoost models were trained on a real-world data set from a large Swedish bank consisting of 40 million raw transactions made by a random sample of 30,000 individuals, all of whom have been granted unsecured private consumer-level loans of between 20,000 and 350,000 SEK. The results show that there exists a correlation between financial behaviour and credit risk classification. Payment frequency and general account balance statistics were identified as the primary drivers for risk classification decisions. Intra-monthly features resulted in the best performance for models trained on 5 risk classes as well as on 2 risk classes (lowest and highest), reaching macro-average ROC-AUC scores of 0.8 and 0.82, and macro-average F1-scores of 0.39 and 0.79, respectively. Further investigation has been deemed necessary to determine if the correlation found implies causation.
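The macro-average F1-score reported above is the unweighted mean of the per-class F1-scores, so every risk class counts equally regardless of how many individuals fall into it. The following is a minimal sketch of that calculation in plain Python. The function name `macro_f1` and the toy labels are assumptions made for illustration; they are not taken from the thesis or its data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for three hypothetical risk classes (invented, not thesis data).
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2, 2]
score = macro_f1(y_true, y_pred)
print(f"macro-average F1: {score:.3f}")
```

Because each class contributes equally to the mean, a few poorly separated classes can pull the macro-average down sharply, which is one plausible reason a 5-class score sits well below a 2-class score.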