Machine Learning based Clustering of Bank Card Consumers: Identification of risk groups for fraud detection purposes

First cycle, 15 credits

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Jacob Hernberg; Ali Cicek; [2022]

Abstract: To safeguard consumers, banks have developed machine learning based fraud detection systems that work to prevent fraudulent card transactions from occurring. The goal of this report is to improve these systems by segmenting consumers into different risk groups. The hypothesis is that finding these groups reveals which types of consumers are more likely to be hit by fraud and which features distinguish these consumers. Initially, the transaction history of 6000 consumers was aggregated into feature vectors describing each consumer's profile and behavioral patterns during the past year. The features were related to transaction amounts, number of transactions, merchant categories, time of purchase, age, transaction decline rate and POS entry mode. K-means was used to cluster consumers into one of three segments. Lastly, the segments were analyzed with principal component analysis and the defining features of the segments were identified. The clustering showed that consumers in cluster 2 were three times more likely to be hit by fraud than consumers in cluster 1. However, these clusters were not distinctly separated from each other, which contradicts the idea that distinct consumer groups can be found in the dataset and, in turn, that risk groups can be defined. Furthermore, the results showed that individual attributes alone had negligible impact in predicting fraud, although a combination of attributes such as number of unique merchants, number of transactions and proportion of transactions with POS entry mode contactless and online correlated with higher amounts of fraud.
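The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data below is synthetic, the three feature columns and the 5% fraud rate are invented for illustration, and all function names (`standardize`, `kmeans`, `fraud_rate_by_cluster`) are hypothetical. The sketch standardizes per-consumer feature vectors, runs a plain Lloyd's-algorithm K-means with three clusters, and compares the share of fraud-hit consumers per cluster. The principal component analysis step is omitted.

```python
import random
from statistics import mean, pstdev


def standardize(rows):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant columns are left at 0
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [mean(c) for c in zip(*members)]
    return labels, centroids


def fraud_rate_by_cluster(labels, fraud_flags, k):
    """Share of consumers flagged as fraud victims in each cluster."""
    rates = {}
    for j in range(k):
        flags = [f for l, f in zip(labels, fraud_flags) if l == j]
        rates[j] = sum(flags) / len(flags) if flags else float("nan")
    return rates


# Synthetic stand-in data (assumption): 300 consumers with three made-up
# features (transaction count, unique merchants, share of online purchases).
rng = random.Random(42)
consumers = [
    [rng.gauss(30, 10), rng.gauss(12, 4), rng.random()] for _ in range(300)
]
fraud = [1 if rng.random() < 0.05 else 0 for _ in range(300)]

labels, centroids = kmeans(standardize(consumers), k=3)
rates = fraud_rate_by_cluster(labels, fraud, k=3)
for cluster, rate in sorted(rates.items()):
    print(f"cluster {cluster}: fraud rate {rate:.3f}")
```

Because the synthetic features are drawn independently of the fraud flags, any difference in fraud rate between clusters here is noise. With real data, a ratio between cluster fraud rates (such as the threefold difference reported in the abstract) would be read off `rates` in the same way.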
