Detecting Fraudulent User Behaviour: A Study of User Behaviour and Machine Learning in Fraud Detection

University essay from Uppsala universitet/Analys och partiella differentialekvationer

Abstract: This study aims to create a Machine Learning model and investigate its performance in detecting fraudulent user behaviour on an e-commerce platform. The user data was analysed to identify and extract critical features distinguishing regular users from fraudulent users. Two types of user data were used, Event Data and Screen Data, spanning four weeks. A Principal Component Analysis (PCA) was applied to the Screen Data to reduce its dimensionality. Feature Engineering was conducted on both Event Data and Screen Data. A Random Forest model, a supervised ensemble method, was used for classification. The data was imbalanced due to a significant difference between the number of frauds and the number of regular users. Therefore, two different balancing methods were used: oversampling (SMOTE) and changing the Probability Threshold (PT) for the classification model. The best result was achieved with the resampled data and the threshold set to 0.4. With this model, 80.88% of actual frauds were predicted as such, while 0.73% of the regular users were falsely predicted as frauds. While this result was promising, questions are raised regarding its validity, since the model may have been over-fitted to the data set. An indication of this was that the result was significantly less accurate without resampling. However, the overall conclusion is that this study indicates it is possible to distinguish frauds from regular users, with or without resampling. For future research, it would be interesting to use data covering a more extended period of time and to train the model on real-time data to counter changes in fraudulent behaviour.
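
To make the role of the Probability Threshold concrete, the following is a minimal sketch, not the study's implementation. It assumes a classifier has already produced a fraud probability per user, and that a user is flagged when that probability is greater than or equal to the threshold (the inclusive comparison is an assumption). The function names `classify` and `fraud_rates` and all data values are hypothetical, chosen only to illustrate how the two reported figures (share of actual frauds detected, share of regular users falsely flagged) are computed.

```python
from typing import List, Tuple


def classify(probabilities: List[float], threshold: float = 0.5) -> List[int]:
    """Label a user as fraud (1) when the predicted fraud probability
    is at or above the threshold, otherwise as regular (0)."""
    return [1 if p >= threshold else 0 for p in probabilities]


def fraud_rates(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate).

    True positive rate: share of actual frauds predicted as frauds.
    False positive rate: share of regular users predicted as frauds.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical labels (1 = fraud, 0 = regular) and predicted fraud probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.90, 0.45, 0.20, 0.42, 0.10, 0.05, 0.30, 0.02]

# Compare the conventional 0.5 threshold with the lowered 0.4 threshold.
default_rates = fraud_rates(y_true, classify(probs, 0.5))
lowered_rates = fraud_rates(y_true, classify(probs, 0.4))
print("threshold 0.5:", default_rates)
print("threshold 0.4:", lowered_rates)
```

In this toy data, lowering the threshold from 0.5 to 0.4 flags one more actual fraud but also one regular user, which is the trade-off the threshold controls.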
