Classification of Premium and Non-Premium Products using XGBoost and Logistic Regression

University essay from Lunds universitet/Statistiska institutionen; Lunds universitet/Nationalekonomiska institutionen

Abstract: In the past few years, many industries have become interested in premium product segmentation as a way to achieve higher unit margins. In this paper, we apply machine learning algorithms to predict whether a product is premium or non-premium. The product is manufactured by a food and beverage company whose primary concern is incorrect classification, especially products incorrectly predicted as premium (False Positives). The focus of this study is therefore to minimize the misclassification of premium products. We selected Logistic Regression (LR) and XGBoost (XGB) and applied balancing methods, feature selection, and parameter tuning. The main contribution of this research is the application of a Cost-Sensitive (CS) analysis to address misclassification in a highly imbalanced dataset. According to our results, the best-performing model was CS-XGB-SMOTE, which achieved a False Positive Rate (FPR) of 2.7%. Future research could explore a more robust way to assign the costs for the CS analysis and a direct modification of the XGB loss function, which may improve the performance of this algorithm.
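To make the two quantities in the abstract concrete, the following is a minimal sketch of how a False Positive Rate and a cost-sensitive decision threshold could be computed. It is not the essay's implementation: the labels, scores, cost values, and function names are all hypothetical, and it assumes that premium is the positive class (1), so a False Positive is a non-premium product predicted as premium.

```python
# Illustrative sketch only: data, costs, and names below are assumptions,
# not taken from the essay. Label 1 = premium (positive), 0 = non-premium.

def false_positive_rate(y_true, y_pred):
    """FPR = FP / (FP + TN): share of non-premium items predicted as premium."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn) if (fp + tn) else 0.0

def total_cost(y_true, y_pred, cost_fp, cost_fn):
    """Total misclassification cost with separate costs for FP and FN."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return cost_fp * fp + cost_fn * fn

def predict(scores, threshold):
    """Turn predicted probabilities into class labels at a given threshold."""
    return [int(s >= threshold) for s in scores]

def best_threshold(y_true, scores, cost_fp, cost_fn, candidates):
    """Pick the candidate threshold with the lowest total cost."""
    return min(
        candidates,
        key=lambda th: total_cost(y_true, predict(scores, th), cost_fp, cost_fn),
    )

# Hypothetical imbalanced sample: 8 non-premium items, 2 premium items.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.45, 0.55, 0.6, 0.5, 0.9]

# Hypothetical costs: a False Positive is treated as 5x worse than a False Negative.
COST_FP, COST_FN = 5, 1

fpr_default = false_positive_rate(y_true, predict(scores, 0.5))
chosen = best_threshold(y_true, scores, COST_FP, COST_FN, [0.3, 0.5, 0.7])
fpr_chosen = false_positive_rate(y_true, predict(scores, chosen))

print(fpr_default, chosen, fpr_chosen)
```

In this toy sample, weighting False Positives more heavily pushes the chosen threshold upward, which lowers the FPR at the price of missing one premium item. Threshold selection is only one possible form of cost-sensitive analysis; the essay's own cost assignment may differ.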
