Prediction of Lead Conversion With Imbalanced Data: A method based on Predictive Lead Scoring

University essay from Linköpings universitet/Statistik och maskininlärning

Abstract: An ongoing challenge for most businesses is identifying potential customers within their audience. This thesis proposes a method that uses user data to distinguish potential customers from random visitors to a website. The method is based on Predictive Lead Scoring, which segments customers by their likelihood of purchasing a product. Our method, however, aims to predict user conversion, that is, whether a user has the potential to become a customer or not. Six supervised machine learning models have been used to carry out the classification task. To account for the high class imbalance in the input data, multiple resampling methods have been applied to the training data. The combination of classifier and resampling method with the highest average precision score has been selected as the best model. In addition, this thesis tries to quantify the effect of feature weights by evaluating several feature ranking and weighting schemes. Using these schemes, several sets of weights have been produced and evaluated by training a KNN classifier on the weighted features. The change in average precision relative to the original KNN classifier (without weighting) is used as the reference for measuring the performance of the ranking and weighting schemes.
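
The evaluation loop described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the thesis's implementation: the abstract does not name the resampling methods or weighting schemes, so random oversampling stands in for the former and a hand-picked weight vector for the latter. The synthetic data, the helper names (`make_data`, `random_oversample`, `knn_scores`), the value of `k`, and the weights `[1.0, 0.1]` are all assumptions made for illustration. The average precision here ranks by score and breaks ties by input order, which is a simplification.

```python
import random

def average_precision(y_true, scores):
    """Mean of precision@k over the ranks k that hold a positive label."""
    ranked = sorted(zip(scores, y_true), key=lambda p: -p[0])
    hits, total = 0, 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0

def random_oversample(X, y, rng):
    """Duplicate random minority rows until both classes have equal size."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = pos + neg + extra
    return [X[i] for i in idx], [y[i] for i in idx]

def knn_scores(X_train, y_train, X_test, k=5, weights=None):
    """Score each test row by the share of positives among its k nearest
    training rows, using a feature-weighted squared Euclidean distance."""
    w = weights or [1.0] * len(X_train[0])
    scores = []
    for q in X_test:
        dists = sorted(
            (sum(wj * (a - b) ** 2 for wj, a, b in zip(w, x, q)), label)
            for x, label in zip(X_train, y_train)
        )
        scores.append(sum(label for _, label in dists[:k]) / k)
    return scores

def make_data(n, rng):
    """Hypothetical imbalanced data: about 5% positives, one informative
    feature and one noisy feature."""
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.05 else 0
        informative = rng.gauss(1.5 if label else 0.0, 1.0)
        noise = rng.gauss(0.0, 3.0)
        X.append([informative, noise])
        y.append(label)
    return X, y

rng = random.Random(0)
X_train, y_train = make_data(400, rng)
X_test, y_test = make_data(200, rng)

# Resample the training data only; the test set keeps its natural imbalance.
X_res, y_res = random_oversample(X_train, y_train, rng)

# Reference: unweighted KNN. Candidate: KNN with the noisy feature down-weighted.
baseline = average_precision(y_test, knn_scores(X_res, y_res, X_test))
weighted = average_precision(
    y_test, knn_scores(X_res, y_res, X_test, weights=[1.0, 0.1])
)
print(f"AP unweighted: {baseline:.3f}")
print(f"AP weighted:   {weighted:.3f}")
print(f"Change in AP:  {weighted - baseline:+.3f}")
```

The change in average precision printed at the end plays the role of the reference measure mentioned in the abstract: a weighting scheme is judged by how much it moves the score of the unweighted KNN. In practice, the same comparison would be repeated for every classifier, resampling method, and weighting scheme under study.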
