Binary Classification for Predicting Customer Churn

University essay from Umeå universitet/Institutionen för matematik och matematisk statistik

Abstract: Predicting when a customer is about to turn to a competitor can be difficult, yet extremely valuable from a business perspective. The moment a customer stops being considered a customer is known as churn, a widely researched topic in several industries that offer subscription services. In industries with non-subscription services and products, however, defining churn can be a daunting task, and the existing literature does not fully cover this field. This thesis can therefore be seen as a contribution to current research, especially for settings without an established definition of churn. A definition of churn adjusted to DIAKRIT’s business is created. DIAKRIT is a company working in the real estate industry, which faces many challenges, such as strong seasonality. The prediction was approached as a supervised learning problem using three different machine learning methods: Logistic Regression, Random Forest and Support Vector Machine. The variables used in the predictions are predominantly activity data. With a relatively high accuracy and AUC score, Random Forest was concluded to be the most reliable model. It is, however, clear that the model cannot separate the classes perfectly. The Random Forest model also achieved a relatively high precision. Thus, even though the model is not flawless, the customers it predicts to churn are very likely to churn.
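The abstract compares the models by accuracy, AUC and precision. As a rough illustration only (not the thesis code), the sketch below computes these three metrics in plain Python for a binary churn label (1 = churn, 0 = stays). The labels, predicted scores and the 0.5 decision threshold are made-up toy values. In practice a library such as scikit-learn would normally supply these metrics. The toy data also shows how a model can reach a high precision while still not separating the classes perfectly (AUC below 1).

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of customers whose churn label is predicted correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Of the customers predicted to churn, the fraction that actually churned: TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random churner gets a higher score than a random
    non-churner (ties count as 0.5); equals the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true churn labels and a model's predicted churn probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]
# Assumed decision threshold of 0.5 to turn probabilities into churn predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("accuracy ", accuracy(y_true, y_pred))             # 0.75
print("precision", round(precision(y_true, y_pred), 3))  # 0.667
print("AUC      ", round(roc_auc(y_true, scores), 3))    # 0.867
```

AUC is computed from the raw scores and is therefore independent of the threshold, whereas accuracy and precision depend on where the threshold is set; raising it typically trades recall for precision.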
