Predicting Customer Churn and Customer Lifetime Value (CLV) using Machine Learning

University essay from Lunds universitet/Matematisk statistik

Abstract: In an ever more competitive business environment, predictive models of customer behaviour can give companies an edge over their competitors. Two important examples are customer churn models and customer lifetime value (CLV) models. Because acquiring new customers is more expensive than retaining existing ones, it is important for businesses to keep their existing customer base. Customer churn models can help retain existing customers by identifying patterns in customer engagement and behaviour that increase the risk of churning; these high-risk customers can then be proactively targeted with personalized retention strategies. CLV models can help companies predict revenue and identify areas where they can improve to meet revenue goals. In this thesis, three popular machine learning algorithms were used to predict customer churn: logistic regression, random forest (RF) and support vector classifier (SVC). In addition, two regression algorithms were used to predict CLV: linear regression and support vector regression (SVR). The SVC and logistic regression models performed similarly, with the SVC model achieving slightly better performance metrics. However, because the feature data was significantly correlated, the logistic regression model might not generalize as well to new data as the SVC model. The random forest model was unstable across different evaluation sets, was too reluctant to classify customers as churned, and had the worst overall performance of the three models. Among the CLV models, the linear regression model was unable to accurately model the skewed distribution of spending among customers. Compared with a naive predictor, the linear regression model performed better only at predicting which customers would stop generating revenue.
For customers who did not stop generating revenue, the linear regression model performed significantly worse than the naive predictor. The SVR model modelled CLV more accurately, outperforming the naive predictor across all spending ranges except the highest-spending eighth of customers. The SVR model also significantly outperformed the linear regression model, except in predicting which customers would stop generating revenue, where the linear regression model was slightly better.
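The churn comparison described in the abstract (logistic regression, random forest and SVC, judged partly on stability across evaluation sets) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the thesis's pipeline: the data is synthetic (`make_classification` with an assumed 20% churn rate and deliberately redundant, hence correlated, features), and all hyperparameters are library defaults or arbitrary choices. The fold-to-fold standard deviation stands in for "stability across evaluation sets", and recall shows how many actual churners each model flags.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for customer-engagement data (NOT the thesis data).
# About 20% of customers are churners (label 1); the 3 redundant features are linear
# combinations of the informative ones, so the feature matrix is strongly correlated.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5, n_redundant=3,
    weights=[0.8, 0.2], random_state=0,
)

# Scale-sensitive models (logistic regression, SVC) get a StandardScaler in front.
models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVC": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# Stratified folds keep the churn rate equal in every evaluation set; the spread of a
# metric across folds is a rough indicator of how stable a model is.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
# Recall = share of actual churners that the model labels as churned.
metrics = ["recall", "f1", "roc_auc"]

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=metrics)
    # Store (mean, std) of each metric over the 5 test folds.
    results[name] = {
        m: (scores[f"test_{m}"].mean(), scores[f"test_{m}"].std()) for m in metrics
    }

for name, res in results.items():
    summary = "  ".join(f"{m}={mean:.3f} +/- {std:.3f}" for m, (mean, std) in res.items())
    print(f"{name:20s} {summary}")
```

Stratified cross-validation is used here because churn labels are typically imbalanced; a single train/test split would hide the kind of instability the abstract reports for the random forest.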
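The CLV evaluation in the abstract compares linear regression and SVR against a naive predictor, separately for customers who stop generating revenue, customers who keep spending, and the highest-spending eighth. The sketch below reproduces that kind of segmented comparison on synthetic data only. The features (`past_spend`, `frequency`, `recency`), the data-generating process, the SVR hyperparameters and, in particular, the naive predictor (next-period value equals past spend) are all hypothetical choices, since the abstract does not define them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n = 2000

# Hypothetical synthetic customers (NOT the thesis data): right-skewed historical
# spend, purchase frequency, and days since last purchase (recency).
past_spend = rng.lognormal(mean=3.0, sigma=1.0, size=n)
frequency = rng.poisson(lam=5, size=n) + 1
recency = rng.integers(1, 365, size=n)

# Customers inactive for longer are more likely to stop generating revenue (value 0).
p_stop = 1.0 / (1.0 + np.exp(-(recency - 250) / 40.0))
stopped = rng.random(n) < p_stop
future_value = np.where(stopped, 0.0, past_spend * rng.lognormal(0.0, 0.3, size=n))

X = np.column_stack([past_spend, frequency, recency])
X_train, X_test, y_train, y_test = train_test_split(
    X, future_value, test_size=0.25, random_state=0
)

models = {
    "linear regression": make_pipeline(StandardScaler(), LinearRegression()),
    "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=1.0)),
}
predictions = {name: m.fit(X_train, y_train).predict(X_test) for name, m in models.items()}
# Hypothetical naive baseline: next-period value equals past spend (column 0 of X).
predictions["naive"] = X_test[:, 0]

# Evaluate each predictor on the segments discussed in the abstract.
zero = y_test == 0
segments = {
    "all": np.ones_like(y_test, dtype=bool),
    "stopped (value = 0)": zero,
    "still spending": ~zero,
    "top 1/8 spenders": y_test >= np.quantile(y_test, 7 / 8),
}
mae = {
    name: {seg: mean_absolute_error(y_test[mask], pred[mask]) for seg, mask in segments.items()}
    for name, pred in predictions.items()
}

for name, scores in mae.items():
    print(f"{name:18s}", {seg: round(v, 2) for seg, v in scores.items()})
```

Reporting error per segment rather than a single overall score is what makes it possible to say, as the abstract does, that one model wins on zero-revenue customers while another wins everywhere except the top spenders.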
