Predicting customer churn in telecommunications service

University essay from Luleå/Business Administration and Social Sciences

Abstract: Customer churn is a central concern for most companies operating
in industries with low switching costs. Among the industries that suffer
from this issue, telecommunications ranks near the top of the list, with an
approximate annual churn rate of 30%. Different approaches exist for
tackling this problem by developing predictive models of customer churn.
However, because the pre-paid mobile telephony market is not contract-based,
customer churn is not easily traceable or definable, so constructing a
predictive model is highly complex. To handle this issue, in this study we
developed a dual-step
model-building approach consisting of a clustering phase and a
classification phase. In the first step, the customer base was divided into
four clusters based on their RFM-related features, with the aim of
extracting a logical definition of churn. In the second step, the
model-building phase, predictive models were built on the churn definitions
extracted in the first step. In this phase, a Decision Tree (CART algorithm)
was first used to build the predictive model. Afterwards, in order to
compare the performance of different algorithms, a Neural Network algorithm
and other Decision Tree algorithms were used to construct predictive models
for churn in our developed clusters. Evaluating and comparing the
performance of the
employed algorithms based on the “Gain” measure, we concluded that a
multi-algorithm approach, in which different algorithms are used for
different clusters, yields the maximum “Gain” among the tested models.
Furthermore, to deal with our imbalanced dataset, we tested cost-sensitive
learning as a remedy for the class imbalance.
According to the results, both the simple and the cost-sensitive predictive
models perform considerably better than random sampling, in both the CART
model and the multi-algorithm model. Additionally, in our study,
cost-sensitive learning outperformed the simple model only for the CART
model, not for the multi-algorithm model.
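
The abstract does not define the “Gain” measure, so the sketch below
assumes the common cumulative-gain definition: the share of all churners
found among the top fraction of customers ranked by predicted churn score.
Random sampling of a fraction f is expected to capture a share f of the
churners, which is the baseline the models are compared against. The
function names, scores, labels, and per-cluster gain values are all
hypothetical and serve only as illustration; they are not taken from the
study.

```python
def cumulative_gain(scores, labels, fraction):
    """Share of all churners captured in the top `fraction` of customers.

    scores: predicted churn scores (higher means more likely to churn)
    labels: 1 for churner, 0 for non-churner
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total_churners = sum(labels)
    if total_churners == 0:
        raise ValueError("at least one churner is required")
    # Rank customers from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, round(fraction * len(ranked)))
    captured = sum(label for _, label in ranked[:cutoff])
    return captured / total_churners


def best_model_per_cluster(gains):
    """Multi-algorithm selection: keep the highest-gain model in each cluster."""
    return {
        cluster: max(model_gains, key=model_gains.get)
        for cluster, model_gains in gains.items()
    }


# Hypothetical scores for ten customers, three of whom churn.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]

gain_top_20 = cumulative_gain(scores, labels, 0.2)  # 1 of 3 churners captured
random_baseline = 0.2  # expected share captured by random sampling of 20%

# Hypothetical gain values per cluster and algorithm.
gains = {
    "cluster_1": {"CART": 0.55, "NeuralNetwork": 0.61},
    "cluster_2": {"CART": 0.48, "NeuralNetwork": 0.42},
}
chosen = best_model_per_cluster(gains)
```

With these illustrative numbers, the top 20% of customers captures one
third of the churners, compared with an expected 20% under random sampling,
and the selection step picks a different algorithm for each of the two
clusters.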