Graphing Customer Churn

Fang Zhou and Wee Hyong Tok have released a case study on a telephone company’s customer churn:

In the case of telco customer churn, we collected a combination of the call detail record data and customer profile data from a mobile carrier, and then followed the data science process —  data exploration and visualization, data pre-processing and feature engineering, model training, scoring and evaluation — in order to achieve the churn prediction. With a churn indicator in the dataset taking value 1 when the customer is churned and taking value 0 when the customer is non-churned, we addressed the problem as a binary classification problem and tried varioustree-based models along with methods like bagging, random forests and boosting. Because the number of churned customers is much less than that of non-churned customers (making the data set quite unbalanced), SMOTE (Synthetic Minority Oversampling Technique) was applied to adjust the proportion of majority class over minority class in the training data set, thus further improving model performance, especially precision and recall.

All the above data science procedures could be implemented with base R. Rather than moving the data out from the database to an external machine running R, we instead run R scripts directly on SQL Server data by leveraging the in-database analytics capability provided by SQL Server R Services, taking advantage of the rich and powerful CRAN R packages plus the parallel external memory algorithms in the RevoScaleR library. In what follows, we will describe the specific R packages and algorithms that we used to implement the data science solution for predicting telco customer churn.

They have provided the relevant materials in GitHub as well.

Related Posts

An Introduction To seplyr

John Mount guest blogs on the Revolutions blog about seplyr: seplyr is an R package that supplies improved standard evaluation interfaces for many common data wrangling tasks. The core of seplyr is a re-skinning of dplyr‘s functionality to seplyr conventions (similar to how stringr re-skins the implementing package stringi). Read on for a couple of examples of where seplyr can make it easier for you to […]

Read More

Creating Dynamic Pivot Tables

Ben Richardson shows how to use dynamic SQL to create pivot tables with arbitrary numbers of pivot elements: The headings of the columns are the individual values inside the city column. We specified these values inside the pivot operator in our query. The most tedious part of creating pivot tables is specifying the values for […]

Read More


September 2016
« Aug Oct »