Graphing Customer Churn

Fang Zhou and Wee Hyong Tok have released a case study on a telephone company’s customer churn:

In the case of telco customer churn, we collected a combination of the call detail record data and customer profile data from a mobile carrier, and then followed the data science process —  data exploration and visualization, data pre-processing and feature engineering, model training, scoring and evaluation — in order to achieve the churn prediction. With a churn indicator in the dataset taking value 1 when the customer is churned and taking value 0 when the customer is non-churned, we addressed the problem as a binary classification problem and tried varioustree-based models along with methods like bagging, random forests and boosting. Because the number of churned customers is much less than that of non-churned customers (making the data set quite unbalanced), SMOTE (Synthetic Minority Oversampling Technique) was applied to adjust the proportion of majority class over minority class in the training data set, thus further improving model performance, especially precision and recall.

All the above data science procedures could be implemented with base R. Rather than moving the data out from the database to an external machine running R, we instead run R scripts directly on SQL Server data by leveraging the in-database analytics capability provided by SQL Server R Services, taking advantage of the rich and powerful CRAN R packages plus the parallel external memory algorithms in the RevoScaleR library. In what follows, we will describe the specific R packages and algorithms that we used to implement the data science solution for predicting telco customer churn.

They have provided the relevant materials in GitHub as well.

Related Posts

Validating SSIS Packages Using T-SQL

Annie Xu shows us how to validate SSIS packages in the SSISDB catalog using T-SQL: Recently, I need to do a data warehouse migration for a client. Since there might be some difference between the Dev environment source databases and Prod environment source databases. The migrated SSIS packages for building data warehouse might have some […]

Read More

Data Science And Data Engineering In HDP 3.0

Saumitra Buragohain, et al, show off some of the things added to the Hortonworks Data Platform for data scientists and data engineers: We leverage the power of HDP 3.0 from efficient storage (erasure coding), GPU pooling to containerized TensorFlow and Zeppelin to enable this use case. We will the save the details for a different […]

Read More


September 2016
« Aug Oct »