Graphing Customer Churn

Fang Zhou and Wee Hyong Tok have released a case study on a telephone company’s customer churn:

In the case of telco customer churn, we collected a combination of the call detail record data and customer profile data from a mobile carrier, and then followed the data science process (data exploration and visualization, data pre-processing and feature engineering, model training, scoring and evaluation) in order to achieve the churn prediction. With a churn indicator in the dataset taking value 1 when the customer is churned and taking value 0 when the customer is non-churned, we addressed the problem as a binary classification problem and tried various tree-based models along with methods like bagging, random forests and boosting. Because the number of churned customers is much less than that of non-churned customers (making the data set quite unbalanced), SMOTE (Synthetic Minority Oversampling Technique) was applied to adjust the proportion of majority class over minority class in the training data set, thus further improving model performance, especially precision and recall.

All the above data science procedures could be implemented with base R. Rather than moving the data out from the database to an external machine running R, we instead run R scripts directly on SQL Server data by leveraging the in-database analytics capability provided by SQL Server R Services, taking advantage of the rich and powerful CRAN R packages plus the parallel external memory algorithms in the RevoScaleR library. In what follows, we will describe the specific R packages and algorithms that we used to implement the data science solution for predicting telco customer churn.

They have also provided the relevant materials on GitHub.
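To make the SMOTE step from the excerpt a little more concrete, here is a minimal sketch of the core idea: pick a minority-class row, pick one of its nearest minority-class neighbors, and create a synthetic row somewhere on the line segment between them. This is not the authors' code (their solution is written in R and runs inside SQL Server); it is plain Python using only the standard library, and the function name `smote`, its parameters, and the toy customer data are all assumptions made up for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic rows by interpolating between minority-class rows
    and their k nearest minority-class neighbors (hypothetical helper)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: math.dist(base, row))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy data, invented for illustration: two numeric features per customer.
churned = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
not_churned = [[5.0 + 0.1 * i, 6.0 - 0.1 * i] for i in range(20)]

# Oversample the churned class until both classes have the same row count.
new_rows = smote(churned, n_synthetic=len(not_churned) - len(churned), k=3)
balanced_minority = churned + new_rows
print(len(balanced_minority), len(not_churned))  # prints: 20 20
```

As the excerpt notes, the oversampling is applied to the training data only; a held-out test set would normally keep its original class proportions so that precision and recall reflect the real churn rate.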

