Understanding The Problem: Churn Edition

Emre Yazici points out the importance and difficulty of nailing down good definitions, using bookings churn as an example:

WHEN: Let’s say, we have made our design, constructed a model and obtained a good accuracy. However our model predicts (even with 95% accuracy) the customer who are going to churn in next day! That means our business department have to prevent (somehow, as explained before) those customers to churn in “one day”. Because next day, they will not be our customers. Taking an action to “3000” customers (let’s say) in one day only is impossible. So even our project predicts with very high accuracy, it will not be usefull. This approach also creates another problem: Consider that N months ago, a customer “A” was a happy customer and was working (providing us) with us (let’s say, it is a customer with %100 efficiency – happiness) and tomorrow it will be a customer who is not working with us (a customer with %100 efficiency – happiness). And we can predict the result today. So most probably, the customer has already got the idea to leave from our company in the last day. This is a deadend and we can not prevent the customer to churn at this point – because it is already too late.

So we need to have a certain time limit… Such that we need to be able to warn the business department “M months” before (customer churn) thus they can take action before the customers leave. Here comes another problem, what is the time limit… 2 months, 2.5 months, 3 months…? How do we determine the time, that we need to predict customers churn before (they leave)?

There’s a lot more to a good solution than “I ran a regression against a data set.”

Related Posts

Interpreting P-Value Histograms

David Robinson visualizes and interprets different p-value histograms: So you’re a scientist or data analyst, and you have a little experience interpreting p-values from statistical tests. But then you come across a case where you have hundreds, thousands, or even millions of p-values. Perhaps you ran a statistical test on each gene in an organism, or on […]

Read More

The Magic Of Sampling

Nathan LeClaire reminds us of an important story that statisticians have been telling us for a couple centuries: It starts slowly. Maybe your home-grown centralized logging cluster becomes more difficult to operate, demanding unholy amounts of engineer time every week. Maybe engineers start to find that making a query about production is a “go get […]

Read More


April 2017
« Mar May »