Churn is an important concept in most industries, and insights into WHEN it is likely to happen and WHY it is likely to happen are strategically valuable. Using predictive analytics is key.

First, a definition: churn (attrition) is when a customer no longer buys the product. This can be a subscription product (telecom, gym memberships) or a non-subscription product (specialty retail, casual dining, hospitality). Providing a list of valued customers at risk BY TIME PERIOD is fundamental. However, there is some confusion about how to model churn.

Churn may have a hard date (a subscriber cancels) or may have to be estimated from usage (a non-subscriber simply stops buying).

A common approach is to fit a logistic regression at the customer level, with churn as the dependent variable and, as independent variables, typically campaign responses and transactions, and perhaps demographics, lifestyle, etc. Each customer is then scored with a probability of churning. This yields few insights, because how likely a customer is to churn says nothing about WHEN the churn will happen. That is, a customer may be 90% likely to churn, but not for many months. Overlooking the time until churn diminishes actionability.
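To make the limitation concrete, here is a minimal pure-Python sketch of customer-level logistic scoring. It assumes a single hypothetical covariate (dropped calls last month) and made-up toy data; in practice a library such as scikit-learn or statsmodels would do the fitting. Whatever the fit, each customer ends up with one probability and nothing about timing.

```python
import math


def fit_logistic(x, y, lr=0.1, epochs=5000):
    """Fit p(churn) = sigmoid(b0 + b1 * x) by batch gradient ascent on the average log-likelihood."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
            g0 += yi - p          # gradient w.r.t. the intercept
            g1 += (yi - p) * xi   # gradient w.r.t. the slope
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Hypothetical toy data: dropped calls last month and whether the customer churned.
dropped = [0, 0, 1, 1, 2, 2, 3, 4, 5, 6]
churned = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1]  # not perfectly separable, so the fit converges
b0, b1 = fit_logistic(dropped, churned)


def churn_probability(x):
    return 1 / (1 + math.exp(-(b0 + b1 * x)))


# One number per customer: nothing here says WHEN the churn would happen.
for x in (0, 3, 6):
    print(x, round(churn_probability(x), 2))
```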

The next most common approach (and a statistically wrong one) is to fit logits by month. That is, fit 12 models: in the first the dependent variable is churn in January or not, in the next it is churn in February or not, and so on. This gives a probability of churning in each month. It is statistically inappropriate because the models are treated as independent of each other. That is, the February model is assumed not to depend on January, which is of course false. The probability of churning in February absolutely depends on whether or not the customer churned in January.
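The dependence between months can be shown with simple arithmetic. In this sketch the monthly conditional risks are hypothetical numbers: the unconditional chance of churning in February is the chance of surviving January multiplied by February's risk for those still active, so the months cannot be modeled as if they were unrelated.

```python
def unconditional_churn_by_month(hazards):
    """Turn conditional monthly risks into unconditional probabilities.

    hazards[t] is P(churn in month t | customer still active at the start of month t).
    The unconditional chance of churning in month t is the chance of having
    survived every earlier month times that month's conditional risk.
    """
    still_active = 1.0
    probs = []
    for h in hazards:
        probs.append(still_active * h)
        still_active *= 1 - h
    return probs, still_active


# Hypothetical conditional risks for January, February, March.
probs, remaining = unconditional_churn_by_month([0.10, 0.10, 0.10])
print([round(p, 3) for p in probs], round(remaining, 3))
# [0.1, 0.09, 0.081] 0.729
```

Even with the same 10% risk every month, February's unconditional probability (9%) is lower than January's because 10% of customers are already gone.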

This approach solves the time problem but introduces a worse one: treating dependent months as independent, which can produce spurious correlation. And BTW, ordered logits are also inappropriate, in that the sequence of proportionalities is potentially continuous, not to mention that interpretation is extremely difficult.

Another approach that tries to incorporate time until the event is ordinary regression with a counter of time until churn as the dependent variable. This is problematic because a decision has to be made about customers who did NOT churn. What should be done with them? Delete them? Give them the shortest churn value? Give them the longest churn value? Depending on the percentage of customers who have not churned, each of these three solutions can be very poor.
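A small simulation shows how badly each of the three choices can distort the average time until churn. It is a sketch under stated assumptions: churn times are hypothetically exponential with a true mean of 12 months, customers are observed for 12 months, and "shortest value" is read as the shortest observed churn time. For contrast, the last line uses the exponential maximum-likelihood estimate that counts the non-churners' months as exposure (a simple parametric survival estimate, not the semi-parametric model discussed below).

```python
import random


def compare_estimates(n=20000, mean_tenure=12.0, window=12.0, seed=42):
    """Mean time until churn under three naive fixes vs. an estimator that uses censoring.

    Assumes (hypothetically) exponential churn times with a known true mean,
    observed for `window` months; anyone still active at the end is censored.
    """
    rng = random.Random(seed)
    true_times = [rng.expovariate(1 / mean_tenure) for _ in range(n)]
    churned = [t <= window for t in true_times]
    observed = [min(t, window) for t in true_times]

    churn_times = [o for o, c in zip(observed, churned) if c]
    shortest = min(churn_times)
    longest = window

    return {
        "delete non-churners": sum(churn_times) / len(churn_times),
        "give them the shortest value": sum(o if c else shortest for o, c in zip(observed, churned)) / n,
        "give them the longest value": sum(o if c else longest for o, c in zip(observed, churned)) / n,
        # Exponential MLE with censoring: total months observed / number of churn events.
        "censoring-aware (exponential MLE)": sum(observed) / len(churn_times),
    }


estimates = compare_estimates()
for name, value in estimates.items():
    print(f"{name:35s} {value:5.2f}   (true mean 12.00)")
```

All three naive fixes land well below the true 12 months, because the customers who stay longest are exactly the ones whose churn time is never seen.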

The only appropriate technique for time until an event, one that takes into account both those who had the event and those who did not, is survival modeling. It was designed specifically for time-until-event problems (originally time until death). Through partial likelihood techniques, it also accounts for those who did as well as those who did not have the event (the latter are called censored observations). That means it solves both of the problems above.
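The simplest survival estimator, Kaplan-Meier, already shows how customers who have not churned are used rather than deleted or imputed: they count in the risk set for as long as they are observed and then drop out without counting as churn. This sketch uses a hypothetical cohort of ten customers observed for 12 months. It is a nonparametric curve with no covariates; the partial likelihood mentioned above belongs to the Cox proportional hazards model, which adds explanatory variables.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve: list of (month, share still active) at each churn time.

    times:  months observed (churn month, or end of observation if still active)
    events: 1 if the customer churned, 0 if still active (censored)
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        churned = left = 0
        # Group everyone observed at month t.
        while i < len(data) and data[i][0] == t:
            churned += data[i][1]
            left += 1
            i += 1
        if churned:
            surv *= 1 - churned / at_risk
            curve.append((t, surv))
        at_risk -= left  # censored customers leave the risk set without counting as churn
    return curve


# Hypothetical cohort observed for 12 months; zeros are customers still active.
months = [2, 3, 3, 5, 6, 8, 8, 12, 12, 12]
churn = [1, 1, 0, 1, 0, 1, 1, 0, 0, 0]
curve = kaplan_meier(months, churn)
for t, s in curve:
    print(t, round(s, 3))
```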

So, suggestion number one is to use survival modeling on churn problems. Suggestion number two is to segment customers by the causes of churn and then fit a specific (potentially different) model on each segment.

It is critical that the segmentation be about what CAUSES churn, not churn itself. That is, do not use churn as a segmenting variable. Churn is a result, not a cause.

Hypothesizing what causes churn is a good first step. Using telecom as an example, what can cause churn? Dropped calls from a weak network can cause subscribers to change providers. What else? A high bill can cause churn. The bill can be high because of the number of lines and features a subscriber has, resulting in a higher-than-expected bill, so they churn. Another cause of high bills might be high usage, of minutes or data, etc., and this higher-than-expected bill can also cause churn. For simplicity, say there are these three segments. Note that each segment has a different cause of churn.
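The text does not say how subscribers are assigned to segments. One minimal, hypothetical way to operationalize the three hypothesized causes is a rule-based assignment; the field names, thresholds, check order, and the catch-all "other" group are all illustrative assumptions. Note that churn is deliberately not an input.

```python
from dataclasses import dataclass


@dataclass
class Subscriber:
    dropped_calls_per_month: float
    lines: int
    features: int
    overage_share: float  # hypothetical: share of the bill from minutes/data above plan


def churn_cause_segment(s: Subscriber) -> str:
    """Assign a subscriber to a hypothesized cause-of-churn segment.

    The thresholds and the order of the checks are hypothetical placeholders.
    Churn itself is NOT an input: segments are built on causes, not the result.
    """
    if s.dropped_calls_per_month >= 5:
        return "network"    # segment 1: dropped calls
    if s.lines + s.features >= 6:
        return "plan_size"  # segment 2: high bill from lines and features
    if s.overage_share >= 0.2:
        return "usage"      # segment 3: high bill from usage
    return "other"          # hypothetical catch-all for subscribers matching no cause


print(churn_cause_segment(Subscriber(7, 1, 1, 0.0)))   # network
print(churn_cause_segment(Subscriber(0, 3, 4, 0.05)))  # plan_size
print(churn_cause_segment(Subscriber(1, 1, 2, 0.35)))  # usage
```

A data-driven segmentation (for example, clustering on the cause variables) could replace the hand-set rules; the point is only that the inputs are causes.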

The idea now is to fit a survival model for each segment, giving three survival models. One will show that dropped calls drastically decrease the time until churn in segment 1 but do little to segment 2. This is because segment 2 is not sensitive to dropped calls but is sensitive to a high bill. The way to slow down churn in segment 2 is to offer, say, a discount, an average billing plan, etc. A discount offered to the dropped-call segment will likely not be very effective. This is why a causal segmentation is recommended: there are different actions by segment. If one survival model were applied to everyone, these segment-specific effects would be weaker and diluted.
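Here is a self-contained sketch of one survival model per segment, using a single-covariate Cox proportional hazards fit (Newton-Raphson on the partial likelihood). The data are synthetic: dropped calls are simulated to raise the churn hazard in segment 1 and to have no effect in segment 2, with subscribers observed for 24 months. The covariate, rates, and sample sizes are hypothetical; a real analysis would use a survival library (for example lifelines or statsmodels) with the full set of segment-specific variables.

```python
import math
import random


def simulate_segment(n, beta_dropped, base_rate=1 / 36, window=24.0, seed=0):
    """Synthetic subscribers: exponential churn times whose hazard rises with dropped calls."""
    rng = random.Random(seed)
    times, events, dropped = [], [], []
    for _ in range(n):
        x = rng.randint(0, 6)  # hypothetical covariate: dropped calls last month
        t = rng.expovariate(base_rate * math.exp(beta_dropped * x))
        times.append(min(t, window))  # still active at 24 months -> censored
        events.append(t <= window)
        dropped.append(x)
    return times, events, dropped


def fit_cox_1d(times, events, x, iters=50, tol=1e-10):
    """Maximize the Cox partial likelihood for one covariate by Newton-Raphson.

    Assumes no tied churn times (true for continuous simulated times).
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    beta = 0.0
    for _ in range(iters):
        s0 = s1 = s2 = 0.0  # risk-set sums of w, w*x, w*x^2 where w = exp(beta * x)
        score = info = 0.0
        # Walk from the latest time backwards so the running sums always
        # cover the risk set {j : t_j >= t_i}.
        for i in reversed(order):
            w = math.exp(beta * x[i])
            s0 += w
            s1 += w * x[i]
            s2 += w * x[i] ** 2
            if events[i]:
                mean = s1 / s0
                score += x[i] - mean
                info += s2 / s0 - mean ** 2
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Synthetic truth: dropped calls matter in segment 1 only.
beta1 = fit_cox_1d(*simulate_segment(600, beta_dropped=0.4, seed=1))
beta2 = fit_cox_1d(*simulate_segment(600, beta_dropped=0.0, seed=2))
print(f"segment 1 hazard ratio per dropped call: {math.exp(beta1):.2f}")
print(f"segment 2 hazard ratio per dropped call: {math.exp(beta2):.2f}")
```

A hazard ratio above 1 means each extra dropped call shortens the time until churn; a ratio near 1 means dropped calls barely matter for that segment. Pooling both segments into one model would report an effect somewhere in between, which is the dilution described above.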

Note also that this approach not only provides a time-until-churn estimate by segment (a list of those at risk, ranked by time), it also provides a way to CHANGE that time until churn. Looking again at segment 2, which is sensitive to high bills, one obvious action is to offer a discount. Because a survival model is a regression-like equation, the amount of discount can be an independent variable. This means there is a coefficient for how a discount affects the time until churn. Say a 5% discount tends to push out the time until churn by 6 months. Now there is an ROI: the discount costs 5% of the bill, and the additional 6 months the subscriber stays on the system is the return. So an ROI can be calculated and a business case made.
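The ROI reasoning can be written out as back-of-the-envelope arithmetic. In this minimal sketch the 5% discount and the 6 extra months come from the text; the $80 monthly bill and the 10-month baseline tenure are hypothetical, and only revenue is counted (no margin, no time value of money). The cost is the discount on months the subscriber would have paid anyway, and the return is the discounted revenue from the extra months, so nothing is counted twice.

```python
def discount_roi(monthly_bill, discount, base_months, extra_months):
    """Back-of-the-envelope ROI for a retention discount (revenue only).

    base_months:  expected remaining tenure without the offer (hypothetical)
    extra_months: extra tenure attributed to the discount by the survival model
    """
    discounted_bill = monthly_bill * (1 - discount)
    cost = monthly_bill * discount * base_months  # revenue given up on months they would have paid anyway
    gain = discounted_bill * extra_months         # revenue from the months they now stay
    net = gain - cost
    return net, net / cost


# 5% discount -> 6 extra months (from the text); $80 bill and 10-month baseline are hypothetical.
net, roi = discount_roi(80.0, 0.05, base_months=10, extra_months=6)
print(f"net gain ${net:.2f}, ROI {roi:.1f}x")
```

How a coefficient becomes "extra months" depends on the model form: an accelerated failure time model expresses effects directly as a multiplier on time, whereas a Cox model gives a hazard ratio that has to be converted through the survival curve.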

The above was meant as a simple note on a process that works. Churn modeling is critical for many industries and deserves appropriate and actionable insights.