A NOTE: THE ONLY WAY TO DO CHURN MODELING

 

Churn is an important concept in most industries, and insights around WHEN it is likely to happen and WHY it is likely to happen are strategically lucrative.  Predictive analytics is the key to both questions.

First, a definition: churn (attrition) is when a customer no longer buys the product.  This can be a subscription product (telecom, gym memberships) or a non-subscription product (specialty retail, casual dining, hospitality).  Providing a list of valued customers at risk BY TIME PERIOD is fundamental.  However, there is some confusion around how to model churn.

The confusion is between churn with a hard date (a subscription is cancelled or lapses) and churn that must be estimated from usage (a retail customer simply stops buying).

A common approach is logistic regression at the customer level: the dependent variable is churn (yes/no) and the independent variables are typically campaign responses and transactions, perhaps demographics, lifestyle, etc.  Each customer is then scored with a probability of churning.  This yields few insights, because how likely a customer is to churn says nothing about WHEN the churn will happen.  A customer may be 90% at risk to churn, but not for many months.  Overlooking time until churn diminishes actionability.

The next most common approach (and statistically wrong) is to run logits by month.  That is, do 12 models: the first dependent variable is churn in January or not, the next is churn in February or not, and so on.  This gives a probability of churn by month.  It is statistically inappropriate because the approach requires the monthly models to be independent of one another.  That is, the February model is assumed not to depend on January, which is of course false.  The probability of churning in February absolutely depends on whether or not the customer churned in January.

This analytic approach solves the time problem but introduces a worse one: a false assumption of independence that produces spurious results.  And by the way, ordered logit is also inappropriate here, since the underlying time-to-churn process is potentially continuous rather than a fixed sequence of ordered categories, not to mention that interpretation is extremely difficult.

Another approach that tries to incorporate time until the event is ordinary regression with a counter of time until churn as the dependent variable.  This is problematic because a decision has to be made about those that did NOT churn.  What should be done with them?  Delete them?  Give them the shortest churn value?  Give them the longest churn value?  Depending on the percentage of customers who have not churned, each of these three choices is very poor.

The only appropriate technique for time-until-event questions, taking into account both those that had the event and those that did not, is survival modeling.  It was specifically designed for time-until-event problems (originally time until death).  Through its handling of censored observations (in Cox regression, via the partial likelihood), it uses the customers who have not yet had the event as well as those who have.  That means it solves both of the problems above.
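
As a minimal illustration of how censoring is handled, here is a sketch using the open-source lifelines package in Python; the tool choice and the toy numbers are mine, not the note's.

    import pandas as pd
    from lifelines import KaplanMeierFitter

    # Toy data: days observed, and whether churn was observed (1) or the
    # customer was still active at the end of the window (0 = censored).
    df = pd.DataFrame({
        "days":    [30, 90, 120, 200, 365, 365],
        "churned": [1,  1,   1,   0,   0,   0],
    })

    # Censored customers are neither deleted nor given an arbitrary churn
    # date; they contribute "survived at least this long" information.
    km = KaplanMeierFitter()
    km.fit(durations=df["days"], event_observed=df["churned"])
    print(km.survival_function_)  # estimated P(still a customer at day t)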

So, using survival modeling on churn problems is suggestion number one.  Suggestion number two is to segment by the causes of churn and then fit a specific (potentially different) model to each segment.

It is critical that the segmentation be about what CAUSES churn, not churn itself.  That is, do not use churn as a segmenting variable.  Churn is a result, not a cause.

Hypothesizing what causes churn is a good first step.  Using telecom as an example, what can cause churn?  Dropped calls from a weak network can cause subscribers to change providers.  What else?  A high bill can cause churn.  The bill can be higher than expected because of the number of lines and features a subscriber has, so they churn.  Or the bill can be high because of heavy usage (minutes, data, etc.), and that higher-than-expected bill causes churn.  For simplicity, say these are the three segments.  Note that each segment has a different cause of churn.

The idea now is to fit a survival model for each segment, so there will be three survival models.  One will show that dropped calls drastically decrease the time until churn in segment 1 but do little in segment 2.  This is because segment 2 is not sensitive to dropped calls; it is sensitive to a higher bill.  The way to slow churn in segment 2 is to offer, say, a discount, an average-billing plan, etc.  Note that a discount to the dropped-call segment will likely not be very effective.  This is why a causal segmentation is recommended: there are different actions by segment.  If one survival model were applied to everyone, these individual actions would be weakened and diluted.

Note also that this approach not only provides a time-until-churn estimate by segment (a list of those at risk, ranked by time), it provides a way to CHANGE that time until churn.  Looking again at segment 2, which is sensitive to high bills, one obvious action is to offer a discount.  Because a survival model is a regression-like equation, the amount of discount can be an independent variable, which means there is a coefficient for how a discount affects the time until churn.  Say a 5% discount tends to push out the time until churn by 6 months.  Now there is an ROI: the 5% is the cost, and the additional 6 months the subscriber stays on the system is the return.  So an ROI can be calculated and a business case provided.
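
A back-of-the-envelope version of that business case, as a sketch: the monthly bill below is a made-up assumption; only the 5% discount and 6 extra months come from the text.

    # Hypothetical monthly bill; the 5% discount and the 6 extra months
    # are the figures from the text above.
    monthly_bill = 100.0
    discount     = 0.05
    extra_months = 6

    cost     = monthly_bill * discount * extra_months          # discount given up
    retained = monthly_bill * (1 - discount) * extra_months    # revenue kept
    print(f"cost ${cost:.0f}, return ${retained:.0f}, "
          f"ROI {(retained - cost) / cost:.0f}x")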

The above was meant as a simple note on a process that works.  Churn modeling is critical for many industries and deserves appropriate, actionable insights.

Using Segmentation to Improve Both Strategy and Predictive Modeling
by Mike Grigsby

INTRODUCTION

We all want to improve the accuracy and the insights generated by predictive modeling.  We all like to believe that consumer behavior is predictable.  (Ha!)  The following is a simple philosophy: better predictive models and more actionable strategy come from segmenting first.

By separating consumer behavior into the causes that generate strategic insights, better actions can be obtained.  The accuracy of predictive modeling will improve by fitting a different model to each segment rather than applying one model to the whole database.  Thus segmentation makes the models more accurate and generates better insights, which drive smarter strategies for each segment.  See Figure 1 below.

FIGURE 1

SEGMENTATION IS A STRATEGIC, NOT AN ANALYTIC, PROCESS

First, be aware that segmentation is about strategy.  Analytics is a part (the most fun part!) of the process.  As mathematics is the handmaiden of science (so said Albert Einstein), so is analytics the handmaiden of strategy.  Analytics without strategy is like a sci-fi action-adventure movie with no plot.  (We’ve all seen them!)  There may be explosions and shootouts and car chases, but without a story it has no meaning.

The four Ps of strategic marketing are:

Partition: this is segmentation.  Homogeneous within and heterogeneous between.

Probe: creating new variables, adding third-party overlay data or marketing research.  This fleshes out the segments.

Prioritize: this step uses financial valuations (lifetime value, contribution margin, ROI, etc.) to focus strategy.

Position: after the above, the four Ps of tactical marketing (product, price, promotion, and place) are levied differently against each segment to extract the most value.  Each segment requires a different strategy (that is why they are segments).

Note that segmentation is the first of the four Ps.  The bottom line: the more differentiated the segments are, the more actionable the strategy can be.

 

WHICH ALGORITHM?

Those who have read my earlier works know I advocate latent class analysis (LCA) as the state of the art in segmentation.  K-means is probably LCA’s closest competitor, although SVM is catching up, mostly because it is free in R, Python, etc.  But, as stated, LCA offers superior performance.  This is for several reasons:

  • Latent class does not require the analyst to state the number of segments to find, unlike K-means. LCA tells the analyst the optimal number of segments.
  • Latent class does not require the analyst to dictate which variables are used for segment definition, again unlike K-means. LCA tells the analyst which variables are significant.

In short, unlike K-means, there are no arbitrary decisions the analyst needs to make.  The LCA process finds the optimal solution.

  • Latent class maximizes the probability of observing the scores seen on the segmenting variables, hypothesized to come from membership in a (latent) segment. That is, LCA is a probabilistic model.
  • K-means assigns membership by the Euclidean distance from each observation to the segment centroids. It does not optimize a statistical model; it is only a mathematical business rule. (A sketch contrasting the two approaches follows below.)
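
A sketch of the contrast, under stated assumptions: scikit-learn’s GaussianMixture is used here as a probabilistic, model-based stand-in for LCA (dedicated LCA tools such as Latent GOLD or R’s poLCA are separate packages), with BIC standing in for "the model tells you the number of segments"; the data are simulated.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    # Three latent segments with different means on two behavioral variables
    X = np.vstack([rng.normal(m, 1.0, size=(200, 2)) for m in (0, 4, 8)])

    # Model-based: fit k = 1..6 and let a fit statistic (BIC) choose k,
    # instead of the analyst dictating it up front.
    bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
            for k in range(1, 7)}
    best_k = min(bics, key=bics.get)
    print("BIC-optimal number of segments:", best_k)   # 3 on this toy data

    # K-means: k must be supplied by the analyst, and membership is assigned
    # by Euclidean distance to centroids, a rule rather than a likelihood.
    labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)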

 

WHY WOULD SEGMENTATION IMPROVE PREDICTIVE MODELING ACCURACY?

Segmentation will improve modeling accuracy because instead of one overall (on-average) model there will be a different model for each segment.  The finer granularity yields smaller errors.

It’s very possible (because each is a different model) to have different variables in each model; the example below is meant to illustrate just that, and it also leads to additional insights.  See Figure 2.  The simple intuition: with one model, the dependent variable averages, say, 100, plus or minus 75.  But with three models (one per segment) it is 50 plus or minus 25, 100 plus or minus 25, and 150 plus or minus 25.  Of course accuracy will be much better.  (A simulated version of this argument appears after Figure 2.)

FIGURE 2
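
A toy simulation of the 100-plus/minus-75 argument above; all numbers are illustrative.

    import numpy as np

    rng = np.random.default_rng(1)
    seg_means = [50, 100, 150]                    # three segment-level means
    y_segs = [rng.normal(m, 25, 1000) for m in seg_means]
    y = np.concatenate(y_segs)

    # One overall model: everyone gets the grand mean as the prediction.
    err_pooled = np.std(y - y.mean())

    # Three per-segment models: each segment gets its own mean.
    err_by_seg = np.std(np.concatenate([ys - ys.mean() for ys in y_segs]))

    print(f"pooled error ~{err_pooled:.0f} vs per-segment error ~{err_by_seg:.0f}")
    # roughly 48 vs 25: the per-segment models cut the error dramatically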

 

SEGMENTING VARIABLES FOR MODEL IMPROVEMENT

For segmenting variables, use causal variables, not resulting ones.  That is, if you are doing a demand model where units are the dependent variable, the segmentation should be based on things that cause demand to move, NOT demand itself: sensitivity to discounts, marcomm stimulation, seasonality, competitive pressure, etc.  Do not segment on revenue or units; these are resulting variables, the very things you are trying to impact.

After segmenting, elasticities can be calculated, market basket penetration can be ascertained, and marketing communication valuation (even media mix modeling) can be done for each segment.  Imagine the insights!  Then a different demand model can be built for each segment.

 

EXAMPLE: CHURN MODELING

First, a little background on both churn modeling and survival analysis.  Churn (attrition) is a common metric in marketing analytics, and there is usually a strategy for combating it.  The analytic technique of choice is survival modeling.

Say we have data from a telecom firm that wants to understand the causes of churn and strategies to slow it down.  The solution will be to first segment subscribers based on causes of churn and then fit a different survival model to each segment.  There should be a different strategy for each segment based on its different sensitivities to each cause.

Survival modeling became popular in the early 1970s with proportional hazards, called Cox regression (in SAS, proc PHREG).  That is a semi-parametric approach, but the dependent variable is the hazard rate, which is difficult to interpret and very difficult to explain to a client.  Most marketing analysts use a parametric approach instead (in SAS, proc LIFEREG), where the dependent variable is ln(time until the event), the event here being churn.

Survival modeling came out of biostatistics and has become very powerful in marketing.  It is a technique specifically designed to study time-until-event problems.  In marketing this often means time until churn, but it can also be time until response, time until purchase, etc.

The power of survival modeling comes from two sources: 1) a prediction of time until churn can be calculated for each subscriber, and 2) because it is a regression equation, the independent variables show how to increase or decrease that time for every subscriber.  This supports personalized strategies for each subscriber.  (A sketch of such a model appears below.)
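
A minimal sketch of the parametric (LIFEREG-style) approach in Python using the lifelines package; the file name and column names are hypothetical stand-ins for the subscriber data described here, and in practice one such model would be fit per segment.

    import pandas as pd
    from lifelines import WeibullAFTFitter

    # One row per subscriber: days observed, a churn flag (0 = still
    # active, i.e., censored), and the candidate churn drivers.
    df = pd.read_csv("subscribers.csv")          # hypothetical file

    aft = WeibullAFTFitter()                     # AFT: models ln(time to event)
    aft.fit(df, duration_col="days_observed", event_col="churned")
    aft.print_summary()                          # coefficients on ln(TT churn)

    # (1) A predicted time until churn for every subscriber ...
    df["pred_tt_churn"] = aft.predict_median(df)

    # (2) ... and what-if scoring: re-score with one more point of
    # discount to see how far churn is pushed out for each subscriber.
    bumped = df.assign(avg_discount=df["avg_discount"] + 1)
    shift = aft.predict_median(bumped) - df["pred_tt_churn"]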

So, TABLE 1 shows a simple segmentation with three segments.  The mean values of the KPIs are shown for each segment as a general profile.  The segmenting variables were discount amount, things that impact price (data, minutes, features, phones, etc.), IVR use, dropped calls, income, size of household, etc.  Note that percent churn was NOT a segmenting variable.

 

Segment 1

The largest segment, at 48% of subscribers, brings in only 7% of the revenue.  They are either an opportunity or should be DEmarketed.  This segment has the shortest tenure, the fewest features, the most subscribers on a billing plan, the fewest minutes, etc.; they pay mostly by check, NOT credit card, and are barely marketed to yet respond the most.  They seem to be looking for a deal.  They use the most discounts (when they get them), have the lowest household income, and are the youngest with the least education.  They are probably sensitive to price, and that causes them to churn, which they do more than the other segments: 44% churn after only 94.1 days on average.

 

TABLE 1

KPIs                  SEGMENT 1   SEGMENT 2   SEGMENT 3
% Subscribers         48%         29%         22%
% Revenue             7%          30%         63%
Average Bill          $86         $188        $282
Tenure (Days)         145         298         401
# Features            0.8         2.1         4.9
# Phones              2.1         2.3         1.4
# IVR Minutes         88          14          6
# Dropped Calls       2.1         4.5         1.9
% Billing Plan        74%         49%         22%
Total Minutes         98          168         244
Total Data            145         225         354
# Pmts CC             11.9        10.7        1.1
# Pmts Chk            2.3         4.4         9.9
# Emails Sent         2.2         4.6         10.9
# SMS Sent            0.5         1.1         8.9
% Response            27%         19%         11%
Avg Discount          12%         8%          2%
HH Income             $39,877     $74,555     $188,787
Size HH               3.1         2.8         1.8
Avg Age               29          35          44
Education (Yrs)       11          13          17
% Churn               44%         39%         21%
Avg TT Churn (Days)   94.1        275.1       388.2

 

Segment 2

This segment’s 29% of subscribers bring in 30% of the revenue, so they pull pretty much their fair share.  They have the most phones and the largest households but suffer the most dropped calls.  39% of them churn, on average about 9 months (275.1 days) after subscribing.

 

Segment 3

The smallest segment, at 22% of subscribers, brings in a whopping 63% of the revenue.  They are loyal and satisfied, buy the most features, and keep coming back.  They do not have many phones because they have the smallest households.  They basically do not use IVR, and only 22% are on a billing plan.  They have the highest education and household income and are mostly middle-aged.  They use little discounting and pretty much ignore marcomm, even though they are sent the majority of communications.  It takes them over a year to attrite, and only 21% do.

 

INTERPRETATION AND INSIGHTS

The insights come from the model output, which drives the strategies.  See Table 2.  The left side shows the coefficients from the survival (churn) model for each segment; a blank cell means that variable was insignificant in that segment’s model.

Interpretation of survival coefficients is relatively straightforward.  The dependent variable is ln(TTE), the natural log of time until the event, and here the event is attrition.  So the coefficients give both the direction and strength of the effect on TTE: if the coefficient is positive, an increase in that variable pushes OUT the TTE; if negative, an increase pulls IN the TTE.

As an example, take the independent variable number of features (# features), which counts how many features each subscriber has on the phone.  The marginal effect of a coefficient B is (e^B - 1) * mean TTE.  For segment 1: (e^-0.055 - 1) * 94.1 = -5.04.  This means that for every additional feature a subscriber in segment 1 has, their time until churn decreases by 5.04 days, from 94.1 to 89.06.

For segment 3: (e^0.001 - 1) * 388.2 = 0.39.  This means that for every additional feature a subscriber in segment 3 has, their time until churn increases, but only by 0.39 days, from 388.2 to 388.59.

The insights from this one product variable suggest segment 1 is sensitive to price and to things that cause their bill to increase: as the bill rises, these subscribers tend to churn.  Note, for example, that this segment has the smallest income and least education.  Likewise, segment 3 seems brand loyal: as they get more features they tend to stay subscribers longer, though barely, adding only 0.39 days per feature.  Obviously an ROI can be calculated using these insights.
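
The right-hand half of Table 2 below is just this arithmetic applied to every coefficient; a quick sketch, with the # features coefficients taken from the table:

    import numpy as np

    mean_tt_churn = {"seg1": 94.1, "seg2": 275.1, "seg3": 388.2}
    beta_features = {"seg1": -0.055, "seg2": -0.033, "seg3": 0.001}

    # Marginal effect of one more feature: (e^B - 1) * mean time to churn
    for seg, b in beta_features.items():
        days = (np.exp(b) - 1) * mean_tt_churn[seg]
        print(f"{seg}: {days:+.2f} days per additional feature")
    # seg1 -5.04, seg2 -8.93, seg3 +0.39, matching Table 2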

 

TABLE 2

                  COEFFICIENTS                       (e^B - 1) * MEAN TT CHURN (DAYS)
                  SEGMENT 1  SEGMENT 2  SEGMENT 3    SEGMENT 1  SEGMENT 2  SEGMENT 3
Tenure             0.022      0.035      0.057          2.09       9.80      22.77
# Features        -0.055     -0.033      0.001         -5.04      -8.93       0.39
# Phones          -0.066     -0.002      0.002         -6.01      -0.55       0.78
# IVR Minutes      0.250      0.005                    26.73       1.38
# Drop Calls      -0.002     -0.720     -0.001         -0.19    -141.19      -0.39
% Billing Plan     0.290      0.006      0.004         31.66       1.66       1.56
Total Minutes     -0.300     -0.020      0.050        -24.39      -5.45      19.90
Total Data        -0.220     -0.050      0.033        -18.58     -13.42      13.02
# Pmts CC          0.010      0.002                     0.95       0.55
# Pmts Chk                    0.005      0.003                     1.38       1.17
# Emails Sent      0.035      0.002     -0.004          3.35       0.55      -1.55
# SMS Sent         0.008      0.005                     0.76       1.38
Avg Discount       0.312      0.020      0.001         34.46       5.56       0.39
HH Income          0.010     -0.030      0.020          0.95      -8.13       7.84
Size HH           -0.280      0.001      0.003        -22.98       0.33       1.17
Age                0.056                                5.42
Education          0.067      0.012     -0.001          6.52       3.32      -0.39
AVG TT CHURN       94.1       275.1      388.2

Lastly, take discounts.  Here a direct ROI can be calculated: if the firm gives an X% discount to subscribers in a segment, that results in a Y-day increase in time until churn, and that extra time can be valued at the current bill rate.

For segment 1, which has an average discount rate of 12%, increasing it to 13% pushes the time until churn out by 34.46 days.  Clearly this segment is very sensitive to price.  This would not be known without implementing a segmentation plus churn model.  Conversely, segment 3 is not sensitive to price and is very brand loyal: if its discount went from 2% to 3%, the time until churn would go out by only 0.39 days.  They will take the discount, but it is nearly irrelevant.
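
A hedged sketch of the segment 1 business case: the 34.46 days and $86 average bill come from the tables above, while treating the full bill as the return over the extra days is my simplifying assumption.

    daily_bill = 86 / 30.0            # segment 1 average bill, per day
    extra_days = 34.46                # from (e^0.312 - 1) * 94.1

    added_return = daily_bill * extra_days          # revenue from staying longer
    added_cost   = 0.01 * daily_bill * extra_days   # one more point of discount

    roi = (added_return - added_cost) / added_cost
    print(f"return ${added_return:.2f} vs cost ${added_cost:.2f} per "
          f"subscriber -> ROI {roi:.0f}x")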

Note also that segment 2 seems sensitive to dropped calls, while segments 1 and 3 do not.  This knowledge allows a strategy aimed specifically at segment 2.

 

WHAT IF THERE WAS NO SEGMENTATION?

The point of all the above was to demonstrate how segmentation drives more insight for strategy, more accuracy for modeling, and better actionability.  What if only one overall model were developed instead of one per segment?

Let’s look at the variable # features.  One overall model will have a coefficient of -0.04.  Note it is negative, on average, meaning that as the number of features increases, the time until churn decreases, it comes in.  Strategically this would argue against upselling more features to subscribers, because time until churn gets shorter.  Of course this is the wrong decision for segment 3: with more features they are happier and more loyal, and these best customers stay on the database longer and keep paying.  That is, one model for the whole database would give the wrong signal for the subscribers who drive 63% of the revenue.

Another simple example: number of emails sent.  Same argument: for segments 1 and 2, as more emails are sent, the time until churn goes out.  But for segment 3, more emails cause email fatigue and the time until churn comes in.  This is an important strategic insight: do NOT send more emails to the very loyal segment.  They do not need them; they are an irritation and tend to cause churn.  Again, this insight would not be found without doing segmentation first.

 

CONCLUSION

Segmentation should be seen as a strategic process, not an analytic one.  It has uses beyond merely separating the market into homogeneous-within and heterogeneous-between classifications: it can also make predictive modeling more accurate and produce more actionable strategic insights.  And it’s fun!