Using Segmentation to Improve Both Strategy and Predictive Modeling By Mike Grigsby


We all want to improve the accuracy and insights generated from predictive modeling.  We all like to believe that consumer behavior is predictable.  (Ha!)  The following lays out a simple philosophy: better predictive models and more actionable strategy both come from segmenting first.

By separating consumers into segments based on the causes of their behavior, strategic insights become more actionable.  The accuracy of predictive modeling also improves, because a different model is built for each segment rather than one model applied to the whole database.  Thus segmentation makes the models more accurate and generates the insights that drive smarter strategies for each segment.  See Figure 1 below.



First, be aware that segmentation is about strategy.  Analytics is a part (the most fun part!) of the process.  As mathematics is the handmaiden of science (so said Albert Einstein), so is analytics the handmaiden of strategy.  Analytics without strategy is like a sci-fi action-adventure movie with no plot.  (We’ve all seen them!)  There may be explosions and shootouts and car chases, but without a story it has no meaning.

The four Ps of strategic marketing are:

Partition: this is segmentation.  Homogeneous within and heterogeneous between.

Probe: creating new variables, adding third-party overlay data or marketing research.  This fleshes out the segments.

Prioritize: this step uses financial valuations (lifetime value, contribution margin, ROI, etc.) to focus strategy.

Position: after the above, the four Ps of tactical marketing (product, price, promotion, and place) are levied differently against each segment to extract the most value.  Each segment requires a different strategy (that is why they are segments).

Note that segmentation is the first of the four Ps.  The bottom line is that the more differentiated the segments are, the more actionable the strategy can be.



Those who have read my earlier works know I advocate latent class analysis as the state of the art in segmentation.  K-means is probably LCA’s closest competitor, although SVM is catching up, mostly because it is free using R or Python, etc.  But, as stated, LCA offers superior performance.  This is for several reasons:

  • Latent class does not require the analyst to state the number of segments to find, unlike K-means. LCA tells the analyst the optimal number of segments.
  • Latent class does not require the analyst to dictate which variables are used for segment definition, again unlike K-means. LCA tells the analyst which variables are significant.

In short, unlike K-means, there are no arbitrary decisions the analyst needs to make.  The LCA process finds the optimal solution.

  • Latent class maximizes the probability of observing the scores seen on the segmenting variables, hypothesized to come from membership in a (latent) segment. That is, LCA is a probabilistic model.
  • K-means uses the squared Euclidean distance on each segmenting variable to define segment membership. K-means does not optimize anything; it is only a mathematical business rule.
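The contrast above can be sketched in code.  A caveat: scikit-learn has no latent class analysis for categorical indicators, so a Gaussian mixture stands in here as the closest probabilistic analogue; its BIC score plays the role LCA fit statistics play in suggesting the number of segments, while K-means must be told k up front.  The data are synthetic.

```python
# Sketch: probabilistic model-based clustering vs. K-means.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Three synthetic segments on two behavioral variables
X = np.vstack([
    rng.normal([0, 0], 0.5, (200, 2)),
    rng.normal([3, 0], 0.5, (200, 2)),
    rng.normal([0, 3], 0.5, (200, 2)),
])

# K-means: the analyst must dictate the number of segments up front
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Mixture model: fit several candidate k and let BIC choose
bics = {k: GaussianMixture(k, random_state=0).fit(X).bic(X) for k in range(1, 7)}
best_k = min(bics, key=bics.get)
print("BIC-preferred number of segments:", best_k)
```

With well-separated segments like these, the BIC minimum lands on the true number of segments without the analyst dictating it.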



Segmentation will improve modeling accuracy because instead of one overall (on-average) model there will be a different model for each segment.  The finer granularity yields smaller errors.

It’s very possible (because each is a different model) for different variables to appear in each model.  The example below illustrates just that, and this is also where the additional insights come from.  See Figure 2.  The simple version: with one model the dependent variable is, say, 100 on average, plus/minus 75.  But with three models (one per segment) the dependent variable is 50 plus/minus 25, 100 plus/minus 25, and 150 plus/minus 25.  Accuracy is of course much better.
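The accuracy claim can be checked numerically.  This toy sketch (synthetic data, mean-only models) reproduces the 50/100/150 plus-or-minus illustration above: one pooled model must absorb the between-segment spread, while per-segment models only face the within-segment noise.

```python
# Pooled vs. per-segment error on synthetic three-segment data.
import numpy as np

rng = np.random.default_rng(1)
segments = rng.integers(0, 3, 3000)
levels = np.array([50.0, 100.0, 150.0])            # segment means, as in the text
y = levels[segments] + rng.normal(0, 25, 3000)     # within-segment noise ~ +/- 25

# One overall (on-average) model: a single grand mean
pooled_rmse = np.sqrt(np.mean((y - y.mean()) ** 2))

# A different model per segment: each segment's own mean
seg_preds = np.array([y[segments == s].mean() for s in range(3)])[segments]
segmented_rmse = np.sqrt(np.mean((y - seg_preds) ** 2))

print("pooled RMSE:", round(pooled_rmse, 1))       # includes between-segment spread
print("segmented RMSE:", round(segmented_rmse, 1)) # only within-segment noise, ~25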




For segmenting variables use causal, not resulting, variables.  That is, if you are doing a demand model where units are the dependent variable, the segmentation should be based on things that cause demand to move, NOT demand itself.  That is, use sensitivity to discounts, marcomm stimulation, seasonality, competitive pressure, etc.  Do not segment on revenue or units; these are resulting variables, the very things you are trying to impact.

After segmenting, elasticity can be calculated, market basket penetration can be ascertained and marketing communication valuation (even media mix modeling) can be done for each segment.  Imagine the insights!  Then a different demand model for each segment can be done.



First, a little background on both churn modeling and survival analysis.  Churn (attrition) is a common metric in marketing analytics, and there is usually a strategy for combating churn.  The analytic technique is called survival modeling.

Say we have data from a telecom firm that wants to understand the causes of churn and strategies to slow it down.  The solution will be first to segment subscribers based on causes of churn, and then to do a different survival model for each segment.  There should be a different strategy for each segment based on its different sensitivities to each cause.

Survival modeling became popular in the early 1970s based on proportional hazards, called Cox regression (in SAS, proc PHREG).  That is a semi-parametric approach, but the dependent variable is the hazard rate, which is difficult to interpret and very difficult to explain to a client.  Most marketing analysts use a parametric approach (in SAS, proc LIFEREG).  LIFEREG has the dependent variable ln(time until the event), where the event here is churn.

Survival modeling came out of biostatistics and has become very powerful in marketing.  It is a technique specifically designed to study time-until-event problems.  In marketing this often means time until churn, but it can also be time until response, time until purchase, etc.

The power of survival modeling comes from two sources: 1) a prediction of time until churn can be calculated for each subscriber, and 2) because it is a regression equation, there are independent variables that tell how to increase or decrease time until churn for every subscriber.  This enables personalized strategies for each subscriber.
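A minimal sketch of a parametric survival model of the LIFEREG flavor follows.  Assumptions to note: the data are synthetic, and for brevity the sketch fits the simplest accelerated failure time form (exponential) by maximum likelihood with scipy, handling right-censoring; in practice a package such as lifelines offers Weibull and log-normal AFT fitters, and SAS proc LIFEREG fits these distributions directly.

```python
# Exponential accelerated failure time (AFT) model fit by maximum likelihood.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 2000
x = rng.normal(size=n)                         # one covariate, e.g. # features
beta_true = np.array([0.5, 0.6])               # intercept and slope on ln(time)
lin = beta_true[0] + beta_true[1] * x
t_event = rng.exponential(np.exp(lin))         # true time until churn
c = rng.exponential(np.exp(lin).mean() * 3, n) # right-censoring times
time = np.minimum(t_event, c)
event = (t_event <= c).astype(float)           # 1 = churn observed, 0 = censored

X = np.column_stack([np.ones(n), x])

def neg_loglik(beta):
    # exponential AFT: E[T] = exp(X @ beta), hazard = exp(-X @ beta)
    xb = X @ beta
    return -np.sum(event * (-xb) - time * np.exp(-xb))

fit = minimize(neg_loglik, np.zeros(2), method="Nelder-Mead")
print("estimated coefficients:", np.round(fit.x, 2))
```

A positive slope on a covariate pushes OUT the predicted time until churn; a negative slope pulls it IN, which is exactly how the coefficients are read below.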

So, TABLE 1 shows a simple segmentation, with three segments.  The mean values are shown (as KPIs) for each segment as a general profile.  The segmenting variables were discount amount, things that impact price (data, minutes, features, phones, etc.), IVR, dropped calls, income, size of household, etc.  Note that percent of churn was NOT a segmenting variable.


Segment 1

The largest segment, at 48% of subscribers, but it brings in only 7% of the revenue.  They are either an opportunity or should be DEmarketed.  This segment has the shortest tenure, the fewest features, the most on a billing plan, the fewest minutes, etc.; they pay mostly by check and NOT credit card, and they are not marketed to but respond the most.  They seem to be looking for a deal.  They use the most discounts (when they get them), have the lowest household income, and are the youngest with the least education.  They are probably sensitive to price, and that causes them to churn, which they do more than the other segments: 44% after only 94.1 days.



TABLE 1
                   Segment 1   Segment 2   Segment 3
% Subscribers          48%         29%         22%
% Revenue               7%         30%         63%
Average Bill           $86        $188        $282
Tenure (Days)          145         298         401
# Features             0.8         2.1         4.9
# Phones               2.1         2.3         1.4
# IVR Minutes           88          14           6
# Dropped Calls        2.1         4.5         1.9
% Billing Plan         74%         49%         22%
Total Minutes           98         168         244
Total Data             145         225         354
# Pmts CC             11.9        10.7         1.1
# Pmts Chk             2.3         4.4         9.9
# Emails Sent          2.2         4.6        10.9
# SMS Sent             0.5         1.1         8.9
% Response             27%         19%         11%
Avg Discount           12%          8%          2%
HH Income          $39,877     $74,555    $188,787
Size HH                3.1         2.8         1.8
Avg Age                 29          35          44
Education               11          13          17
% Churn                44%         39%         21%
Avg TT Churn          94.1       275.1       388.2


Segment 2

29% of the subscribers bring in 30% of the revenue, so they do pretty much their fair share.  They have the most phones and the largest households but get the most dropped calls. 39% of the subscribers churn on average 9 months after subscribing.


Segment 3

The smallest segment, at 22%, brings in a whopping 63% of the revenue.  They are loyal and satisfied, buy the most features, and keep coming back.  They do not have a lot of phones because they have the smallest households.  They basically do not use IVR, and only 22% are on a billing plan.  They have the highest education and household income and are mostly middle-aged.  They do not use much discount and pretty much ignore marcomm, even though they are sent the majority of communications.  It takes them over a year to attrite, and only 21% do.



The insights come from the model output that drives the strategies.  See Table 2.  The left side shows the coefficients resulting from a churn model for each segment; the right side shows the corresponding impact in days.  If a cell is blank, that variable was insignificant in that segment’s model.

Interpretation of survival coefficients is relatively straightforward.  The dependent variable is ln(TTE), the natural log of time until the event, and in this case the event is attrition.  So the coefficients tell both the direction and the strength of the effect on TTE.  If a coefficient is positive, an increase in that variable pushes OUT the TTE; if a coefficient is negative, an increase in that variable pulls IN the TTE.

As an example, take the independent variable number of features (# features).  This indicates how many features each subscriber has on their phone.  To interpret a coefficient: (e^B - 1) * (mean TTE).  That is, for segment 1, ((e^-0.055) - 1) * 94.1 = -5.04.  This means that for every additional feature a subscriber in segment 1 has, their time until churn decreases by 5.04 days, from 94.1 to 89.06.

For segment 3, ((e^0.001) - 1) * 388.2 = 0.39.  This means that for every additional feature a subscriber in segment 3 has, their time until churn goes out (increases) by only 0.39 days, from 388.2 to 388.59.
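The (e^B - 1) * mean-TTE arithmetic above is easy to script, so every cell of the impact table can be reproduced from the coefficients.  This snippet applies it to the # features coefficients for segments 1 and 3 from the coefficient table:

```python
# Convert a survival coefficient on ln(TTE) into a change in days.
import math

def days_impact(coef, mean_tte):
    """Change in time-until-churn (days) per unit increase in the variable."""
    return (math.exp(coef) - 1) * mean_tte

seg1 = days_impact(-0.055, 94.1)   # segment 1, # features
seg3 = days_impact(0.001, 388.2)   # segment 3, # features
print(round(seg1, 2))              # -5.04 days per additional feature
print(round(seg3, 2))              # +0.39 days per additional feature
```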

The insights from this one product variable indicate segment 1 is sensitive to price and to things that cause price (their bill) to increase.  As price increases, these subscribers tend to churn.  Note, for example, that this segment has the smallest income and least education.  Likewise, segment 3 seems to be brand loyal.  As they get more features they tend to stay subscribers longer, but barely: only 0.39 additional days per feature.  Obviously an ROPI can be calculated using these insights.



TABLE 2
                  Coefficients                  Impact on TT churn (days)
                  Seg 1    Seg 2    Seg 3      Seg 1      Seg 2      Seg 3
Tenure            0.022    0.035    0.057       2.09       9.80      22.77
# features       -0.055   -0.033    0.001      -5.04      -8.93       0.39
# phones         -0.066   -0.002    0.002      -6.01      -0.55       0.78
# IVR minutes     0.250    0.005              26.73       1.38
# drop calls     -0.002   -0.720   -0.001      -0.19    -141.19      -0.39
% billing plan    0.290    0.006    0.004      31.66       1.66       1.56
Total minutes    -0.300   -0.020    0.050     -24.39      -5.45      19.90
Total data       -0.220   -0.050    0.033     -18.58     -13.42      13.02
# Pmts CC         0.010    0.002                0.95       0.55
# Pmts chk                 0.005    0.003                  1.38       1.17
# emails sent     0.035    0.002   -0.004       3.35       0.55      -1.55
# SMS sent        0.008    0.005                0.76       1.38
avg discount      0.312    0.020    0.001      34.46       5.56       0.39
HH income         0.010   -0.030    0.020       0.95      -8.13       7.84
Size HH          -0.280    0.001    0.003     -22.98       0.33       1.17
Age               0.056                         5.42
Education         0.067    0.012   -0.001       6.52       3.32      -0.39
AVG TT CHURN       94.1    275.1    388.2

Lastly, take discounts.  Here a direct ROI can be calculated.  If the firm gives an X% discount to subscribers in a segment, that results in a Y-day increase in TT churn.  That increase in time until churn can be valued at the current bill rate.

For segment 1, which has an average discount rate of 12%, if that were increased to 13%, the TT churn would go out by 34.46 days.  Clearly this segment is very sensitive to price.  This would not be known unless a segmentation and churn model were implemented.  Conversely, segment 3 is not sensitive to price and is very brand loyal.  If the discount went from 2% to 3%, the TT churn would only go out by 0.39 days.  They will take the discount, but it is nearly irrelevant.
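The discount valuation can be sketched as below.  One labeled assumption beyond the source: the retained days are valued by prorating the segment’s average monthly bill (from Table 1) to a daily rate; any real ROI calculation would also net out the cost of the discount itself.

```python
# Value the extra time-until-churn bought by one more point of discount.
import math

def extra_days(coef, mean_tte):
    # days of TT churn gained per +1 point of discount: (e^B - 1) * mean TTE
    return (math.exp(coef) - 1) * mean_tte

def retained_revenue(coef, mean_tte, monthly_bill):
    # ASSUMPTION: retained days are billed at the current average bill, prorated daily
    return extra_days(coef, mean_tte) * monthly_bill / 30.0

seg1 = retained_revenue(0.312, 94.1, 86)    # ~34.5 extra days on an $86 bill
seg3 = retained_revenue(0.001, 388.2, 282)  # ~0.4 extra days on a $282 bill
print(round(seg1, 2), round(seg3, 2))
```

The gap between the two numbers is the whole strategic point: a discount point buys meaningful retention revenue in segment 1 and almost none in segment 3.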

Note also that segment 2 seems sensitive to dropped calls.  Segment 1 and segment 3 seem to not be sensitive to dropped calls.  This knowledge allows a strategy specifically aimed at segment 2.



The point of all the above was to demonstrate how segmentation drives more insights for strategy, more accuracy for modeling, and better actionability.  What if only one overall model were developed, instead of one per segment?

Let’s look at the variable # features.  One overall model has a coefficient of -0.04.  Note it is negative, on average, which means that as the number of features increases, the time until churn decreases (comes in).  Strategically this would indicate not upselling more features to subscribers, because time until churn would be shorter.  Of course this is the wrong decision for segment 3: with more features they are happier and more loyal, and these best customers stay on the database longer and keep paying.  That is, doing one model for the whole database would give the wrong indication for those who drive 63% of the revenue.

Another simple example: number of emails sent.  Same argument: for segments 1 and 2, as more emails are sent, the TT churn goes out.  But for segment 3, more emails cause email fatigue and the TT churn comes in.  This is an important strategic insight: do NOT send more emails to the very loyal segment.  They do not need them; they are an irritation and tend to cause churn.  Again, this insight would not be found without doing segmentation first.



Segmentation should be seen as a strategic process, not an analytic one.  Segmentation has uses other than merely to separate the market into homogeneous within and heterogeneous between classifications.  Segmentation can also be used to make predictive modeling more accurate and achieve more actionable strategic insights.  And it’s fun!

