All posts by mikegrigsby

I've been involved in marketing science for over 25 years. It's the only job I've ever had. Sad but true. I guess I'm a mile deep and an inch wide. I was a statistical analyst at a water utility before moving to Sprint, where I forecasted calls. At Dell my job was demand analysis, where I learned elasticity and had my first taste of direct / database marketing. I went to Millward Brown as marketing research director. There were leadership stints at Hewlett-Packard in database marketing, online analytics at the Gap and marketing operations at Emerson. I segued into consulting at Rapp as VP of Analytics and settled at Targetbase, heading up the strategic retail analysis practice. I believe marketing science is about understanding consumer behavior, that is, what drives demand. My PhD dissertation was about new ways of modeling demand. I've written articles for academic and trade journals. I've taught at both the graduate and undergraduate levels. I have spoken at trade conventions and seminars (National Conference of Database Marketing, Internet Retailing, Direct Marketing Association, American Statistical Association, etc.). I think I have a general understanding of the industry and great sympathy for those actually trying to do analytics. I have developed all this as a basis for advocacy: the industry needs a gentle guidebook focused on helping analysts do marketing science (pull a targeted list, provide segmentation, test campaign effectiveness, forecast demand, etc.). What is mostly available are dry, scholarly references that require wading through pages of mathematical formulas. Often by then the thread of the argument is lost and frustration results. I have felt this frustration myself, which is why I started this marketing science blog.

MODELING LOYALTY INTENSITY

(From chapters 17 and 18 of my second book ADVANCED CUSTOMER ANALYTICS, Kogan Page, 2017)

So, what is loyalty?  It should be easy to define; we all know what it is, right?  In the context of analytics, loyalty is when a consumer becomes a customer and likes the brand enough to come back again.  This customer likes the brand enough to keep coming back, to spread the word to family and friends, to recommend it to their peers and network, even to be an ambassador for the brand.  Note that at its base loyalty is about the customer; it is NOT about the firm or brand.  That is, loyalty analytics is (as it always should be) focused on the customer: what does the customer need, what does the customer like, what is the customer sensitive to, what will it take for the customer to become emotionally involved with the brand, which touchpoints matter most to the customer?  Often this means defining loyalty in terms of customer segments: how loyal is a segment, which needs or benefits does the brand satisfy for one segment over another, and where does the segment sit on the range of loyalty: is it merely transactionally loyal or is it emotionally involved as an ambassador for the brand?

So the first issue is that loyalty should be designed as a win-win and viewed primarily from the customer's POV, not the firm's.  Note that most loyalty analytics, and even most loyalty books (even the pillar of loyalty books, Reichheld's The Loyalty Effect), are mostly about the firm.  That position tries to explain why loyalty helps a firm, why a firm should be interested in loyalty, what metrics the firm should track to gauge its customers' loyalty, how understanding and increasing loyalty benefits the firm.  This is short-sighted.  This approach will produce only a Pareto effect, achieved quickly and never increased.

While loyalty no doubt has important value to the firm, the right framework is obsessing over the customer: their experience, their wants or needs, what is valuable to THEM.  This has everything to do with program design.  Why would a firm put a loyalty program in place?  If a firm is collecting members in order to send them emails about promotions and discounts, that is NOT a loyalty program, it is an email club.  That may have some value, especially if the firm's products require a discount in order to sell, but it should not be called a loyalty program.  One thing to learn when understanding loyalty from a customer's POV is that not all customers want the same thing; not all customers care about a discount.  (This is what elasticity modelling is all about.)  Some of them want something else!  Remember there are four Ps in tactical marketing and PRICE is only one of them.

 

IS THERE A RANGE OR SPECTRUM OF LOYALTY?

There is a range of loyalty from none to transactional (rational) to brand (emotional).  The point of loyalty analytics is to understand where on this spectrum a customer or segment is and to learn how to incentivize and change their behavior to move up the scale.  If done aright, this is not only for the customer's or segment's benefit, it is of benefit to the firm.  Some customers or segments will not move, or will cost too much to move along the spectrum, and that is a valuable insight!


TWO DIMENSIONS OF LOYALTY

Note that there is actually no blatant (directly observable) entity called or quantified as "loyalty".  It is a latent variable.  The idea is that it is like intelligence, which is also unquantifiable in itself; it can only be indirectly measured by something like a score on an IQ test, which in turn measures dimensions of intelligence: spatial ability, logic, mathematics, verbal skills, etc.  The same is true for loyalty.  It can only be seen and surmised through other actions.

So let’s use our behavioral segmentation based on customer transactions and responses to marcomm.  We are interested in how loyal each segment is, which is not necessarily the same thing as how much they spend or how many transactions they have.  So we do primary marketing research and ask questions about opinions and attitudes around price, value, quality and satisfaction.  These metrics will show a range of loyalty.  We also ask about share of voice, competitive density and the convenience of our stores compared to our competitors.

See the loyalty framework below.  It assumes a behavioral segmentation has already been completed.  Different segments score differently on loyalty metrics: one segment is emotionally (brand) loyal and the other is transactionally loyal.

Let's say we have survey data on segment responders including the below attitudes and metrics: PRICE, QUALITY, VALUE, SATISFACTION, SHARE OF VOICE, COMPETITIVE DENSITY and CONVENIENCE.  Using SEM, these variables will score on the loyalty spectrum, from zero loyalty to transactional loyalty up to emotional loyalty.  Thus we can ascertain how loyal each segment is, and on which dimension.

The model above puts a framework together that says consumer behavior (transactions, responses, etc.) is caused by a spectrum of loyalty (from none to transactional to emotional), which is in turn caused by attitudes around price, value, satisfaction and quality as well as opinions / metrics of operational logistics like convenience, share of voice and competitive density.

So the general analytic idea is that there are no such metrics or quantities as emotional or transactional loyalty.  These are latent variables.  But adding these variables helps explain the behavior of customers purchasing and customers responding.  This latent variable is discovered by a factor analysis-type technique used in SEM.  That is, the manifest variables indirectly show the influence of the latent variable and that latent variable is “teased out” and labeled.
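To make this concrete, below is a minimal sketch of how such a latent-variable model could be specified in Python with the open-source semopy package (which uses lavaan-style syntax).  This is a hedged illustration, not the exact specification from the book: the column names are assumptions, and the split of survey items across the two latents is only to keep the toy model identified.

```python
# Hedged sketch: two latent loyalty dimensions measured by survey items,
# which in turn explain observed behavior. All column names are assumed.
import pandas as pd
from semopy import Model

SPEC = """
transactional =~ price + quality + convenience + competition + share_of_voice
emotional =~ value + satisfaction + share_of_voice
transactions ~ transactional + emotional
responses ~ transactional + emotional
"""

survey_df = pd.read_csv("segment_x_survey.csv")  # hypothetical extract
model = Model(SPEC)
model.fit(survey_df)
print(model.inspect())  # parameter estimates, standard errors, p-values
```

Running the same specification separately for each behavioral segment yields per-segment path estimates like those shown in the tables below.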

(A quick note about the difference between transactional and emotional loyalty should clarify this important point.  It is possible for a customer to appear very loyal, in terms of buying a lot of products, having a short time between purchases, responding to marcomm, etc., but not in fact be very loyal.  These may be heavy purchasers because there are no competitors around, or because our stores are very convenient, or because our share of voice is comparatively large.  Thus it's important to know how "loyal" customers are, independent of these other dimensions.  That is, a transactionally loyal customer may jump ship if competitors move in near their location or increase their share of voice.)

The results below are from applying the loyalty model to two different segments, say X and Y.  The segments were defined by behavior (transactions and marcomm response).  The question is how loyal they are (and with what kind of loyalty) and what can be done about it.  Let's say that each segment has generally the same metrics on transactions and responses.  Segment X scores as a transactionally loyal segment.  Note the parameter estimates of convenience and competitive density are very high and significant while share of voice is strong and negative.  These are traditional indications of a transactionally loyal segment.  Note also the high and positive impacts of attitudes around price and quality.  And recognize that most of the variables on the emotional path are insignificant.

 

SEGMENT X, TRANSACTIONAL LOYALTY

Path: transactional

variable          parm est   st error   t value
price                 5.65       3.23      1.75
quality               6.21       1.65      3.75
value                 3.03       2.07      1.47
satisfaction          1.35       0.66      2.05
convenience           5.22       0.75      6.96
competition           2.66       0.99      2.68
share of voice       -1.55       1.03     -1.51

Path: emotional

variable          parm est   st error   t value
price                 0.03       2.66      0.01
quality               0.56       1.07      0.53
value                 1.04       2.36      0.44
satisfaction          1.66       1.03      1.62
convenience           1.99       1.66      1.20
competition           0.66       2.04      0.32
share of voice        2.55       1.69      1.51


Now, a segment that scores as strongly (and only) transactionally loyal is something of a red flag.  This is especially true if they LOOK loyal based on their number and amount of purchases.

How can we use the above model to move the segment from merely transactionally loyal to emotionally loyal?  The answer is in the emotional loyalty path.  The single largest impact is share of voice, and that is a metric we can (somewhat) control.  There is a business case to be made: the cost of increasing our relative share of voice weighed against the added security (and perhaps increased purchasing) of a segment that evolves into emotional loyalty.  See that share of voice is negative in the transactional path?  As SOV increases, this segment becomes less transactionally and more emotionally loyal.

Now let’s look at the opposite kind of loyalty, the brand or emotional kind.  These are customers that love our brand, no matter what.  View the output below for segment Y, which scores mostly as an emotionally loyal group.  Note on the emotional path convenience and competitive density are negative.  This segment is so connected to the brand that even if it is inconvenient to go to our store they go anyway and even if more competition moves in these customers come to our store anyway.  This is emotional loyalty.  You see also that on the emotional path, while price is positive it’s insignificant and quality is very small.  It should be no surprise that both value and satisfaction are high.  On the transactional path none of those metrics are significant.

 

SEGMENT Y, EMOTIONAL LOYALTY

Path: transactional

variable          parm est   st error   t value
price                -1.27       5.65     -0.22
quality               2.07       6.24      0.33
value                 2.07       1.65      1.25
satisfaction          0.03       5.07      0.01
convenience           0.23       0.20      1.17
competition           0.04       0.02      1.80
share of voice       -2.65       1.54     -1.72

Path: emotional

variable          parm est   st error   t value
price                 3.25       3.04      1.07
quality               0.24       0.12      2.06
value                 1.26       0.76      1.67
satisfaction          3.23       1.23      2.63
convenience          -3.65       1.26     -2.91
competition          -2.07       0.56     -3.66
share of voice        1.27       0.87      1.45


This is the power of SEM: hypothesizing and testing a latent variable.  This latent variable accounts for movement in customer transactions and customer responses.  If only a blatant (manifest) model were used, the fit would not have been as good and the insights (differentiating between the two kinds of loyalty) would not be realized.  So is that cool, or what?

Structural Equation Modelling (SEM) is a powerful systems method, especially for dealing with latent variables.  This has great importance for subjects like satisfaction as a dimension of loyalty and for quantifying the various degrees of loyalty.

 

Why Go Beyond RFM?

ANALYTIC SOLUTION: Explain advantages and disadvantages of RFM

(This chapter was published in a different format in Marketing Insights, April 2014)

INTRODUCTION

While RFM (Recency, Frequency and Monetary) is used by many firms, it in fact has limited marketing usage.  It is really only about engagement.  It is valuable for a short-term, financial orientation, but as organizations grow and become more complex a more sophisticated analytic technique is needed.  RFM requires no marketing strategy, and as firms increase in complexity there needs to be an increase in strategic planning.  Segmentation is the right tool for that job.

RFM has been a pillar of database marketing for 75 years.  It can easily identify your “best” customers.  It works.  So why go beyond RFM?  To answer that, let’s make sure we all know what we’re talking about.

WHAT IS RFM?

One definition could be, "An essential tool for identifying an organization's best customers is the Recency / Frequency / Monetary formula."  RFM came about more than 75 years ago for direct marketers.  It was especially popular when database marketing pioneers (Stan Rapp, Tom Collins, David Shepard, Arthur Hughes, etc.) started writing their books and advocating database marketing (as the next generation of direct marketing) nearly 50 years ago.  It became a popular way to make a database build (an expensive project) return a profit.  Thus, the most pressing need was to satisfy finance.

Jackson and Wang wrote, “In order to identify your best customers, you need to be able to look at customer data using Recency, Frequency and Monetary analysis (RFM)…”  Again the focus is on identifying your best customers.  But, it is not marketing’s job to just identify your “best” customers.  “Best” is a continuum and should be based on far more than merely past financial metrics.

The usual way RFM is put into place, although there are an infinite number of permutations, ends up incorporating three scores.  See figure below.  First, sort the database in terms of most recent transactions and score the top 20%, say, with a 5 and on down to the bottom 20% with a 1.  Then re-sort the database based on frequency, maybe with the number of transactions in a year.  Again, the top 20% get a 5 and the bottom 20% get a 1.  The last step is to re-sort the database on, say, sales dollar volume.  The top 20% get a 5 and the bottom 20% get a 1.  Now, sum the three columns (R + F + M) and each customer will have a total ranging from 15 to 3.  The highest scores are the “best” customers.

RFM EXAMPLE

CUST ID R F M TOTAL
999 3 2 1 6
1001 5 3 3 11
1003 4 4 2 10
1005 1 5 2 8
1007 1 4 1 6
1009 2 4 3 9
1010 3 4 4 11
1012 2 3 5 10
1014 3 1 5 9
1016 4 1 4 9
1017 5 2 3 10
1018 4 3 4 11
1020 4 4 3 11
1022 3 5 3 11
1024 2 4 2 8
1026 1 3 5 9
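For the analysts: the scoring described above is only a few lines of code.  Below is a minimal sketch in Python / pandas; the column names (cust_id, last_purchase, n_transactions, sales) are assumptions for illustration.

```python
# Minimal RFM scoring sketch: quintile scores (1-5) on each dimension,
# summed to a 3-15 total. Column names are hypothetical.
import pandas as pd

df = pd.read_csv("customers.csv", parse_dates=["last_purchase"])

def quintile_score(series):
    # rank first so ties do not break the quintile cut
    return pd.qcut(series.rank(method="first"), 5, labels=[1, 2, 3, 4, 5]).astype(int)

df["R"] = quintile_score(df["last_purchase"])    # most recent buyers score 5
df["F"] = quintile_score(df["n_transactions"])   # most frequent buyers score 5
df["M"] = quintile_score(df["sales"])            # biggest spenders score 5
df["TOTAL"] = df["R"] + df["F"] + df["M"]        # 3 (worst) to 15 ("best")

best_customers = df.sort_values("TOTAL", ascending=False)
```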

 

Note that this “best” is entirely from the firm’s point of view.  The focus is not about customer behavior, not about what the customer needs, why those with a high score are so involved or why those with a low score are not so engaged.  The point is to make a (financial) return on the database, not to understand customer behavior.  That is, the motivation is financial and not marketing.

RFM works, as a method of finding those most engaged.  It works to a certain extent, and that extent is selection and targeting.  RFM is simple and easy to use, easy to understand, easy to explain and easy to implement.  It requires no analytic expertise.  It doesn’t really even require marketers, only a database and a programmer.

Say you rescore the database every month, in anticipation of sending out the new catalog.  That means that every month each customer potentially changes RFM value tiers.  After every time period a new score is run and a new migration emerges.  Note that you cannot learn why a customer changed their purchasing patterns, why they decreased their buying, why they made fewer purchases or why the time between purchases changed.  Much like the tip of an iceberg, only the blatant results are seen; RFM gives nothing in the way of understanding the underlying motivations that caused the resultant actions.  There can be no rationale as to customer behavior because the algorithm was never designed for understanding customer behavior.  RFM uses three financial metrics; it does not use an algorithm that differentiates customer behavior.

Because RFM cannot increase engagement (it only benefits from whatever level of involvement, brand loyalty, satisfaction, etc., you inherited at the time–with no idea WHY) it tends to make marketers passive.  There is no relationship building because there is no customer understanding.  That is, because RFM cannot provide a rationale as to what makes one value tier behave the way they do, marketing strategists cannot actively incentivize deeper engagement.

RFM is a good first step, but to make a great step requires something beyond RFM.  Marketers require behavioral segmentation in order to practice marketing.

WHAT IS BEHAVIORAL SEGMENTATION?

Behavioral segmentation (BS) quickly followed RFM, due to the frustration that RFM produced good, but not great, results.  As with most things, complex analysis requires complex analytic tools and expertise.  BS was put into place to apply marketing concepts when using a database for marketing purposes.

In order to institute a marketing strategy, there needs to be a process.  Kotler recommended the four Ps of strategic marketing: Partition, Probe, Prioritize and Position.  Partitioning is the process of segmentation.

While it's mathematically true that partitioning only requires a business rule (RFM is a business rule) to divide the market into sub-markets, behavioral segmentation is a specific analytic strategy.  It uses customer behavior to define the segments and it uses a statistical technique that maximally differentiates the segments.  James H. Myers even says, "Many people believe that market segmentation is the key strategic concept in marketing today."

BS is from the customer's point of view, using customer transactions and marcomm response data to specifically understand what's important to customers.  It is based on the marketing concept of customer centricity.  BS works for all strategic marketing activities: selection / targeting, optimal price discounting, channel preference / customer journey, product penetration / category management, etc.  BS allows a marketer to do more than mere targeting.

An important point might be made here.  Behaviors are caused by motivations, both primary and experiential.  Behaviors are purchases, visits, product usage and penetration, opens, clicks and marcom responses, etc.  These behaviors cause financial results, revenue, growth, life-time value, margin, etc.

Primary motivations are unseen things like attitudes, tastes and preferences, lifestyle, the value set on price, channel preferences, benefits, need arousal, etc.  There are also experiential, secondary causes of behavior, typically based on some brand exposure.  These are not behaviors, but they cause subsequent behaviors.  These secondary causes would be things like loyalty, engagement, satisfaction, courtesy, velocity, etc.  Note that RFM uses recency and frequency, which are metrics of engagement, a secondary cause.  RFM also uses monetary metrics, which are resultant financial measures.  Thus RFM does not use behavioral data, but engagement and financial data.  These are very different from the behavioral data used in BS.  One simple way to distinguish behavioral data from secondary data is that behaviors are nouns: purchases, responses, etc.  Secondary causes are adjectives: engagement metrics, loyal customers, recent transactions, frequently purchased, etc.

BS typically requires analytic expertise to implement.  Behavioral segmentation is a statistical output.  (See the sidebar.)

One critical difference between BS and RFM is that in a behavioral segmentation members typically do not change groups.  That is, the behavior that defines a segment evolves very slowly.  For example, if one person is sensitive to price, her defining behavior will not really change.  She is sensitive to price even after she has a baby, she is sensitive to price as she ages, or if she gets a puppy, or buys a new house.  Her products purchased might change, her interests in certain campaigns might change, but her defining behavior will not change.  This is one of the advantages of BS over RFM.  This is what drives your learning about the segments.  BS provides such insights that each segment generates a rationale, a story, as to why it’s unique enough to BE a segment.

While RFM uses only three dimensions, BS uses any and all behavioral dimensions that best differentiate the segments.  It typically requires far more than three variables to optimally distinguish a market.

Because marketing mix testing can be done on each segment (using product, price, promotion and place), the insights generated make for differentiated marketing strategies for each segment.  Testing whether RFM tiers drive behavior is probably inappropriate, because tier membership potentially changes every time period.  Much like studies that proclaim, "Women who smoke give birth to babies with low birth weight," there is spurious correlation going on.  Just as another dimension (socio-economic, cultural, etc.) might be the real (unseen) cause of the low birth weight, and NOT necessarily (only) the smoking, so there are other (unseen) dimensions of behavior when using RFM to explain, say, campaign responses.  That is, the response is not caused by the RFM tier but by some other motivation.

In short, BS goes far beyond RFM.  The insights and resultant strategies are typically worth it.

WHAT DOES BEHAVIORAL SEGMENTATION PROVIDE THAT RFM DOES NOT?

As mentioned, BS delivers a cohort of segment members that are maximally differentiated from other segment members.  Because these members typically do not change segments, various marketing strategies can be leveled at each segment to maximize cross-sell, up-sell, ROI, margin, loyalty, satisfaction, etc.

BS identifies variables that optimally define each segment’s unique sensitivities.  E.g., one segment might be defined by channel preference, another by price sensitivity, another by differing product penetrations and another by a preferred marcom vehicle.  This knowledge, in and of itself, generates vast insights into segment motivations.  These insights allow for a differentiated positioning of each segment based on each segment’s key differentiators.  You get away from trying to incentivize customers out of the “bad” tiers and into the “good” tiers.  In BS, there are no good or bad tiers.  Your job is now to understand how to maximize each segment based on what drives each segment’s behavior, rather than focus on only migration.  Thus, BS gives you a test-and-learn plan.

Because of the insights provided, knowledge is gained of each segment's prime pain points, which means that each segment can be treated with the right message, at the right time, with the right offer and at the right price.  This kind of positioning creates a "segment of one" in the customer's mind.  This uniqueness differentiates the firm, perhaps even to the extent of moving it away from heavy competition and toward monopolistic competition.  This means approaching a degree of market power, that is, becoming a price maker.

Because BS provides such insights, it tends to make marketers very active in understanding motivations.  This tends to generate very lucrative strategies for each segment.

CONCLUSION

What are the advantages of RFM?  It’s fast, simple and easy to use, explain and implement.  What are the disadvantages of behavioral segmentation?  It requires analytic expertise to generate, is more costly and takes longer to do.

BS uses behavioral variables and uses them for the purpose of understanding customer behavior and it uses a statistical algorithm to maximally differentiate each segment based on behavior (see sidebar).  As mentioned, the vast majority of marketers that evolve from RFM to BS say it’s worth it, and their margins agree.


A NOTE: THE ONLY WAY TO DO CHURN MODELING

 

Churn is an important concept in most industries, and insights around WHEN it is likely to happen and WHY it is likely to happen are strategically lucrative.  Using predictive analytics is key.

First a definition: churn (attrition) is when a customer no longer buys the product.  This can be a subscriber product (telecomm, gym memberships) or non-subscriber product (specialty retail, casual dining, hospitality).  Providing a list of valued customers at risk BY TIME PERIOD is fundamental.  However, there is some confusion around modeling churn.

Churn can come with a hard date (a subscription ends) or have to be estimated from lapsed usage (non-subscription products).

A common approach is to do a logistic regression by customer, where the dependent variable is churn and the independent variables are typically campaign responses and transactions, maybe demographics, lifestyle, etc.  Then each customer is scored with a probability to churn.  This yields few insights, in that how likely a customer is to churn says nothing about WHEN the churn will happen.  That is, a customer may be 90% at risk to churn, but not for many months.  Overlooking time until churn diminishes actionability.

The next most common approach (and statistically wrong) is to do logits by month.  That is, do 12 models, where the dependent variable of the first is churn in January or not, of the next is churn in February or not, etc.  This will give a probability to churn by month.  It is statistically inappropriate because the models are all assumed to be independent of each other.  That is, the February model is assumed not to depend on January, which is of course false.  The probability to churn in February is absolutely dependent on whether or not the customer churned in January.

This analytic approach solves the time problem but introduces a worse problem: a false assumption of independence, resulting in spurious correlation.  And by the way, ordered logit is also inappropriate, in that the underlying event time is potentially continuous rather than a sequence of ordered categories, not to mention interpretation is extremely difficult.

Another approach, trying to incorporate time until the event, is to use ordinary regression and the dependent variable is a counter of time until churn.  This is problematic because a decision has to be made about those that did NOT churn.  What will be done with them?  Delete them?  Give them the shortest churn value?  Give them the longest churn value?  Depending on the percentage of those not churning, each of these three solutions is very poor.

The only appropriate technique for time until an event, taking into account those that had the event as well as those that did not, is survival modeling.  It was specifically designed for time-until-event problems (originally time until death).  It also, through partial likelihood techniques, accounts for those that have had the event as well as those that have not (the censored observations).  That means it solves both of the above problems.
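As an illustration, a survival model of this sort can be fit in a few lines with the open-source lifelines package in Python.  This is a hedged sketch; the data layout and column names are assumptions, not a recipe from a specific project.

```python
# Hedged sketch of a parametric (AFT) survival model for churn using
# lifelines. Each row is a customer; tenure_days is observed time on
# the books and churned flags whether the event happened (customers
# with churned == 0 are treated as censored automatically).
import pandas as pd
from lifelines import WeibullAFTFitter

df = pd.read_csv("customers.csv")  # hypothetical extract
cols = ["tenure_days", "churned", "dropped_calls", "avg_discount", "n_lines"]

aft = WeibullAFTFitter()
aft.fit(df[cols], duration_col="tenure_days", event_col="churned")
aft.print_summary()                    # coefficients: direction and strength
pred = aft.predict_median(df[cols])    # predicted time until churn per customer
```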

So, using survival modeling on churn problems is suggestion number one.  Suggestion number two is to do segmentation by causes of churn and then do a specific (potentially different) model on each segment.

It is critical that the segmentation be about what CAUSES churn, not churn itself.  That is, do not use churn as a segmenting variable.  Churn is a result, not a cause.

Hypothesizing what causes churn is a good first step.  Using telecom as an example, what can cause churn?  Dropped calls from a weak network can cause subscribers to change providers.  What else?  A high bill can cause churn.  The high bill can be because of the number of lines and features a subscriber has; the result is a higher-than-expected bill, so they churn.  Another source of high bills might be high usage, from minutes or data, etc., and this higher-than-expected bill can cause churn.  Say, for simplicity, there are these three segments.  Note that each segment has a different cause of churn.

The idea now is to do a survival model for each segment.  There will be three survival models.  One will show that dropped calls drastically decrease the time until churn in segment 1 but do not do much to segment 2.  This is because segment 2 is not sensitive to dropped calls, but to a higher bill.  The way to slow down churn in segment 2 is to offer, say, a discount and an average billing plan, etc.  Note that a discount to the dropped-call segment will not likely be very effective.  This is why a causal segmentation is recommended: there are different actions by segment.  If one survival model were applied to everyone, these individual actions would be weaker and diluted.

Note also that not only does this approach provide a time-until-churn estimate (a list of those at risk, ranked by time) by segment, it provides a way to CHANGE that time until churn.  Looking again at segment 2, which is sensitive to high bills, one obvious action is to offer them a discount.  Given that survival modeling is a regression-like equation, the amount of discount can be an independent variable.  This means there is a coefficient on how a discount affects the time until churn.  Say a 5% discount tends to push out the time until churn by 6 months.  Now there is an ROI.  The cost is the 5% and the return is the additional 6 months the subscriber stays on the system.  So an ROI can be calculated and a business case provided.
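The business case arithmetic is simple.  A hedged illustration with made-up numbers (an $80 average bill), not results from a real model:

```python
# Illustrative ROI: a 5% discount that the model says pushes churn
# out by 6 months. All numbers are hypothetical.
monthly_bill = 80.0
extra_months = 6                                   # added tenure per the model
incremental_revenue = monthly_bill * extra_months          # 480.0
discount_cost = monthly_bill * 0.05 * extra_months         # 24.0
roi = (incremental_revenue - discount_cost) / discount_cost
print(round(roi, 1))   # 19.0: every discount dollar returns about 19
```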

The above was meant as a simple note on a process that works.  Churn modeling is critical for many industries and deserves appropriate and actionable insights.


Using Segmentation to Improve Both Strategy and Predictive Modeling

By Mike Grigsby

INTRODUCTION

We all want to improve the accuracy and insights generated from predictive modeling.  We all like to believe that consumer behavior is predictable.  (Ha!)  The following is a simple philosophy advocating that better predictive models and more actionable strategy come from segmenting first.

By separating consumer behavior into causes that generate strategic insights, better actions can be obtained.  Accuracy of predictive modeling will improve by doing a different model for each segment, rather than one model applied to the whole database.  Thus segmentation makes the models more accurate and generates better insights that cause smarter strategies for each segment.  See Figure 1 below.

FIGURE 1

SEGMENTATION IS A STRATEGIC, NOT AN ANALYTIC, PROCESS

First, be aware that segmentation is about strategy.  Analytics is a part (the most fun part!) of the process.  As mathematics is the handmaiden of science (so said Albert Einstein) so is analytics the handmaiden of strategy.  Analytics without strategy is like the sci-fi action adventure movie with no plot.  (We’ve all seen them!)  There may be explosions and shoot outs and car chases but without a story it has no meaning.

The four Ps of strategic marketing are:

Partition: this is segmentation.  Homogeneous within and heterogeneous between.

Probe: creating new variables, adding on third party overlay data or marketing research.  It fleshes out the segments.

Prioritize: this step uses financial valuations (lifetime value, contribution margin, ROI, etc.) to focus strategy.

Position: after the above, the four Ps of tactical marketing (product, price, promotion and place) are levied differently against each segment to extract the most value.  Each segment requires a different strategy (that is why they are segments).

Note that segmentation is the first of the four Ps.  The bottom line is that the more differentiated the segments are the more actionable the strategy can be.

 

WHICH ALGORITHM?

Those who have read my earlier works know I advocate latent class analysis as the state of the art in segmentation.  K-means is probably LCA's closest competitor, although SVM is catching up, mostly because it is free using R or Python.  But, as stated, LCA offers superior performance.  This is for several reasons:

  • Latent class does not require the analyst to state the number of segments to find, unlike K-means. LCA tells the analyst the optimal number of segments.
  • Latent class does not require the analyst to dictate which variables are used for segment definition, again unlike K-means. LCA tells the analyst which variables are significant.

In short, unlike K-means, there are no arbitrary decisions the analyst needs to make.  The LCA process finds the optimal solution.

  • Latent class maximizes the probability of observing the scores seen on the segmenting variables, hypothesized to come from membership in a (latent) segment. That is, LCA is a probabilistic model.
  • K-means uses the Euclidean distance of each segmenting variable to define segment membership. K-means does not optimize anything; it is only a mathematical business rule.
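There is no single canonical LCA routine in Python, but the model-based idea can be sketched with scikit-learn's GaussianMixture, a continuous-data cousin of latent class analysis: fit a probabilistic mixture for several candidate segment counts and let an information criterion (here BIC) pick the number of segments, rather than dictating it up front as K-means requires.  This is an illustrative stand-in, not LCA proper.

```python
# Model-based segmentation sketch: choose the number of segments by BIC
# instead of fixing it in advance. GaussianMixture is a stand-in for a
# true latent class model; X holds the (scaled) segmenting variables.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_best_mixture(X, max_k=10, seed=0):
    best_model, best_bic = None, np.inf
    for k in range(2, max_k + 1):
        gm = GaussianMixture(n_components=k, n_init=5, random_state=seed).fit(X)
        if gm.bic(X) < best_bic:
            best_model, best_bic = gm, gm.bic(X)
    return best_model

# best = fit_best_mixture(X)
# segments = best.predict(X)         # hard assignment
# probs = best.predict_proba(X)      # probabilistic membership
```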

 

WHY WOULD SEGMENTATION IMPROVE PREDICTIVE MODELING ACCURACY?

Segmentation will improve modeling accuracy because instead of one overall (on average) model there will be a different model for each segment.  The different granularities cause a smaller error.

It's very possible (because it is a different model) to have different variables in each model.  The example below is meant to illustrate just that.  This also leads to additional insights.  See figure 2.  The simple answer is that with one model the dependent variable is, on average, say 100, plus/minus 75.  But with three models (one for each segment) the dependent variable is 50 plus/minus 25, 100 plus/minus 25 and 150 plus/minus 25.  Of course accuracy will be much better.

FIGURE 2
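A quick simulation (illustrative only) makes the accuracy claim concrete, mirroring the 50 / 100 / 150 example above:

```python
# Simulated illustration: one pooled model versus one model per segment.
# Three segments whose outcomes center on 50, 100 and 150 (sd = 25).
import numpy as np

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(m, 25, 1000) for m in (50, 100, 150)])
seg = np.repeat([1, 2, 3], 1000)

# pooled model: everyone gets the grand mean
rmse_pooled = np.sqrt(np.mean((y - y.mean()) ** 2))

# per-segment models: each segment gets its own mean
pred = np.concatenate([np.full(1000, y[seg == s].mean()) for s in (1, 2, 3)])
rmse_segmented = np.sqrt(np.mean((y - pred) ** 2))

print(round(rmse_pooled, 1), round(rmse_segmented, 1))  # ~48 vs ~25
```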

 

SEGMENTING VARIABLES FOR MODEL IMPROVEMENT

For segmenting variables, use causal, not resulting, variables.  That is, if you are doing a demand model where units are the dependent variable, the segmentation should be based on things that cause demand to move, NOT demand itself.  That is, use sensitivity to discounts, marcomm stimulation, seasonality, competitive pressure, etc.  Do not segment based on revenue or units; these are resulting variables, the things you are trying to impact.

After segmenting, elasticity can be calculated, market basket penetration can be ascertained and marketing communication valuation (even media mix modeling) can be done for each segment.  Imagine the insights!  Then a different demand model for each segment can be done.

 

EXAMPLE: CHURN MODELING

First a little background, both on churn modeling and survival analysis.  Churn (attrition) is a common metric in marketing analytics and there is usually a strategy about how to combat churn.  The analytic technique is called survival modeling.

Say we have data from a telecomm firm that wants to understand causes of churn and strategies to slow down churn.  The solution will be to first segment subscribers based on causes of churn and then to do a different survival model for each segment.  There should be a different strategy for each segment based on different sensitivities to each cause.

Survival modeling became popular in the early 1970s based on proportional hazards, called Cox regression (in SAS, proc PHREG).  That is a semi-parametric approach, but the dependent variable is the hazard rate, which is difficult to interpret and very difficult to explain to a client.  Most marketing analysts use a parametric approach (in SAS, proc LIFEREG).  Lifereg has as its dependent variable ln(time until the event), where the event is churn.

Survival modeling came out of biostatistics and has become very popular in marketing.  It is a technique specifically designed to study time-until-event problems.  In marketing this often means time until churn, but it can also be time until response, time until purchase, etc.

The power of survival modeling comes from two sources: 1) a prediction of time until churn can be calculated for each subscriber and 2) because it is a regression equation there are independent variables that will tell how to increase / decrease time until churn for every subscriber.  This will develop personalized strategies for each subscriber.

So, TABLE 1 shows a simple segmentation, with three segments.  The mean values are shown (as KPIs) for each segment as a general profile.  The segmenting variables were discount amount, things that impact price (data, minutes, features, phones, etc.), IVR, dropped calls, income, size of household, etc.  Note that percent of churn was NOT a segmenting variable.

 

Segment 1

The largest segment, at 48% of subscribers, brings in only 7% of the revenue.  They are either an opportunity or should be DEmarketed.  This segment has the shortest tenure, fewest features, most on a billing plan, fewest minutes, etc.; pays mostly by check and NOT credit card; and is not marketed to but responds the most.  They seem to be looking for a deal.  They use the most discounts (when they get them), have the lowest household income, and are the youngest with the least education.  They are probably sensitive to price and that causes them to churn, which they do more than the other segments: 44% churn after only 94.1 days.

 

TABLE 1

KPIs                SEGMENT1   SEGMENT2   SEGMENT3
% Subscribers            48%        29%        22%
% Revenue                 7%        30%        63%
Average Bill             $86       $188       $282
Tenure (Days)            145        298        401
# Features               0.8        2.1        4.9
# Phones                 2.1        2.3        1.4
# IVR Minutes             88         14          6
# Dropped Calls          2.1        4.5        1.9
% Billing Plan           74%        49%        22%
Total Minutes             98        168        244
Total Data               145        225        354
# Pmts CC               11.9       10.7        1.1
# Pmts Chk               2.3        4.4        9.9
# Emails Sent            2.2        4.6       10.9
# SMS Sent               0.5        1.1        8.9
% Response               27%        19%        11%
Avg Discount             12%         8%         2%
HH Income            $39,877    $74,555   $188,787
Size HH                  3.1        2.8        1.8
Avg Age                   29         35         44
Education                 11         13         17
% Churn                  44%        39%        21%
Avg TT Churn (days)     94.1      275.1      388.2

 

Segment 2

29% of the subscribers bring in 30% of the revenue, so they do pretty much their fair share.  They have the most phones and the largest households but get the most dropped calls. 39% of the subscribers churn on average 9 months after subscribing.

 

Segment 3

The smallest segment, at 22%, brings in a whopping 63% of the revenue.  They are loyal and satisfied, buy the most features and keep coming back.  They do not have a lot of phones because they have the smallest households.  They basically do not use IVR and only 22% are on a billing plan.  They have the highest education and household income and are mostly middle-aged.  They do not use much discount and pretty much ignore marcomm, even though they are sent the majority of communications.  It takes them over a year to attrite and only 21% do.

 

INTERPRETATION AND INSIGHTS

The insights come from the model output that drives the strategies.  See Table 2.  This shows on the left side the coefficients resulting from a churn model for each segment.  If the cell is blank it is because that variable for that segment model was insignificant.

Interpretation of survival coefficients is relatively straightforward.  The dependent variable is ln(TTE), the natural log of time until the event and in this case the event is attrition.  So the coefficients tell both direction and strength of TTE.  If the coefficient is positive an increase in that variable will push OUT the TTE, if the coefficient is negative an increase in that variable will pull IN the TTE.

As an example, take the independent variable number of features (# features).  This indicates how many features each subscriber has on their phone.  To interpret the coefficient, use (e^B - 1) * mean.  That is, for segment 1, ((e^-0.055) - 1) * 94.1 = -5.04.  This means that for every additional feature a subscriber in segment 1 has, their time until churn goes down (decreases) by 5.04 days, or from 94.1 to 89.06.

For segment 3, ((e^0.001) - 1) * 388.2 = 0.39.  This means that for every additional feature a subscriber in segment 3 has, their time until churn goes out (increases) by 0.39 days, or from 388.2 to 388.59.
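The arithmetic above is easy to wrap in a helper; a small sketch (the function name is mine, not from any package):

```python
import math

def days_effect(beta, mean_tte):
    """Change in time until churn (days) for a one-unit increase in a
    covariate, evaluated at a segment's mean time until the event:
    (e^B - 1) * mean."""
    return (math.exp(beta) - 1.0) * mean_tte

print(days_effect(-0.055, 94.1))   # segment 1, # features: about -5.04 days
print(days_effect(0.001, 388.2))   # segment 3, # features: about +0.39 days
```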

The insights from this one product variable might indicate segment 1 is sensitive to price and to things that cause price (their bill) to increase.  As price tends to increase, these subscribers tend to churn.  Note for example this segment has the smallest income and least education.  Likewise, segment 3 seems to be brand loyal.  As they get more features they tend to stay subscribers longer, but barely, adding only 0.39 days.  Obviously an ROI can be calculated using these insights.

 

TABLE 2

Coefficients by segment (blank = insignificant in that segment's model):

Variable          SEGMENT1   SEGMENT2   SEGMENT3
Tenure               0.022      0.035      0.057
# features          -0.055     -0.033      0.001
# phones            -0.066     -0.002      0.002
# IVR minutes        0.250      0.005
# drop calls        -0.002     -0.720     -0.001
% billing plan       0.290      0.006      0.004
Total minutes       -0.300     -0.020      0.050
Total data          -0.220     -0.050      0.033
# Pmts CC            0.010      0.002
# Pmts chk                      0.005      0.003
# emails sent        0.035      0.002     -0.004
# SMS sent           0.008      0.005
avg discount         0.312      0.020      0.001
HH income            0.010     -0.030      0.020
Size HH             -0.280      0.001      0.003
Age                  0.056
Education            0.067      0.012     -0.001

Effect in days, (e^B - 1) * mean TT churn:

Variable          SEGMENT1   SEGMENT2   SEGMENT3
Tenure                2.09       9.80      22.77
# features           -5.04      -8.93       0.39
# phones             -6.01      -0.55       0.78
# IVR minutes        26.73       1.38
# drop calls         -0.19    -141.19      -0.39
% billing plan       31.66       1.66       1.56
Total minutes       -24.39      -5.45      19.90
Total data          -18.58     -13.42      13.02
# Pmts CC             0.95       0.55
# Pmts chk                       1.38       1.17
# emails sent         3.35       0.55      -1.55
# SMS sent            0.76       1.38
avg discount         34.46       5.56       0.39
HH income             0.95      -8.13       7.84
Size HH             -22.98       0.33       1.17
Age                   5.42
Education             6.52       3.32      -0.39

AVG TT CHURN (days)   94.1      275.1      388.2

Lastly, take discounts.  Here a direct ROI can be calculated.  If the firm gives an X% discount to subscribers in a segment, that will result in a Y increase in time until churn.  That increase in time until churn can be valued at the current bill rate.

For segment 1, which has an average discount rate of 12%, if that were increased to 13% the TT churn goes out by 34.46 days.  Clearly this segment is very sensitive to price.  This would not be known unless a segmentation and churn model were implemented.  Conversely, segment 3 is not sensitive to price and is very brand loyal.  If the discount went from 2% to 3% the TT churn would only go out by 0.39 days.  They will take the discount but it is nearly irrelevant.

Note also that segment 2 seems sensitive to dropped calls.  Segment 1 and segment 3 seem to not be sensitive to dropped calls.  This knowledge allows a strategy specifically aimed at segment 2.

 

WHAT IF THERE WAS NO SEGMENTATION?

The point of all the above was to demonstrate how segmentation drives more insights for strategy, more accuracy for modeling and better actionability.  What if only one model were developed overall, instead of by segment?

Let's look at the variable # features.  One overall model will have a coefficient of -0.04.  Note it is negative, on average, which means that as the number of features increases the time until churn decreases (comes in).  Strategically this would indicate not upselling more features to subscribers, because time until churn gets shorter.  Of course this is the wrong decision for segment 3: with more features they are happier and more loyal, and these best customers stay on the database longer and keep paying.  That is, doing one model for the whole database would give the wrong indication for those that drive 63% of the revenue.

Another simple example: number of emails sent.  Same argument: for segments 1 and 2, as more emails are sent the TT churn goes out.  But for segment 3, more emails cause email fatigue and the TT churn comes in.  This is an important strategic insight: do NOT send more emails to the very loyal segment.  They do not need them; they are an irritation and tend to cause churn.  Again, this insight would not be found without doing segmentation first.

 

CONCLUSION

Segmentation should be seen as a strategic process, not an analytic one.  Segmentation has uses other than merely to separate the market into homogeneous within and heterogeneous between classifications.  Segmentation can also be used to make predictive modeling more accurate and achieve more actionable strategic insights.  And it’s fun!

 

CATEGORY MANAGEMENT FOR RETAIL

ABSTRACT

Category management comes from the CPG industries.  It is a strategy used to assign a role to each major product category.  The roles are destination, occasion, convenience and routine.  These roles are assigned based on calculations along two dimensions: percent purchasing the category and number of purchases.

CPG assumes the entire customer base assigns the same role to each category.  A better strategy for retail is to deliver a behavioral segmentation and calculate each role BY SEGMENT.  Thus segment X might assign the role of, say, destination to one category while segment Y assigns the role of occasion, based on where each lands in the 2×2 grid.  This provides better targeting and the creation of more compelling messages, in that one size does not fit all.

 

Category management (as a strategy) comes from the CPG industries.  CPG defines four "roles" that customers give to product categories.  A category is a distinct, managed group of products that customers perceive to be interrelated or substitutable for their needs.  These roles are not driven by finance, advertising or supply-side logistics but by customer behavior.

The roles assigned to product categories (using groceries as an example) are:

DESTINATION: Key staples like milk, bread and meat.  It is WHY shoppers visit.  A large percent of customers buy these products and they buy a large number of them.

OCCASION: Important to the shopper, mostly based on occasion or season, e.g. birthday, anniversary, Christmas.

CONVENIENCE: Purchased infrequently, but important when a customer buys them. In a grocery store these are hardware items, shoe polish, etc.

ROUTINE: These tend to be items like pet products, paper towels, toilet tissue, etc.  A small percent of consumers purchase these but they buy a large number of them.

In groceries, a role is assigned to a product and it is assumed that all shoppers give that same role to the product.  In retail that is not the case.  This is important.  Grocery stores assume all customers assign, say, milk a destination role and come to the store specifically for it.  Retail assumes that different segments may give different roles to the same product.  That is, one segment may indeed go to the grocery store specifically to buy milk, but another buys milk only for, say, cooking and therefore assigns it a routine role.

Analytics can show that different segments assign different roles to the SAME category.  Again, say segment X assigns the role of "destination" to kid's clothes (it is the reason they come to the store) but segment Y assigns the role of "occasion" (seasonal) to kid's clothes.  No marketing strategist would message kid's clothes the same way to each segment, and that is the point.

Now, how do we determine (calculate) the assignment of these four roles by segment?  There are usually two metrics, which define a simple 2×2 matrix.  The percent of the segment purchasing the category is on the vertical axis and the number of items (of the category) purchased is on the horizontal axis.  See figure 1 below.  The four quadrants of these metrics, when comparing one segment to another, tend to differentiate and assign the roles described above.  (In practice, these metrics are indexed by segment and plotted with the above metrics as axes.)


FIGURE 1 CATEGORY ROLES

 

                                # OF ITEMS PURCHASED
                                LOW              HIGH
% SEGMENT          HIGH         OCCASION         DESTINATION
PURCHASING         LOW          CONVENIENCE      ROUTINE

 

Using this descriptive framework, each segment can be plotted in terms of product categories.  That is, where a grocery store assumes all customers treat say steak as a staple (a destination role) there may be a segment that treats steak as an occasional (even seasonal) role.  This means that after segmentation has been performed each segment can be plotted and different roles can be assigned.

As an example, see figure 2 below showing MEN'S CLOTHING & ACCESSORIES.  The metrics are "percent of segment purchasing" and "average number of category units purchased by segment".  These are then indexed to the mean and plotted on the graph (figure 3) after that.  Note that segment 1 plots as occasionally (even seasonally) buying MEN'S CLOTHING & ACCESSORIES whereas segment 2 plots as assigning a destination role to MEN'S CLOTHING & ACCESSORIES.  No marketing manager would send the same messages or offers to these two segments in terms of this category.

 

FIGURE 2 CATEGORY MANAGEMENT METRICS

MEN’S CLOTH / ACCESS % PURCH # PURCH
SEGMENT1 0.65 1.10
SEGMENT2 2.18 2.41
SEGMENT3 0.03 0.05
SEGMENT4 1.88 1.54
SEGMENT5 0.26 0.27
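A hedged sketch of the role assignment logic using these indexed metrics.  The quadrant cutoff here (an index of 1.0, the market average) is my assumption for illustration; in practice the boundaries would be tuned against the plotted grid.

```python
def category_role(pct_purch_idx, n_purch_idx, cutoff=1.0):
    """Map a segment's two indexed metrics onto the 2x2 role grid.
    The cutoff (1.0 = market-average index) is an illustrative
    assumption, not a fixed rule."""
    high_pct = pct_purch_idx > cutoff   # vertical axis: % of segment purchasing
    high_n = n_purch_idx > cutoff       # horizontal axis: # of items purchased
    if high_pct and high_n:
        return "DESTINATION"
    if high_pct:
        return "OCCASION"
    if high_n:
        return "ROUTINE"
    return "CONVENIENCE"

# e.g. category_role(2.18, 2.41) -> "DESTINATION" (segment 2 in figure 2)
```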


CONCLUSION

Category management came from the CPG industries.  It is an approach in which each major product category is defined in terms of four different roles assigned by customers.  These roles are destination, occasion, convenience and routine.  The classification into these roles depends on customer scoring along two axes: percent purchasing and number of purchases.

Take this approach one step further and do category management by segment.  That is, one segment may treat a product category as say a destination but another will treat that same product as convenience.  This differentiation means that messages and promotions and bundling offers can be versioned by segment.


MODELING SAME STORE SALES USING SIMULTANEOUS EQUATIONS

published in DMA MARKETING ANALYTICS JOURNAL June 2016

By Mike Grigsby, PhD

* SORRY THE GRAPHIC FIGURES DID NOT COME THROUGH. *

ABSTRACT

A specialty retailer wanted to develop a model to ascertain revenue performance by stores. They wanted to differentiate first time buyers from repeat buyers, in order to exploit those different sensitivities. They believed there were regional differences and needed the model to account for those differences.

Some of their stores were in attractive areas, having little competition and / or good demographics (income, lifestyle, etc.) whereas other stores were in less attractive areas. The question was around these uncontrollable dimensions vis a vis controllable ones like pricing, staffing, store appearance, customer service, satisfaction, etc. That is, they wanted to develop a “scorecard” for each store taking into account their controllable operations given their uncontrollable circumstances.

Because this framework clearly had at least one independent variable to be used as a dependent variable (satisfaction) and because there was staging involved (customer service => satisfaction => sales), simultaneous equations was the econometric technique of choice.

The resulting analysis allowed development of a scorecard for each store.  This meant that for one store a particular variable, say net price, could be very powerful in terms of driving sales, but for another store (perhaps in another region) net price was less impactful.  It also meant that two stores with similar uncontrollable situations that varied in their same store sales could be analyzed in terms of better operations, etc.

BUSINESS PROBLEM

A specialty retailer had about 450 stores nationwide. They wanted to develop a model to ascertain revenue performance. What explained same store sales? The goal of this was to both predict sales and to account for sales. That is, assess accountability for store managers in terms of performance.

It was hypothesized there were two general classes of performance drivers, some in the store’s control and some not in the store’s control. Examples of variables not in the store’s control include number of competitors, demographics around each store’s trade area, etc. Examples of variables within the store’s control (and hence things they could do to increase their performance) included net price, marketing spend, staffing (both the number and type), customer service training, culture, employee engagement, etc.

These differences in performance likely varied by, say, region. Some of the regions may have a large employer move in (or out), differences in unemployment, income, household size, etc., and might make a difference in how effective a store’s operations were. That is, senior management wanted to know if a store is performing aright, taking into account their regional circumstances. It may be possible for a store to do no better than it did, maximizing their pricing and marketing and staffing, and it might be possible for a store to do far better, given their very attractive circumstances.

They wanted to differentiate first time buyers from repeat buyers, in order to exploit and target those different behaviors. First time buyers may be motivated by lead generation, cooperative partnerships, a social media reputation score, whereas none of these would have much of a bearing on repeat buyers.

DATA COLLECTED

This business objective required data from three major sources. First the transaction database would supply same store sales, net price, etc. The second source was primary marketing research in terms of employee engagement, satisfaction, loyalty, customer service and store culture. The last source was overlay data to detail number of competitors, demographics, interests and lifestyle.

The transactional database supplied same store revenue, units, average net price and number and type of staffing. There were also data including certification of industry excellence standards, external and internal store appearance, distance each customer was from each store, etc.

There was a heavy investment in marketing research, primarily focusing on these areas: satisfaction, customer service, employee engagement, quality of assortment and store culture. These responses came from the database of customers so were easy to merge together.

Lastly, several overlay data sources were used. One gave demographics (income, age, size of household), another gave interests, lifestyle and a third gave number of competitors in trade area, etc.

MODELING FRAMEWORK

The stores were grouped by geography, typical in retail. (Another possibility–often preferred, depending on operational tactics—would be to do a behavioral segmentation and then do simultaneous equations by segment.) Each Group VP had from 30 – 70 stores to manage. Most of their annual bonuses are based on same store sales so understanding drivers to increase unit sales is critical.

There would have to be a separate model for first time purchases as differentiated from repeat purchasers. (About 30% of total sales are from first time purchasers.)

Likewise, because of the differences by region, there would have to be a different model for each region. This would amplify the key insight: what kinds of sales performance can be expected given differences by region? Obviously a national KPI could not be standardized across all regions, it needed to be distinct at least at the region level.

Simultaneous Equations

The dependent variable for first time and repeat customers would be units, typical in retail.  It was hypothesized there would be some variables unique to first units, some variables unique to repeat units and some variables shared by both.  (This was one of the reasons a systems approach was needed.)  Causality also suggested a staged approach, in that satisfaction was caused by some variables and repeat units were caused by satisfaction.  See figure 1 for a graphic representation.

The below are the simplified hypothesized equations.

First Units = f(net price, # competitors, store appearance, marketing spend, age, income, lifestyle, partnerships, reputation, lead generation, seasonality)

Repeat Units = f(net price, # competitors, store appearance, marketing spend, age, income, lifestyle, SATISFACTION, customer service, staffing, employee engagement, seasonality)

Satisfaction = f(net price, customer service, store culture, employee engagement, staffing)

 

FIGURE 1 CONCEPTUAL MODELING FRAMEWORK

 

 

The above meant that three stage least squares (3SLS) was one of the key econometric techniques of choice.

(A quick note about another popular simultaneous-equation choice, Vector Auto Regression (VAR): because different independent variables appear in each of the equations, a vector formulation would be inappropriate.  That is, the ability to have different variables by equation (rather than a common vector) is more accurate and more insightful.)

Thus, in this case, 3SLS is preferred and instrumental variables had to be found. These would have to be correlated with the endogenous variables and uncorrelated with the error terms. Often large scale macro variables (consumer confidence, industry growth, etc.) can be used as they are correlated with many dependent / endogenous variables (units, revenue, etc.) and (hopefully) less correlated with error terms.

The endogenous variables are those estimated by the system of equations, in this case the dependent variables and those shared by all equations, e.g., first units, repeat units, satisfaction, net price. The exogenous variables are those given and thus outside the system, in this case marketing spend, store appearance, demographics, etc.
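
To make the setup concrete, below is a minimal sketch (not the code actually used in this engagement) of how such a system can be estimated in Python with the linearmodels package. The column names, the trimmed-down variable lists and the choice of instruments are all illustrative assumptions.

```python
# Minimal 3SLS sketch, assuming the `linearmodels` package and a store-level
# DataFrame with hypothetical column names. Endogenous regressors go inside
# [ ... ~ instruments ]; consumer confidence and industry growth stand in
# for the macro instruments discussed above.
import pandas as pd
from linearmodels.system import IV3SLS

df = pd.read_csv("store_level_data.csv")  # hypothetical input file

equations = {
    "first_units":
        "first_units ~ 1 + n_competitors + store_appearance + mkt_spend"
        " + [net_price ~ consumer_confidence + industry_growth]",
    "repeat_units":
        "repeat_units ~ 1 + n_competitors + mkt_spend + staffing"
        " + [net_price + satisfaction ~ consumer_confidence + industry_growth]",
    "satisfaction":
        "satisfaction ~ 1 + customer_service + store_culture + staffing"
        " + [net_price ~ consumer_confidence + industry_growth]",
}

results = IV3SLS.from_formula(equations, df).fit()
print(results)
```

Per the framework above, each region would get its own such system.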

In order to be solved, each equation must be at least identified. That is, the number of exogenous variables excluded from each equation must be at least as great as the number of endogenous variables included in it, less one. For example, if an equation includes two endogenous variables and four of the system's exogenous variables are excluded from it, the condition holds (4 is at least 2 - 1).

As is typical in generating model results, the data file was split into two random samples. The model was estimated using the "training" sample and verified using the "testing" sample. There was no attempt to "simulate" via some Monte Carlo process; the point of the model was not to assess risk or a range of outputs.
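
Continuing the sketch above, that split is a one-liner in pandas; the 50/50 division is an assumption, since the text does not give the ratio.

```python
# Random train/test split for model estimation and verification.
train = df.sample(frac=0.5, random_state=42)  # "training" sample
test = df.drop(train.index)                   # "testing" sample
```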

The cost of doing simultaneous equations is that the only desirable property remaining for the estimators is consistency. Because variables depend on values from other equations, they cannot be assumed to be fixed. (That is, the assumption of non-stochastic X is violated.) The benefit is that simultaneous equations more accurately model the behavior we seek to understand. Added complexity means added insights.

RESULTS

 

The model showed differences between first time and repeat units and satisfaction by region. As hypothesized, different regions are sensitive to different independent variables.

 

That is, repeat visitors have different sensitivities varying by region. In one region net price may dominate and in another region staffing may dominate and in yet another region satisfaction may dominate. Likewise, first time visitors have different sensitivities varying by region. In one region partnerships may dominate and in another region lead generation may dominate and in yet another region their online reputation score may dominate. All of the above are controllable (within the firm’s ability to change) variables.

 

The model showed differences between controllable and uncontrollable variables by region. In one region unemployment may dominate and in another region the number of competitors may dominate and in yet another region demographics (income, size of household, education, etc.) may dominate.

 

 

Model Output

 

To show the power of this kind of analysis, two regions are detailed below. These are the (final) results of the 3SLS model applied to each region. The key thing to notice is that different independent variables (controllable as well as uncontrollable) are significant in different regions. This is as expected. Note also that the elasticity, even for the same variable, differs by region.

 

In table 1, first timers are very sensitive to net price, in that a 10% decrease in net price causes a 22.5% increase in units. The number of sales associates is significant in this region: a 10% increase in sales associates causes a 6.5% increase in units, so while it is impactful it would be classified as insensitive (inelastic). Each of these variables gives lucrative strategic insights and provides a business case. The cost of changing price and the cost of hiring more associates can be weighed against the benefits of additional units (and ultimately additional revenue). That is, this analysis pinpoints not only which "levers" a regional VP can pull but by how much, in order to maximize total revenue.
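
For reference, the ELAST column in the tables below is simply the coefficient scaled by the ratio of the means: elasticity = estimate x (mean of x / mean of y). A quick check reproduces the region X first-units figures:

```python
def elasticity(estimate, mean_x, mean_y):
    # Point elasticity at the means: % change in y for a 1% change in x.
    return estimate * mean_x / mean_y

print(round(elasticity(-0.78, 10.87, 3.77), 2))  # -2.25: net price
print(round(elasticity(0.22, 11.22, 3.77), 2))   #  0.65: # of sales assoc
```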

 

The reputation score (a calculation similar to Net Promoter Score) is (barely) elastic, and obviously as the firm closes more leads this drives more units. There could be a business case made here as well, in that perhaps hiring more call center reps could increase closed leads.

 

The number of competitors has a negative impact on first time units, and this is an uncontrollable variable. The value here is that it quantifies the impact on new units as more (or fewer) competitors move into the trade area.

 

TABLE 1

 

REGION X

FIRST UNITS (mean = 3.77)          ESTIMATE     MEAN      ELAST
Controllable
  net price                           -0.78     10.87     -2.25
  # of sales assoc                     0.22     11.22      0.65
  reputation score                     0.06     77.66      1.13
  # leads closed                       0.45     27.11      3.24
Uncontrollable
  # competitors                       -0.45     24.77     -2.92

REPEAT UNITS (mean = 44.66)        ESTIMATE     MEAN      ELAST
Controllable
  net price                           -3.55     18.55     -1.47
  # of emails sent                    -0.91    112.55     -2.30
  # of direct mails sent               4.55     14.22      1.45
  # sales assoc                        5.09     11.22      1.28
  internal appear                     11.24      0.11      0.03
Uncontrollable
  med income                           0.001  $61,244      1.37
  # competitors                       -0.60     24.77     -0.33
  satisfaction                         5.78      7.80      1.01

SATISFACTION (mean = 7.80)         ESTIMATE     MEAN      ELAST
Controllable
  customer service                     1.55      6.07      1.21
  systems                              0.87      5.22      0.58
  product assort                       1.55      8.97      1.78
Uncontrollable
  # competitors                       -0.25     24.77     -0.79
  distance from store                 -0.08      9.87     -0.10

 

 

 

Repeat visitors are also sensitive to net price: if a 10% decrease were applied there would be a corresponding 14.7% increase in units, and a resulting increase in net revenue. Note that the number of emails sent has a negative effect on responding units; this is rationalized as email fatigue. The number of direct mail pieces sent is positive and impactful. Hiring more associates (which will drive both first and repeat units) is more impactful in the repeat model (as expected) than in the first time model: here, 10% more associates drives 12.8% more units and thus more total revenue. The internal appearance of the store is important and positive but has a rather minor impact. Thus this region has several ways to affect its performance; that is, there are a few variables directly in its control to impact units.

 

The number of competitors (as with first time units) is also impactful and negative. Distance from store is negative as well, but both are uncontrollable. That is, these should be watched but cannot really be acted upon.

 

Lastly, it is important to understand the impact of satisfaction; this is why simultaneous equations were an appropriate choice. As satisfaction increases by 10%, repeat units increase by 10.1%. The question is, how will the regions increase satisfaction? The answer lies in the satisfaction model below.

 

There are several things that compose satisfaction, and these may differ somewhat by region. Customer service, systems improvement and product assortment all have a positive impact on satisfaction. Product assortment has the greatest impact, and systems is actually insensitive (inelastic) in its impact on satisfaction. But again the point is the ability to calculate a business case. What is the cost of increasing customer service, improving systems or (if even possible) expanding product assortment? Whatever the cost, the return it generates gives an ROI. This model details a way to optimize which projects best improve satisfaction, which will in turn improve repeat units, which will drive repeat net revenue.

 

As expected, the number of competitors and distance from store decrease satisfaction. Both have a minor impact on satisfaction.

 

 

TABLE 2

 

 

 

REGION Y

FIRST UNITS (mean = 4.08)          ESTIMATE     MEAN      ELAST
Controllable
  net price                           -0.36     10.09     -0.89
  # of partners                        1.05      2.59      0.67
  reputation score                     1.98      3.55      1.72
  # leads closed                       0.27     31.99      2.12
Uncontrollable
  lost major employer                -25.77      0.09     -0.56
  # competitors                       -0.22      9.55     -0.51

REPEAT UNITS (mean = 67.08)        ESTIMATE     MEAN      ELAST
Controllable
  net price                           -2.44     19.08     -0.69
  # of emails sent                    -0.72    121.50     -1.31
  # of direct mails sent               4.99     15.01      1.12
  # sales assoc                        6.55     12.97      1.27
  remodel amount                       0.0002 $115,208     0.34
Uncontrollable
  med income                           0.001  $68,055      1.01
  size of household                    2.28      2.09      0.07
  satisfaction                         6.88      6.70      0.69

SATISFACTION (mean = 6.70)         ESTIMATE     MEAN      ELAST
Controllable
  customer service                     2.22      5.55      1.84
  assoc engage                         1.55      8.99      2.08
  product assort                       2.08      6.88      2.14
Uncontrollable
  distance from store                 -0.04      5.55     -0.03

 

 

Table 2 presents a very different region. Again the idea is to note that regions vary, and those variances give managers ways to improve their regions' and stores' performance.

 

First-time visitors in region Y are insensitive to price, which is a very different finding than in region X. It means that instead of lowering price to increase revenue, this region should raise price to increase revenue. While reputation score and closed leads are again significant, the magnitudes are very different than in region X. Lastly, the number of associates does not show up in this region's model; instead, partnerships are a significant variable.

 

In terms of uncontrollable variables, the number of competitors is significant, as is the loss of a major employer. While the firm can do nothing about these variables, just noting their occurrence gives managers items to watch and pay attention to.

 

The repeat visitors are very different in this region as well. While net price is significant, repeat visitors (as with first time visitors) are insensitive to price. There are similar findings in both regions in terms of the number of emails and direct mail pieces sent and the number of associates. However, this region is sensitive to the remodel amount rather than internal appearance, perhaps only a subtle difference but interesting in its own way.

 

Uncontrollable variables show up as median income and size of household. This region does not have the competitive pressures of region X, which has implications for enterprise optimization, subsidies, etc. Note also that while satisfaction is significant, it actually has an insensitive (inelastic) elasticity.

 

This region's satisfaction model has customer service, associate engagement and product assortment under the firm's control. (Associate engagement was not found to be significant in region X.)

 

For uncontrollable variables, distance from store shows up again as significant.

 

TABLE 3

 

REGION Y, REPEAT UNITS

STORE   YOY%   % CHG    % CHG      % CHG      % CHG     % CHG       MED INCOME   SIZE HH   % CHG
               PRICE    # EM SENT  # DM SENT  # ASSOC   REMODEL $   (INDXD)      (INDXD)   SAT
11      +11%   +2%      -3%        +0%        +1%       +4%         1.12         1.01      +0%
113      +9%   +2%      +1%        +1%        -2%       +6%         1.24         0.98      +1%
209      +8%   +1%      -2%        +3%        +3%       -1%         0.98         1.03      +2%
7       -2%    +0%      -2%        +1%        +2%       +2%         0.77         0.89      -3%
18      -4%    -3%      +4%        +4%        -4%       -3%         1.02         1.03      +0%
90      -5%    -1%      +4%        +4%        -2%       +0%         0.68         0.99      +2%

 

 

 

Scorecard

 

Table 3 shows part of region Y's scorecard: the top three and bottom three store performers, measured by year-over-year percent growth. The whole point of the modeling was to find which variables are significant, believing these would differ by region, and then look at how each store operated in terms of those variables.

 

That is, if price is important and, say, the region tends to operate on the inelastic side of demand, the appropriate strategy would be to increase price, which should increase revenue. Given that, each store's operations can be ascertained (and ultimately guided) in terms of the correct strategy for particular variables or metrics.
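
As a quick check of that logic, take region Y's repeat-units price elasticity of -0.69 from table 2 (treating it as a local, constant-elasticity approximation):

```python
elast = -0.69                           # repeat units, region Y (table 2)
price_change = 0.10                     # raise price 10%
unit_change = elast * price_change      # units fall about 6.9%
revenue_factor = (1 + price_change) * (1 + unit_change)
print(round(revenue_factor, 3))         # ~1.024: revenue up about 2.4%
```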

 

Thus, the top three performing stores all tended to increase price. Notice that the bottom three performers moved price the wrong way. In terms of the number of emails, the more sent the more negative pressure on revenue, and the bottom three performers tended to send more emails than the top three. Also, the bottom three stores decreased their number of associates, which tended to decrease units. And while minor, increasing the amount spent on store upkeep, modernizing, etc. has a positive impact on revenue.

 

In terms of store operations, if a store is struggling during the year, a common tactic would be to decrease price; without the model, management would not know the appropriate action. It may also seem that sending out more emails would help counteract a sub-par year, but in this case those are exactly the wrong actions. Likewise, decreasing the dollars spent on associates and on remodeling may look like a cost cutting measure, but again those are exactly the wrong decisions. The bottom performers did also send out more direct mail, and that is the correct response.

 

Thus the scorecard is intended to find which levers are impactful in each region and give store managers a tool to help optimize revenue. This can be in the form of a test and learn plan, or managing KPIs, etc.

 

The above referenced the controllable variables, those levers that store management can change. Looking at the uncontrollable variables, note that the top three performers tend to be over-indexed on income and size of household, whereas the bottom three tend to be below average.

 

Taking a quick look at overall satisfaction shows the same trend: the top performers tended to increase their satisfaction scores while the bottom performers tended to decrease them. Since managers know from the model that customer service, associate engagement and product assortment drive satisfaction in this region, these metrics can be examined as a scorecard as well, and a focus on them can help drive satisfaction.

 

 

 

Using the Store Planning Matrix

 

Note Figure 2 below, which shows the store planning matrix (SPM). Only the six stores mentioned in the scorecard are shown.

 

 

FIGURE 2 STORE PLANNING MATRIX

The SPM plots stores on two dimensions: economic area and revenue performance. Economic area can be defined as some combination of number of competitors, the gain or loss of a large employer, income, household size, unemployment, etc. The idea is to create a single dimension of how attractive a particular trade area is. The other dimension is YOY change in revenue.

 

The issue is how a store performs given the economic environment it finds itself in. As an obvious example, look at store number 11 versus store number 18. As shown, they have similar economic operating areas but drastically different revenue performance. The SPM gives managers an immediate way to see which stores are delivering given their environment. That is, store number 18 cannot claim that it can do no better, because store number 11 is in the same environment and performed much better. Then, using the store scorecard above, particular recovery plans can be put into place.
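
A minimal sketch of building such a matrix from the scorecard data follows. The composite "economic area" score (here just the average of the two indexed demographics available in table 3) and the median-based quadrant cuts are illustrative assumptions; a real version would fold in competitors, employers, unemployment, etc.

```python
# Toy store planning matrix using the six stores from table 3.
import pandas as pd

stores = pd.DataFrame({
    "store":          [11,    113,   209,   7,     18,    90],
    "yoy":            [0.11,  0.09,  0.08, -0.02, -0.04, -0.05],
    "med_income_idx": [1.12,  1.24,  0.98,  0.77,  1.02,  0.68],
    "size_hh_idx":    [1.01,  0.98,  1.03,  0.89,  1.03,  0.99],
})

# Composite economic-area dimension (illustrative weighting).
stores["econ_area"] = stores[["med_income_idx", "size_hh_idx"]].mean(axis=1)

# Assign quadrants by cutting each dimension at its median.
area = (stores["econ_area"] >= stores["econ_area"].median()).map(
    {True: "attractive area", False: "tough area"})
growth = (stores["yoy"] >= stores["yoy"].median()).map(
    {True: "high growth", False: "low growth"})
stores["quadrant"] = area + " / " + growth

print(stores[["store", "econ_area", "yoy", "quadrant"]])
```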

 

In terms of a strategic approach, the four quadrants of the SPM each have a specific goal. The top left might be to gain share; the top right to maximize and defend; the bottom left to manage for profit; the bottom right to manage for revenue. Plotting where a specific store lands relative to its peers gives management an immediately relevant POV.

 

 

CONCLUSION

 

Same store sales modeling can be used operationally to predict future sales, but the real power is in providing tools to understand what drives revenue. If, as is usually the case, these drivers differ by region (or segment), then it becomes critical to find how they differ. And more importantly, the ability to drill down to the store level and find which stores are performing optimally is critical for YOY success.

 

 

BIOGRAPHY

 

Mike Grigsby has been in marketing analytics for nearly three decades. He worked in CRM / database marketing at Dell, HP, Sprint, the Gap and is now a marketing science consultant at Targetbase. His PhD is in marketing science and he has taught marketing analytics at UTD, UD, and St. Edwards. He has published in both academic and trade journals and led seminars at DMA, NCDM, etc. He is the author of MARKETING ANALYTICS and his second book, ADVANCED CUSTOMER ANALYTICS, comes out October, 2016. Link to him on LinkedIn, follow on Twitter, or read the blog at marketingscience.biz.

 

 

 

USING CONSUMER BEHAVIOR FOR MARKETING STRATEGY

 

OK, in marketing the customer is king.  We all know that.  If marketing is not customer-centric it probably is NOT really marketing.  We all know that.

Or do we?

Why such a focus on competitive behavior?  I know John Nash just died and A Beautiful Mind was a great book (and a less than great movie) and Game Theory is very cool, but is it talked about in board rooms?  No.  I have never heard a CEO lean toward his CMO and ask, "Do you think our competition is doing prisoner's dilemma?"  But a lot of attention is paid to competition, to the distraction of focusing on consumer behavior.

I have sat through many seminars and presentations on Game Theory and have even been asked to teach a class on it.  While it seems important, and is certainly mathematically rigorous, what does it get us?  To me it functions more as an academic construct than an actionable insight.  Much like Michael Porter's competitive intensity: have you ever used, or seen quantified, competitive rivalry?   Has there been a model quantifying the bargaining power of suppliers and buyers, the threat of substitutes and new entrants?  It functions as an abstract talking point, like debating the number of angels dancing on the head of a pin.

That’s why I posit a knowledge of customer behavior over a knowledge of Game Theory.  Indeed, I suggest that a knowledge of the analytics around customer behavior is a substitute for Game Theory.  I can hear the gasps.

Stephan Sorger's excellent Marketing Analytics has a brief description of competitive moves, both offensive and defensive. Below are summaries of each move, applied via consumer behavior.  This can serve as a thumbnail sketch of what I have in mind.

Defensive Reactions to Competitor Moves:

Bypass Attack (the attacking firm expands into one of our product areas) and the correct counter is for us to constantly explore new areas.  Remember Theodore Levitt’s Marketing Myopia? If not, re-read it, you know you had to in school.

Encirclement Attack (the attacking firm tries to overpower us with larger forces) and the correct counter is to message how our products are superior / unique and of more value. This requires constant monitoring of message effectiveness.

Flank Attack (the attacking firm tries to exploit our weaknesses) and the correct counter is to not have any weaknesses. This again requires monitoring and messaging the uniqueness / value of our products.

Frontal Attack (the attacking firm aims at our strength) and the correct counter is to attack back in the attacking firm's own territory. Obviously this is a rarely used technique.

Offensive Actions:

New Market Segments: this uses behavioral segmentation (see the later chapters on segmentation) and incents consumer behavior for a win-win relationship.

Go-to-Market Approaches: this learns about consumers' preferences in terms of bundling, channels, buying plans, etc.

Differentiating functionality: this approach addresses consumers' needs by offering the product and purchase combinations most compelling to potential customers.

My book, Marketing Analytics (Kogan Page, 2015) offers additional analytic techniques to quantify the causality of customer behavior.

MARKETING ANALYTICS Press Release

For Immediate Release

NEW BOOK – Marketing Analytics: A Practical Guide to Real Marketing Science

New Book Reveals When Your Customers are Most Likely to Buy

Available today, Marketing Analytics arms business analysts and marketers with the understanding and techniques they need to solve real-world marketing problems, from testing campaign effectiveness and forecasting demand to employing survival analysis to determine when your customers are most likely to buy. It outlines everything practitioners need to ‘do’ marketing science by following fictional analyst Scott as he progresses through his career and makes increasingly better marketing decisions.

The author Mike Grigsby has been involved in marketing science for over 25 years. He was marketing research director at Millward Brown and has held leadership positions at Hewlett-Packard and the Gap. He now heads up the strategic retail analysis practice at Targetbase and is an adjunct professor at the University of Texas at Dallas.

Part of the new Marketing Science series by Kogan Page, which makes difficult topics accessible by grounding them in business reality, Marketing Analytics helps readers refine their marketing skills so they can compete more effectively in the marketplace. It provides insight into the power of data analytics in the context of marketing problems; explains and demonstrates marketing data modelling techniques in a practical way; illustrates how data modelling methodology can be applied to a range of practical scenarios; and offers advice and step-by-step guidance on solving some of the most common situations, opportunities and problems in marketing.

Dr. James Mourey, Assistant Professor of Marketing at DePaul University in Chicago, has offered advance praise, declaring, ‘For those MBAs who barely passed their quantitative marketing and statistics classes without truly understanding the content, Marketing Analytics provides everything managers and executives need to know presented as a conversation with examples to boot! You’ll definitely sound smarter in the boardroom after reading this book!’

For a review copy (ISBN 9780749474171), a by-lined article or to arrange an interview with the author, please contact Megan Mondi: mmondi@koganpage.com or +44 (0)20 7843 1952.

http://www.koganpage.com/product/marketing-analytics-9780749474171

http://www.amazon.com/Marketing-Analytics-Practical-Guide-Science/dp/0749474173/ref=sr_1_1?ie=UTF8&qid=1433361116&sr=8-1&keywords=grigsby

 

 

The Required Spiel on B-I-G D-A-T-A

INTRODUCTION

Okay, this had to be done.  It’s time.

I’ve avoided it because Big Data (yes, you have to capitalize it!) is everywhere.  You can’t get away from it.  It’s in every post and every update and every blog and every article and every book and every resume and every college class anywhere you look.  It’s inescapable.  Big Data has become the Kim Kardashian of analytics.

So now it’s time to add to the fray.

 

WHAT IS BIG DATA?

No one knows.  I’ll provide a working definition here but it will evolve over the years.

First, Big Data is BIG

Duh.  By "Big" I mean many, many rows and many, many columns.  Note that there is no magic threshold where we suddenly say, "Oh my, we are now in the Big Data range!"  It's relative.

This brings us to the second and third dimensions of what Big Data is: complexity.

Second, Big Data is potentially multiple sources merged together

This dimension of Big Data came about because of the proliferation of multiple sources of data, both traditional and non-traditional.

So we have traditional data.  This means transactions from, say, a POS system and marcomm responses.  This is what we've had for decades.  We also created our own data: things like time between purchases, discount rate, seasonality, click through rate, etc.

The next step was to add overlay data and marketing research data.  This was third-party demographics and / or lifestyle data merged to the customer file.  Marketing research responses could be merged to the customer file to provide things like satisfaction, awareness, competitive density, etc.

Then came the first wave of different data: web logs.  This was different and the first taste of Big Data.  It is another channel.  Merging it with customer data is a whole other process.

Now there is non-traditional data.  I'm talking about the merge-to-customer view.  In terms of social media, the merge to individual customers is a whole technology / platform issue.  But several companies have developed technologies to scrape off the customer's id (email, link, handle, tag, etc.) and merge it with other data sources.  This is key!  This is clearly a very different kind of data, but it shows us, say, number of friends / connections, blog / post activity, sentiment, touch points, site visits, etc.

Third, Big Data is potentially multiple structures merged together

Lastly, Big Data has an element of degrees of structure.  I'm talking about the range from the very common structured data through semi-structured all the way to unstructured data.  Structured data is the traditional codes that are expected by type and length; it is uniform.  Unstructured data is everything but that.  It can include text mined from, say, call records and free-form comments; it can also include video, audio, graphics, etc.  Big Data gets us to structure this unstructured data.

Fourth, Big Data is analytically and strategically valuable

Just to be obvious: data that is not valuable can barely be called data.  It can be called clutter or noise or trash.  But it's true that what is trash to me might be gold to you.  Take click stream data.  That URL has a lot of stuff in it.  To the analyst, what is typically of value is the page the visitor came from and is going to, how long they were there, what they clicked on, etc.  Telling me what web browser they used, or whether it's an active server page, or the time to load the wire frame (all probably critically important to some geek somewhere) is of little to no value to the analyst.  So Big Data can generate a lot of stuff, but there has to be a (say, text mining) technique / technology to put it in a form that can be consumed.  That's what makes it valuable: not the quantity but the quality.
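
As a small illustration of pulling the valuable part out of that "stuff," here is a sketch using Python's standard library on a made-up URL:

```python
# Keep the page and the campaign reference; ignore the technical clutter.
from urllib.parse import urlparse, parse_qs

hit = "https://shop.example.com/womens/jeans?ref=spring_email&sess=9f3a"  # made up
parsed = urlparse(hit)
print(parsed.path)              # the page visited: /womens/jeans
print(parse_qs(parsed.query))   # {'ref': ['spring_email'], 'sess': ['9f3a']}
```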

 

IS IT IMPORTANT?

Probably.  As alluded to above, what multiple data sources can provide the marketer is insight into consumer behavior.  It's important to the extent that it provides more touch points of the shopping and purchasing process.  To know that one segment always looks at word-of-mouth opinions and blogs for the product in question is very important.  To know that another segment reads reviews and pays a lot of attention to negative sentiment can be invaluable for marketing strategy (and PR!).

Just as click stream data 20 years ago provided another view of shopping and purchasing, Big Data adds layers of complexity.  Because consumer behavior is complex, added granularity is a benefit.  But beware of "majoring on the minors" and paralysis of analysis.

 

WHAT DOES IT MEAN FOR ANALYTICS?  FOR STRATEGY?

There needs to be a theory: THIS causes THAT.  An insight has to be new, provide an explanation of causality, and be of a type that can be acted upon.  Otherwise (no matter how BIG it is) it is meaningless.  So the only value of Big Data is that it gives us a glimpse into the consumer's mindset; it shows us their "path to purchase."

For analytics this means a realm of attribution modelling that places a weight on each touch point, by behavioral segment.  Strategically, from a portfolio POV, it tells us that this touch point is of value to shoppers / purchasers and that one is NOT.  Therefore attention needs to be paid to those touch points (pages, sites, networks, groups, communities, stores, blogs, influencers, etc.) that are important to consumers.  The biggest difference Big Data makes is that now we have more things to look at, more complexity, and this cannot be ignored.  To pretend consumers do not travel down that path is to be foolishly simplistic.  When a three-dimensional globe is forced into two-dimensional space (from a sphere to a wall), Greenland looks to be the size of Africa.  The over-simplification creates distortion.  The same is true of consumer behavior.  The tip of the iceberg that we see is motivated by many unseen, below-the-surface causes.
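
A toy sketch of what "placing a weight on each touch point, by behavioral segment" can look like; the position-based 40/20/40 weighting, the segment names and the paths are purely illustrative, not a recommendation from this post:

```python
# Position-based attribution: 40% of credit to the first touch, 40% to the
# last, and the remainder split across the middle touches, per segment.
from collections import defaultdict

paths = [
    ("deal_seekers", ["blog", "review_site", "email", "store"]),
    ("brand_fans",   ["social", "email", "store"]),
]

credit = defaultdict(float)
for segment, path in paths:
    n = len(path)
    for i, touch in enumerate(path):
        w = 0.4 if i in (0, n - 1) else 0.2 / max(n - 2, 1)
        credit[(segment, touch)] += w

for key, w in sorted(credit.items()):
    print(key, round(w, 3))
```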

 

CONCLUSION

Big Data is not going to go away.  Like the Borg, we will assimilate it, we will add its technological uniqueness to our own.  We will be better for it.

The new data does not require new analytic techniques.  The new data does not require new marketing strategies.  Marketing is still marketing and understanding and incenting and changing consumer behavior is still what marketers do.  Now–as always–size does matter, and we have more.  Enjoy!

 

WHERE DOES SEGMENTATION START?

 

So I was in a meeting the other day and a retail client said they wanted to do segmentation.  Now, those who know me know that that is what I LOVE to do.  I think it is often a good first step; it is the foundation of much of the analytics that follows.  Remember the 4 Ps of strategic marketing?  Partition (segmentation), probe (marketing research), prioritize (rank financially) and position (compelling messaging).  Strategy starts with segmentation.

They began talking about what data they have available.  But that is NOT the right place to start.  Segmentation is a strategic, not an analytic, exercise.  Surprised to hear me say that?  Note that while segmentation is the first step in strategic marketing (see above), it is PART of strategic marketing.  That is, it starts with strategy.

Where does strategy start?  It starts with clearly defined objectives.  For segmentation to work it must start with strategy and strategy starts with clearly defined objectives.

So I asked the client what is it they wanted to do.

“Sell more stuff, man!  Make money.”  Duh.

Yeah, I get that.  Have you thought about HOW you are going to sell more stuff?  How are you going to make more money?

Oh.

Sure, I can take all their data (demographics, transactions, attitudes / lifestyle, loyalty, marcom, etc.) and throw it into some algorithm (I like latent class myself) and out will pop a statistically valid (within the confines of the algorithm) segmentation solution.  That will be acceptable analytically, but: IT WON'T WORK.  It does not solve anything; it does not give levers for a solution, because the solution was not inherent in the design.
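
And the mechanical step really is that easy, which is exactly the point. A sketch of the "throw it into some algorithm" step, using a Gaussian mixture as a stand-in for latent class on made-up behavioral data:

```python
# Statistically valid clusters will pop out of almost any data; that alone
# does not make them actionable. The data here is simulated, not a client's.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))   # e.g., discount rate, frequency, avg order value

gm = GaussianMixture(n_components=4, random_state=0).fit(X)
segments = gm.predict(X)        # a "solution," but designed around no strategy
print(np.bincount(segments))    # segment sizes
```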

For example, a recent telecom client had a problem with churn (attrition).  They needed a list of who is most likely to churn in the next 60 days so they could intervene and try to slow down / stop the churn.

The solution was to segment based on reasons to churn.  We brainstormed about what causes churn: high bills, high use of data / minutes, dropped calls, etc.  Then we collected data on the causes of churn and segmented based on that data.  We came up with a segment that was sensitive to price and churned because of high bills.  Another segment was sensitive to dropped calls and churned because of an increase in dropped calls.  Then survival modeling was applied to each segment and we could produce a list of those most likely to churn and WHY they would churn.  This WHY gave the client a marketing lever to use in combating churn.  For the "sensitive to high bill" segment, those most at risk could be offered a discount.  (If a $5 discount keeps a subscriber on the system for 60 more days, it's worth it.)  Note that the solution had marketing actions in the design.  That's why it worked.
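
A minimal sketch of that staged design, assuming the lifelines package and a hypothetical subscriber table with a segment label, tenure, a churn flag and the driver columns (all names here are illustrative):

```python
# Fit a survival (Cox) model per churn-reason segment, then rank subscribers
# by relative hazard to produce the "most likely to churn" list with a WHY.
import pandas as pd
from lifelines import CoxPHFitter

subscribers = pd.read_csv("subscribers.csv")  # hypothetical input

risk_lists = {}
for segment, seg_df in subscribers.groupby("segment"):
    cph = CoxPHFitter()
    cph.fit(
        seg_df[["tenure_days", "churned", "avg_bill", "dropped_calls"]],
        duration_col="tenure_days",
        event_col="churned",
    )
    hazard = cph.predict_partial_hazard(seg_df).squeeze()
    risk_lists[segment] = hazard.sort_values(ascending=False)  # riskiest first
```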

We did not segment based on demographics.  We did not segment based on attitudes.  But we could have.  The algorithm does not know (or care) what the data is.  The mathematics around the solution have nothing to do with what variables are used.  Analytically, a solution is a solution.  But without marketing strategy as part of the design it will not work.

So for the retail client, there was a conversation.  Segmentation is NOT a magic bullet that will solve all marketing problems.  But thinking will help.

So the retail client admitted they were probably discounting too much (all retailers discount too much) but they did not know how to target their discounting.  Clearly some of their customers would not buy without a discount, but some were more loyal and did not really need a discount to buy.  So one way to make more money is to not give such high discounts; that is a marketing strategy.  If we could find groups that differ on price sensitivity, we could segment based on that.  One segment needs a discount and another segment does not.

Another way to make more money is to save on direct mail.  Some customers preferred a catalog and others did not care and were happy with email.  Direct mail is expensive, so if segmentation could find a group that required direct mail and a group that did not, clearly send a catalog to the DM group and an email to the email group.  See?

Note again that demographics, attitudes, loyalty metrics, etc. were not part of the solution because they were not part of the problem.  There could be a strategy that needs a segmentation based on loyalty, etc. but not in the current example.

So the key takeaway is that segmentation does NOT start with data; it starts with thinking about objectives, what marketing levers can be pulled, and what problem is (specifically) being solved.  Without that you have nothing, and "He who aims at nothing will hit it."