Category Archives: ANALYTICS

an overview of analytic techniques

The Required Spiel on B-I-G D-A-T-A


Okay, this had to be done.  It’s time.

I’ve avoided it because Big Data (yes, you have to capitalize it!) is everywhere.  You can’t get away from it.  It’s in every post and every update and every blog and every article and every book and every resume and every college class anywhere you look.  It’s inescapable.  Big Data has become the Kim Kardashian of analytics.

So now it’s time to add to the fray.



No one knows.  I’ll provide a working definition here but it will evolve over the years.

First, Big Data is BIG

Duh.  By “Big” I mean many many rows and many many columns.  Note that there is no magic threshold that suddenly puts us in the “Oh my, we are now in the Big Data range!”  It’s relative.

This brings us to the second and third dimensions of Big Data, both of which are about complexity.

Second, Big Data is potentially multiple sources merged together

This dimension of Big Data came about because of the proliferation of multiple sources of data, both traditional and non-traditional.

So we have traditional data.  This means transactions from say a POS and marcomm responses.  This is what we’ve had for decades.  We also created our own data, things like time between purchases, discount rate, seasonality, click through rate, etc.

The next step was to add overlay data and marketing research data.  This was third-party demographics and / or lifestyle data merged to the customer file.  Marketing research responses could be merged to the customer file to provide things like satisfaction, awareness, competitive density, etc.

Then came the first wave of different data: web logs.  This was different and the first taste of Big Data.  It is another channel.  Merging it with customer data is a whole other process.

Now there is non-traditional data.  I’m talking about the merge-to-customer view.  In terms of social media, the merge to individual customers is a whole technology / platform issue.  But there are several companies who’ve developed technologies to scrape off the customer’s ID (email, link, handle, tag, etc.) and merge it with other data sources.  This is key!  This is clearly a very different kind of data, but it shows us, say, number of friends / connections, blog / post activity, sentiment, touch points, site visits, etc.

Third, Big Data is potentially multiple structures merged together

Lastly Big Data has an element of degrees of structure.  I’m talking about the very common structured data through semi-structured and all the way to unstructured data.  Structured data is the traditional codes that are expected by type and length–it is uniform. Unstructured data is everything but that.  It can include text mining from say call records and free form comments, it can also include video and audio and graphics, etc.  Big Data gets us to structure this unstructured data.

Fourth, Big Data is analytically and strategically valuable

Just to be obvious: data that is not valuable can barely be called data.  It can be called clutter or noise or trash.  But it’s true that what is trash to me might be gold to you.  Take click stream data.  That URL has a lot of stuff in it.  To the analyst what is typically of value is the page the visitor came from and is going to, how long they were there, what they clicked on, etc.  Telling me what web browser they used or whether it’s an active server page or the time to load the wire frame (all probably critically important to some geek somewhere) is of little to no value to the analyst.  So Big Data can generate a lot of stuff but there has to be a (say text mining) technique / technology to put it in a form that can be consumed.  That’s what makes it valuable–not the quantity but the quality.



Probably.  As alluded to above, what multiple data sources can provide the marketer is insight into consumer behavior.  It’s important to the extent that it provides more touch points of the shopping and purchasing process.  To know that one segment always looks at word of mouth opinions and blogs for the product in question is very important.  To know that another segment reads reviews and puts a lot of attention on negative sentiment can be invaluable for marketing strategy (and PR!).

Just like click stream data provided another view of shopping and purchasing 20 years ago, Big Data adds layers of complexity.  Because consumer behavior is complex, added granularity is a benefit.  But beware of “majoring on the minors” and paralysis by analysis.



There needs to be a theory: THIS causes THAT.  An insight has to be new and provide an explanation of causality and of a type that can be acted upon.  Otherwise (no matter how BIG it is) it is meaningless.  So the only value of Big Data is that it gives us a glimpse into the consumer’s mindset, it shows us their “path to purchase.”

For analytics this means a realm of attribution modelling that places a weight on each touch point, by behavioral segment.  Strategically, from a portfolio POV, it tells us that this touch point is of value to shoppers / purchasers and this one is NOT.  Therefore attention needs to be paid to those touch points (pages, sites, networks, groups, communities, stores, blogs, influencers, etc.) that are important to consumers.  The biggest difference Big Data makes is that now we have more things to look at, more complexity, and this cannot be ignored.  To pretend consumers do not travel down that path is to be foolishly simplistic.  When a three-dimensional globe is forced into two-dimensional space (from a sphere to a wall), Greenland looks to be the size of Africa.  The oversimplification created distortion.  The same is true of consumer behavior.  The tip of the iceberg that we see is motivated by many unseen, below-the-surface causes.
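As a toy illustration of touch point weighting, here is a minimal equal-weight (“linear”) attribution sketch; the paths and touch point names are hypothetical, and a real attribution model would estimate the weights per behavioral segment rather than assume them:

```python
from collections import defaultdict

def linear_attribution(converting_paths):
    """Split each conversion's single unit of credit equally across its touch points."""
    credit = defaultdict(float)
    for path in converting_paths:
        weight = 1.0 / len(path)
        for touch_point in path:
            credit[touch_point] += weight
    return dict(credit)

# Hypothetical converting paths (ordered touch points before purchase)
paths = [
    ["search", "review site", "email"],
    ["blog", "email"],
    ["search", "email"],
]
print(linear_attribution(paths))
# "email" earns 1/3 + 1/2 + 1/2 of a conversion's credit, about 1.33
```

The touch points with the most accumulated credit are the candidates for marketing attention; the equal-weight rule is only the simplest possible weighting scheme.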



Big Data is not going to go away.  Like the Borg, we will assimilate it, we will add its technological uniqueness to our own.  We will be better for it.

The new data does not require new analytic techniques.  The new data does not require new marketing strategies.  Marketing is still marketing and understanding and incenting and changing consumer behavior is still what marketers do.  Now–as always–size does matter, and we have more.  Enjoy!




So I was in a meeting the other day and a retail client said they wanted to do segmentation.  Now, those who know me know that that is what I LOVE to do.  I think that is often a good first step.  It is the foundation of much analytics that follow.  Remember the 4 Ps of strategic marketing?  Partition (segmentation), probe (marketing research), prioritize (rank financially) and position (compelling messaging).  Strategy starts with segmentation.

They began talking about what data they have available.  But that is NOT the right place to start.  Segmentation is a strategic, not an analytic, exercise.  Surprised to hear me say that?  Note that while segmentation is the first step in strategic marketing (see above), it is PART of strategic marketing.  That is, it starts with strategy.

Where does strategy start?  It starts with clearly defined objectives.  For segmentation to work it must start with strategy, and strategy starts with those clearly defined objectives.

So I asked the client what is it they wanted to do.

“Sell more stuff, man!  Make money.”  Duh.

Yeah, I get that.  Have you thought about HOW you are going to sell more stuff?  How are you going to make more money?


Sure, I can take all their data (demographics, transactions, attitudes / lifestyle, loyalty, marcom, etc.) and throw it into some algorithm–I like latent class myself–and out will pop a statistically valid (within the confines of the algorithm) segmentation solution.  That will be acceptable analytically, but: IT WON’T WORK.  It does not solve anything, it does not give levers for a solution because the solution was not inherent in the design.

For example, a recent telecom client had a problem with churn (attrition).  They needed a list of who is most likely to churn in the next 60 days so they could intervene and try to slow down / stop the churn.

The solution was to segment based on reasons to churn.  We brainstormed about what causes churn–high bills, high use of data / minutes, dropped calls, etc.  Then we collected data on the causes of churn and segmented based on that data.  We came up with a segment that was sensitive to price and churned because of high bills.  Another segment was sensitive to dropped calls and churned because of an increase in dropped calls.  Then survival modeling was applied to each segment and we could produce a list of those most likely to churn and WHY they would churn.  This WHY gave the client a marketing lever to use in combating churn.  For the “sensitive to high bill” segment, those most at risk could be offered a discount.  (If a $5 discount keeps a subscriber on the system for 60 more days, it’s worth it.)  Note that the solution had marketing actions in the design.  That’s why it worked.
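To illustrate the survival idea (this is a sketch, not the client’s actual model), here is a minimal Kaplan-Meier survival curve computed per hypothetical churn segment; a production version would use a regression-style survival model with the churn drivers as independent variables:

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each observed churn time.
    events: 1 = churned at that time, 0 = still active (censored)."""
    paired = sorted(zip(times, events))
    n_at_risk = len(paired)
    surv, curve = 1.0, []
    for t, churned in paired:
        if churned:
            surv *= (n_at_risk - 1) / n_at_risk   # step down at each churn
            curve.append((t, surv))
        n_at_risk -= 1   # censored subscribers also leave the risk set
    return curve

# Hypothetical days-to-churn for two behavioral segments
high_bill_segment    = kaplan_meier([30, 45, 60, 90, 120], [1, 1, 1, 0, 0])
dropped_call_segment = kaplan_meier([20, 25, 40, 55, 200], [1, 1, 1, 1, 0])
print(high_bill_segment)   # survival drops to 0.8, 0.6, 0.4 at days 30, 45, 60
```

Comparing the curves by segment shows which group churns faster, and a covariate-based survival model on top of this would quantify WHY.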

We did not segment based on demographics.  We did not segment based on attitudes.  But we could have.  The algorithm does not know (or care) what the data is.  The mathematics around the solution have nothing to do with what variables are used.  Analytically, a solution is a solution.  But without marketing strategy as part of the design it will not work.

So for the retail client, there was a conversation.  Segmentation is NOT a magic bullet that will solve all marketing problems.  But thinking will help.

So the retail client admitted they were probably discounting too much (all retailers discount too much) but they did not know how to target their discounting.  Clearly some of their customers would not buy without a discount, but some were more loyal and did not really need a discount to buy.  So one way to make more money is to not give such high discounts.  That is a marketing strategy.  If we could find groups that differ on price sensitivity, we could segment based on that: one segment needs a discount and another segment does not.
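A crude sketch of that idea, with hypothetical customers and an assumed 50% cutoff; a real segmentation would model price sensitivity (e.g., with latent class) rather than hard-code a rule:

```python
# Hypothetical purchase histories: total purchases and how many were on discount
customers = {
    "A": {"purchases": 10, "discounted": 9},
    "B": {"purchases": 8,  "discounted": 1},
    "C": {"purchases": 5,  "discounted": 4},
}

def discount_segment(c, cutoff=0.5):
    """Assumed rule: a high share of discounted purchases flags discount dependence."""
    share = c["discounted"] / c["purchases"]
    return "needs discount" if share >= cutoff else "buys at full price"

segments = {name: discount_segment(c) for name, c in customers.items()}
print(segments)   # A and C land in "needs discount"; B does not
```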

Another way to make more money is to save on direct mail.  Some customers preferred a catalog and others did not care and were happy with email.  Direct mail is expensive, so if segmentation could find the group that requires direct mail and the group that does not, clearly you send a catalog to the DM group and an email to the email group.  See?

Note again that demographics, attitudes, loyalty metrics, etc. were not part of the solution because they were not part of the problem.  There could be a strategy that needs a segmentation based on loyalty, etc. but not in the current example.

So the key takeaway is that segmentation does NOT start with data; it starts with thinking about objectives, what marketing levers can be pulled, and what problem is (specifically) being solved.  Without that you have nothing, and “he who aims at nothing will hit it.”












Because pricing is one of the Four Ps of Marketing (product, price, promotion and place) it is critically important in understanding consumer behavior.  Pricing is where marketing happens, because a market is the place where buyers meet sellers.

However, how do marketers know what price to charge?  Part of it has to do with what strategy (skimming, penetration, other) they are pursuing.  But in general, too high a price and they get no sales; too low a price and they get no profit.  Typically the market is the LAST place to experiment with trial and error.

So there are several generally accepted ways to research the “right” price to charge.  A practical consideration is whether or not the product already exists in the marketplace.  If it is a new product then elasticity modeling cannot really be done.  If it exists then there are four choices: a general survey, a van Westendorp survey, conjoint analysis and elasticity modeling.

This post favors (for an existing product) elasticity modeling because it is real responses to real price changes in an economic environment.  If it is a new product the least favorable choice is a general survey.  Each of these methods will be briefly discussed.



The first and simplest solution is to ask customers what they think.  This requires taking a random sample and asking “Are our prices too high?”

There is a lot of thought put behind this, to seek granularity, but generally marketers want to know if customers think their prices are too high.  The probing can / should be aimed at different segments (high volume or low volume users, new or established customers, a particular product, geographically dispersed, etc.).

But the overwhelming answer customers give to the question “Are our prices too high?” is “Yes!  Your prices are too high.”  It is self-reported and self-serving, and money was wasted on a survey.  So what usually happens in a large corporation is that many creative people slice-and-dice until they find the answers they want: some “segments” or cohorts where prices are reported as NOT too high.  That is, customers who have been on the database longer than three years, who have bought more than $450 of product X, and who reside in the northeast reported, “No, your prices are competitive!”  This is just window dressing and not analytic.

Thus, for an existing product, a general survey among current customers offers no real insights.

This post advocates NOT using a survey for an existing product.  The only case for a survey is a new product, and this too has pitfalls.  Remember that Chrysler used marketing research and asked potential customers how likely they would be to buy a minivan, a very new concept.  These customers had no experience with it and indicated lackluster demand.  Iacocca ignored the research and built the minivan anyway, saving the company.  So in short, a general marketing research survey has little value except as gee-whiz info.



A second common option is the van Westendorp survey.  (Those who use it do not call it a survey but a Price Sensitivity Analysis (PSA).)  But it really is a survey.  It takes a random sample of customers and asks them questions.  It is usually a “tracking” study, so as to gauge price sensitivity movement over time.  In general the point is to find out what prices are considered too high or too low (again, self-reported).  These results are graphed onto a “Price Map”.

Customers are asked four questions:

  • At what price would you consider the product/service to be priced so low that you feel that the quality can’t be very good?
  • At what price would you consider this product/service to be a bargain—a great buy for the money?
  • At what price would you say this product/service is starting to get expensive—it’s not out of the question, but you’d have to give some thought to buying it?
  • At what price would you consider the product/service to be so expensive that you would not consider buying it?


Usually question 1 (too cheap) and question 4 (too expensive) provide the primary curves.  The intersection of these two is meant to reveal the optimal price; in the table below the curves cross between $19 and $20.


price   too cheap   too expensive
$5       100%           0%
$6       100%           0%
$7       100%           0%
$8       100%           0%
$9       100%           0%
$10      100%           6%
$11      100%           6%
$12      100%           6%
$13      100%          13%
$14      100%          19%
$15       94%          44%
$16       88%          50%
$18       81%          56%
$19       69%          63%
$20       56%          69%
$21       44%          75%
$22       38%          81%
$25       25%          88%
$30       19%          94%
$35       13%         100%



The “optimal” price is a bit debatable.  The idea, though, is that at the crossing point (between $19 and $20 here) roughly 65% rate the price as too cheap and roughly 65% rate it as too expensive.  That is, to extract maximum value from customers, the price simultaneously seen as too cheap and too expensive is taken as the “optimal” price.
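As a sketch, the crossing point can be read off the table above by linear interpolation between adjacent price points (the exact “optimal” price depends on how you interpolate):

```python
# Values taken from the rows around the crossing in the table above
prices    = [18, 19, 20, 21]
too_cheap = [0.81, 0.69, 0.56, 0.44]   # share answering "too cheap"
too_exp   = [0.56, 0.63, 0.69, 0.75]   # share answering "too expensive"

def crossing(prices, falling, rising):
    """Linearly interpolate where the falling curve meets the rising curve."""
    for i in range(len(prices) - 1):
        d0, d1 = falling[i] - rising[i], falling[i + 1] - rising[i + 1]
        if d0 >= 0 and d1 < 0:                  # the gap changes sign here
            frac = d0 / (d0 - d1)
            return prices[i] + frac * (prices[i + 1] - prices[i])
    return None

print(round(crossing(prices, too_cheap, too_exp), 2))   # crossing just above $19
```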

Conjoint (considered jointly) is a powerful technique favored primarily by marketing researchers.  There are dozens of books detailing all the cool types and techniques of conjoint.

To elaborate the last point, conjoint serves an important purpose, especially in marketing research, especially in product design (before the product is introduced).  My main problem with surveys overall is that they are self-reported and artificial.  Conjoint sets up a contrived situation for each respondent (customer) and asks them to make choices, typically in terms of purchasing a product.  You know I’m an econ guy, and these customers are not really purchasing.  They are not weighing real choices.  They are not using their own money.  They are not buying products in a real economic arena.  This artificiality is why I do not advocate conjoint for much else other than new product design.  That is, if you have real data use it; if you need (potential) customers’ input in designing a new product, use conjoint for that.  Also, please recognize that conjoint analysis is not actually an “analysis” (like regression, etc.) but a framework for parsing out simultaneous choices.  Conjoint means “considered jointly”.

The general process of conjoint is to design choices, depending on what is being studied.  Marketing researchers are trying to understand what attributes (independent variables) are more / less important in terms of customers purchasing a product.  So a collection of experiments is designed to ask customers how they’d rate (how likely they would be to purchase) given varying product attributes.

In terms of say PC manufacturing, choice 1 might be: $800 PC, 17 inch monitor, 1 Gig hard drive, 1 Gig RAM, etc.  Choice 2 might be: $850 PC, 19 inch monitor, 1 Gig hard drive, 1 Gig RAM, etc.  There are enough choices designed to show each customer in order to calculate “part-worths” that show how much they value different product attributes.  This is supposed to give marketers and product designers an indication of market size and optimal design for the new product.

Note that it is important to design the types and number of levels of each attribute so that the independent variables are orthogonal (not correlated) to each other.  These choice design characteristics are critical to the process.  At the end an ordinary regression is used to optimally calculate the value of part-worths.  It is this estimated value that makes conjoint strategically useful.

Note that the idea is to present to responders choices (in such a way that they are random and orthogonal) and the responders rank these choices.  The choice rankings are a responder’s judgment about the “value” (economists call it utility) of the product or service evaluated.  It is assumed that this total value is broken down into the attributes that make up the choices.  These attributes are the independent variables and these are the part-worths of the model.  That is:

Ui = X11 + X12 + X21 + X22 + … + Xmn

where Ui = total worth for product / service and

X11 = part-worth estimate for level 1 of attribute 1

X12  = part-worth estimate for level 1 of attribute 2

X21 = part-worth estimate for level 2 of attribute 1

X22 = part-worth estimate for level 2 of attribute 2

Xmn = part-worth estimate for level m of attribute n.
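For a balanced, orthogonal design, the OLS part-worth estimates reduce to each level’s mean rating minus the grand mean.  A toy sketch with hypothetical ratings of four PC profiles (the attribute levels echo the example above):

```python
# Four hypothetical profiles from a 2x2 full-factorial design, each rated 1-10
profiles = [
    ("$800", "17in", 8),
    ("$800", "19in", 9),
    ("$850", "17in", 4),
    ("$850", "19in", 6),
]

grand_mean = sum(rating for _, _, rating in profiles) / len(profiles)

def part_worth(attr_index, level):
    """Level mean minus grand mean = the OLS part-worth for a balanced design."""
    ratings = [p[2] for p in profiles if p[attr_index] == level]
    return sum(ratings) / len(ratings) - grand_mean

for level in ("$800", "$850", "17in", "19in"):
    idx = 0 if level.startswith("$") else 1
    print(level, part_worth(idx, level))
# $800 → +1.75, $850 → -1.75, 17in → -0.75, 19in → +0.75
```

Here price carries the larger part-worths, so it matters more to these (made-up) respondents than monitor size does.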


Conjoint is not appropriate the way it is usually used, especially for pricing, except, as mentioned, for a new product–a product where there is no real data.  For an existing product, it is possible to design a conjoint analysis and put price levels in as choice variables.  Marketing researchers tell me that this price variable derives an elasticity function.  I disagree, for the following reasons: 1) those estimates are NOT real economic data; they are contrived and artificial.  2) The sample they are derived from is too small to base real corporate strategic choices on.  3) The data is self-reported; respondents are not spending their own money in a real economic arena purchasing real products.  4) Using real data is far superior to using conjoint data.  Have I said this enough yet?  Okay, the rant will now stop.



Let’s go back to microeconomics 101: price elasticity is the metric that measures the percentage change in an output variable (typically units) given a percentage change in an input variable, in this case price.  This change is usually calculated as a “pure number” without dimensions.  It is a marginal function over an average function, that is:

mathematically dQ/dP * P / Q or statistically β * P / Q

where P and Q are the average price and average units.

If the absolute value is > 1.00, that demand is called elastic.  If it is < 1.00, that demand is called inelastic.  These are unfortunate terms, as they nearly hide the real meaning.  The clear concept is one of sensitivity.  That is, how sensitive are customers who purchase units to a change in price?  If there is a 10% increase in price and customers respond by purchasing less than 10% fewer units, they are clearly insensitive to price.  If there is a 10% increase in price and customers respond by purchasing more than 10% fewer units, they are sensitive to price.

But this is not the key point, at least in terms of marketing strategy.  The law of demand is that price and units are inversely correlated (remember the downward sloping demand curve?)  Units will always go the opposite direction of a price change.  But the real issue is what happens to revenue.  Since revenue is price * units, if demand is inelastic, revenue will follow the price direction.  If demand is elastic revenue will follow unit direction.  Thus, to increase revenue in an inelastic demand curve, price should increase.  To increase revenue in an elastic demand curve, price should decrease.



INELASTIC (elasticity = 0.075): increase price by 10.0%
  p1   $10.00     p2   $11.00     +10.0%
  u1   1,000      u2   993        -0.75%
  tr1  $10,000    tr2  $10,918    +9.2%

ELASTIC (elasticity = 1.250): increase price by 10.0%
  p1   $10.00     p2   $11.00     +10.0%
  u1   1,000      u2   875        -12.50%
  tr1  $10,000    tr2  $9,625     -3.8%


See the table above.  There are two kinds of demand: inelastic (0.075) and elastic (1.250).  In the inelastic case, we increase price (p1) 10% from $10.00 to $11.00 (p2).  Units decrease (because of the law of demand) from 1,000 (u1) to 993 (u2), a 0.75% decrease.  Now see that total revenue goes from tr1 of $10,000 ($10.00 * 1,000) to tr2 of $10,918.  This was an inelastic demand curve: price increased and, while units decreased, total revenue increased.

Now the opposite happens for the elastic demand curve.  Again p1 = $10.00 and p2 = $11.00, but while u1 starts at 1,000 units, the 12.5% decrease sends u2 to 875.  Now tr1 of $10,000 falls to tr2 of $9,625.  This means that to raise total revenue, price must be increased on an inelastic demand curve but decreased on an elastic demand curve.  It also means a marketer does not know which way to move price unless they do elasticity modeling.  See?  Wasn’t that fun?
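The table’s arithmetic can be sketched directly, using the local approximation that the percent change in units is minus the elasticity times the percent change in price:

```python
def simulate(elasticity, p1, u1, pct_price_change):
    """Local approximation: %-change in units = -elasticity * %-change in price."""
    p2 = p1 * (1 + pct_price_change)
    u2 = u1 * (1 - elasticity * pct_price_change)   # law of demand: units fall
    return p2, u2, p1 * u1, p2 * u2

# Inelastic demand (0.075): a 10% price increase RAISES total revenue
p2, u2, tr1, tr2 = simulate(0.075, 10.00, 1000, 0.10)
print(u2, tr2)   # about 992.5 units and $10,917.50 (the table rounds to 993 and $10,918)

# Elastic demand (1.250): the same 10% price increase LOWERS total revenue
p2, u2, tr1, tr2 = simulate(1.250, 10.00, 1000, 0.10)
print(u2, tr2)   # about 875 units and $9,625
```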

A quick note on a mathematically correct but practically questionable approach: modeling elasticity in logs.  It is true that if the natural log is taken of both demand and price, no calculation at the means is needed: the beta coefficient IS the elasticity.  However, and this is important, running a model in natural logs also imposes a very wrong assumption: constant elasticity.  It implies a small price change has the same proportional impact as a large price change, and no marketer believes that.  Thus, modeling in natural logs is not recommended.  No other analytic technique gives these insights except elasticity modeling.



A couple of obvious points: I would clearly recommend using real data on real customers responding to real price changes.  This is the operating economic environment.  That is, for an existing product, use the database of customers’ behavior in purchasing products.  The strategic insights this generates will help save margin and increase total revenue.  For a new product a general survey is the worst choice; incorporate either conjoint or a van Westendorp survey.

Price sensitivity is a key concept in economics and marketing.  Elasticity modeling is hardly ever done, but it should be investigated more often.  The strategic insights gathered from elasticity modeling are worth that investigation.




Life-Time Value (LTV) is typically done as just a calculation, using past (historical) data.  That is, it’s only descriptive.

While there are many versions of LTV (depending on data, industry, interest, etc.) the following is conceptually applied to all.  LTV, via descriptive analysis:

1) Uses historical data to sum up each customer’s total revenue.

2) Subtracts some costs from this sum: typically cost to serve, cost to market, maybe cost of goods sold, etc.

3) Converts this net revenue into an annual average amount depicted as a cash flow.

4) Assumes these cash flows continue into the future, diminishing over time (depending on durability, sales cycle, etc.), often decreasing arbitrarily by say 10% each year until they are effectively zero.

5) Sums up these (future, diminished) cash flows and discounts them (usually by the Weighted Average Cost of Capital) to get their net present value.

6) Calls this NPV the LTV.  This calculation is applied to each customer.
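The steps above can be sketched as follows; the 9% WACC, 10% decay and 10-year horizon are illustrative assumptions, not fixed parts of the method:

```python
def descriptive_ltv(annual_net_revenue, wacc=0.09, decay=0.10, years=10):
    """Steps 3-6: decay the annual cash flow each year and discount it to an NPV."""
    ltv, cash_flow = 0.0, annual_net_revenue
    for year in range(1, years + 1):
        ltv += cash_flow / (1 + wacc) ** year   # step 5: discount at WACC
        cash_flow *= (1 - decay)                # step 4: arbitrary 10% annual decay
    return ltv

print(round(descriptive_ltv(1000)))   # ≈ 4488 for a $1,000 / year customer
```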

Thus each customer has a value associated with it.  The typical use is for marketers to find the “high valued” customers (based on past purchases).  These high valued customers get most of the communications, promotions / discounts, marketing efforts, etc.  Descriptive analysis is merely about targeting those already engaged (much like RFM).

This seems to be a good starting point but, as is usual with descriptive analysis, contributes nothing about WHY.  Why is one customer more valuable, and will they continue to be?  Is it possible to extract additional value, but at what cost?  Is it possible to garner more revenue from a lower valued customer because they are more loyal or cost less to serve?  What part of the marketing mix is each customer most sensitive to?  LTV (as described above) gives no implications for strategy.  The only strategy is to offer and promote to the high valued customers.



How would LTV change using predictive analysis instead of descriptive analysis?  First note that while LTV is a future-oriented metric, descriptive analysis uses historical (past) data and the entire metric is built on that, with assumptions about the future applied uniformly to every customer.  Prediction specifically thrusts LTV into the future (where it belongs) by using independent variables to predict the next time until purchase.  Since the customer behaviors driving LTV are the timing, amount and number of purchases, a statistical technique is needed that predicts time until an event.  (Ordinary regression predicting the LTV amount ignores the timing and number of purchases.)

Survival analysis is a technique designed specifically to study time until event problems.  It has timing built into it and thus a future view is already embedded in the algorithm.  This removes much of the arbitrariness of typical (descriptive) LTV calculations.

So, what about using survival analysis to see which independent variables, say, bring in a purchase sooner?  Decreasing the time until purchase tends to increase LTV.  While survival analysis can predict the next time until purchase, its strategic value is in using the independent variables to CHANGE the timing of purchases.  That is, descriptive analysis shows what happened; predictive analysis gives a glimpse of what might CHANGE the future.

Strategy using LTV dictates understanding the causes of customer value: why a customer purchases, what increases / decreases the time until purchase, probability of purchasing at future times, etc.  Then when these insights are learned, marketing levers (shown as independent variables) are exploited to extract additional value from each customer.  This means knowing that one customer is say sensitive to price and that a discount will tend to decrease their time until purchase.  That is, they will purchase sooner (maybe purchase larger total amounts and maybe purchase more often) with a discount.  Another customer prefers say product X and product Y bundled together to increase the probability of purchase and this bundling decreases their time until purchase.  This insight allows different strategies for different customer needs and sensitivities, etc.  Survival analysis applied to each customer yields insights to understand and incent changes in behavior.

This means it is no longer necessary to just assume past behavior will continue into the future (as descriptive analysis does) with no idea why.  It’s also possible for descriptive and predictive analysis to give contradictory answers, which is why “crawling” might be detrimental to “walking”.

If a firm can get a customer to purchase sooner, there is an increased chance of adding purchases–depending on the product.  But even if the number of purchases is not increased, the firm getting revenue sooner will add to their financial value (time is money).

Also, a business case can be created by showing the trade-off in giving up, say, margin but obtaining revenue faster.  This means strategy can revolve around balancing costs against customer value.

The idea is to model the next time until purchase, the baseline, and see how to improve it.  How is this carried out?  A behaviorally-based method is to segment the customers (based on behavior), apply a survival model to each segment, and score each individual customer.  By behavior is typically meant purchasing metrics (amount, timing, share of products, etc.) and marcom responses (opens and clicks, direct mail coupons, etc.).



Let’s use an example.  Table 1 shows two customers from two different behavioral segments.  Customer XXX purchases every 88 days (4.148 times per year), with annual revenue of $43,958 and costs of $7,296, for net revenue of $36,662.  Say the second year is exactly the same.  Year 1 discounted at 9% gives an NPV of $33,635, and year 2 discounted at 9% for two years gives $30,857, for a total LTV of $64,492.  Customer YYY’s similar calculations give an LTV of $87,898.

Customer   Days between   Purchases   Annual    Annual    Net rev   Net rev   NPV yr 1   NPV yr 2   LTV
           purchases      per year    revenue   costs     yr 1      yr 2      (at 9%)    (at 9%)
XXX        88             4.148       $43,958   $7,296    $36,662   $36,662   $33,635    $30,857    $64,492
YYY        58             6.293       $62,289   $12,322   $49,967   $49,967   $45,842    $42,056    $87,898
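Table 1’s arithmetic is just two equal annual cash flows discounted at 9%; a quick check:

```python
def two_year_ltv(annual_net_revenue, rate=0.09):
    """Two equal annual cash flows discounted at the given rate."""
    return annual_net_revenue / (1 + rate) + annual_net_revenue / (1 + rate) ** 2

print(two_year_ltv(36662))   # ≈ 64,492  (customer XXX)
print(two_year_ltv(49967))   # ≈ 87,898  (customer YYY)
```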


The above (using descriptive analysis) would have marketers targeting customer YYY, worth more than $23,000 over customer XXX.  But do we know anything about WHY customer XXX is valued so much lower?  Is there anything that can be done to make them higher valued?

Applying a survival model to each segment outputs independent variables and shows their effect on the dependent variable.  In this case the dependent variable is (average) time until purchase.  Say the independent variables (which defined the behavioral segments) are things like price discounts, product bundling, seasonal messages, adding additional direct mail catalogs, offering online exclusives, etc.  The segmentation should separate customers based on behavior and the survival models should show how different levels of independent variables drive different strategies.

Table 2 below shows results of survival modeling on the two customers, who come from two different segments.  The independent variables are a 10% price discount, product bundling, etc.  TTE is time until event, and the table shows what happens to time until purchase when one of the independent variables is changed.  For example, giving customer XXX a price discount of 10% on average decreases their time until purchase by 14 days.  Giving YYY a 10% discount decreases their time until purchase by only 2 days.  This means XXX is far more sensitive to price than YYY–which would not be known from descriptive analysis alone.  Likewise, giving XXX more direct mail catalogs pushes out their TTE but pulls in YYY’s by 2 days.  Note also that the marketing levers barely affect YYY.  We are already getting nearly all from YYY that we can; no marketing effort does much to impact their TTE.  With XXX, however, there are several things that can be done to bring in their purchases.  Again, none of this would be known without survival modeling on each behavioral segment.


Table 2: change in TTE (days)

                      XXX   YYY
price discount 10%    -14    -2
product bundling       -4    12
seasonal message        6    21
5 more catalogs        11    -2
online exclusive      -11     3


Table 3 below shows new LTV calculations for XXX after applying the survival modeling results.  We decreased TTE by 24 days, using some combination of discounts, bundling, online exclusives, etc.  Note that the LTV for XXX (after using predictive analysis) is now greater than YYY’s.


XXX 64 5.703 $60,442 $10,032 $50,410 $50,410 $33,635 $30,857 $88,677
YYY 58 6.293 $62,289 $12,322 $49,967 $49,967   $45,842 $42,056 $87,898


What survival analysis offers, in addition to marketing strategy levers, is a financially optimal scenario, particularly in terms of cost to market.  That is, customer XXX responds to a discount.  It’s possible to calculate and test the (just) needed threshold of discount that brings a purchase in by so many days at the estimated level of revenue.  This ends up being a cost / benefit analysis that makes marketers think about strategy.  This is the advantage of predictive analysis–giving marketers strategic options.




What is a Market Basket?

In economics, a market basket is a fixed collection of items that consumers buy.  This is used for metrics like CPI (inflation), etc.  In marketing, a market basket is any two or more items bought together.

Market basket analysis is used, especially in retail / CPG, to bundle and offer promotions and to gain insight into shopping / purchasing patterns.  “Market basket analysis” does not, by itself, describe HOW the analysis is done.  That is, there is no associated technique implied by those words.

How is it usually done?

There are three general uses of data: descriptive, predictive and prescriptive.  Descriptive is about the past; predictive uses statistical analysis to calculate the change in an output variable (e.g., sales) given a change in an input variable (say, price); and prescriptive is a system that tries to optimize some metric (typically profit, etc.).  Descriptive data (means, frequencies, KPIs, etc.) is a necessary but not usually a sufficient step.  Always get to at least the predictive step as soon as possible.  Note that predictive here does not necessarily mean forecasted into the future.  Structural analysis uses models to simulate the market and estimate (predict) what causes what to happen.  That is, using regression: given a change in price, what is the estimated (predicted) change in sales?

Market basket analysis often uses descriptive techniques.  Sometimes it is just a “report” of what percent of items are purchased together.  Affinity analysis (a step above) is mathematical, not statistical: it simply calculates the percent of the time combinations of products are purchased together.  Obviously there is no probability involved.  It is concerned with the rate at which products are purchased together, not with a distribution around that association.  It is very common and very useful but NOT predictive–therefore NOT so actionable.
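To make that concrete, affinity analysis is nothing more than counting.  A minimal Python sketch, using made-up baskets of three products x, y and z:

```python
from itertools import combinations
from collections import Counter

# Affinity analysis as described above: the percent of baskets in which
# each pair of products appears together.  Pure arithmetic, no probability.
baskets = [  # hypothetical transactions
    {"x", "y"}, {"x", "y", "z"}, {"y", "z"}, {"x"}, {"x", "y"},
]

pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

affinity = {pair: n / len(baskets) for pair, n in pair_counts.items()}
print(affinity[("x", "y")])  # 3 of 5 baskets contain both x and y -> 0.6
```

That 0.6 is a rate, not a probability with a distribution around it–which is exactly the limitation described above.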

Logistic Regression

Let’s talk about logistic regression.  This is an ancient and well-known statistical technique, probably the analytic pillar upon which database marketing has been built.  It is similar to ordinary regression in that there is a dependent variable that depends on one or more independent variables.  There is a coefficient (although interpretation is not the same) and there is a (type of) t-test around each independent variable for significance.

The differences are that the dependent variable is binary in logistic regression and continuous in ordinary regression, and that interpreting the coefficients requires exponentiation.  Because the dependent variable is binary, fitting it with ordinary least squares would produce heteroskedasticity.  There is no (real) R2, and “fit” is about classification.

How to Estimate / Predict the Market Basket

The use of logistic regression in terms of market basket becomes obvious when it is understood that the predicted dependent variable is a probability.  The formula to estimate probability from logistic regression is:

P(i) = 1 / (1 + e^(-Z))

where Z = α + βXi.  This means the independent variables can be products purchased in a market basket, used to predict the likelihood of purchasing another product as the dependent variable.  Specifically, take each (major) category of product (focus driven by strategy) and run a separate model for each, putting in all significant other products as independent variables.  For example, say we have only three products: x, y and z.  The idea is to design three models and test the significance of each.  That is, using logistic regression:

x = f(y,z)

y = f(x,z)

z = f(x,y).
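To make the formula concrete, here is the x = f(y, z) model scored in Python.  The coefficients are hypothetical; in practice they come from fitting the logistic regression on transaction data:

```python
import math

# Score the model x = f(y, z): the probability a customer buys product
# x, given 0/1 flags for y and z in the basket.  Coefficients are made
# up for illustration; real values come from fitting the model.
alpha, beta_y, beta_z = -1.5, 0.8, -0.4

def p_buy_x(bought_y, bought_z):
    """P(i) = 1 / (1 + e^(-Z)), with Z = alpha + beta_y*y + beta_z*z."""
    z = alpha + beta_y * bought_y + beta_z * bought_z
    return 1 / (1 + math.exp(-z))

print(round(p_buy_x(0, 0), 3))  # baseline: 0.182
print(round(p_buy_x(1, 0), 3))  # buying y raises P(x): 0.332
print(round(p_buy_x(0, 1), 3))  # buying z lowers P(x): 0.130
```

The signs of the (hypothetical) betas carry the insight: a positive coefficient on y means buying y increases the probability of buying x, a negative coefficient on z means buying z decreases it.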

Of course other variables can go into the model as appropriate, but the interest is in whether or not the independent (product) variables are significant in predicting the probability of purchasing the dependent product variable.  After significance is achieved, the insights generated are around the sign of each independent variable, i.e., does the independent product increase or decrease the probability of purchasing the dependent product?

An Example

As a simple example, say we are analyzing a retail store, with categories of products like consumer electronics, women’s accessories, newborn and infant items, etc.  Thus, using logistic regression, a series of models should be run–one per category, with each category modeled as a function of all the others, just as with x, y and z above.


This means the independent variables are binary, coded as a “1” if the customer bought that category and a “0” if not.  The table below details the output for all of the models.  Note that other independent variables can be included in the model, if significant.  These would often be seasonality, consumer confidence, promotions sent, etc.
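Building those 0/1 flags from raw purchase data is straightforward.  A small sketch (the category names are from the example; the customer’s purchases are hypothetical):

```python
# Turn a customer's purchased categories into the 0/1 flags the
# logistic models take as independent variables.
categories = ["consumer electronics", "women's accessories",
              "newborn / infant", "jewelry / watches",
              "furniture", "home décor", "entertainment"]

def to_flags(purchased):
    """1 if the customer bought that category, 0 if not."""
    return {c: int(c in purchased) for c in categories}

flags = to_flags({"furniture", "home décor"})  # hypothetical customer
print(flags["furniture"], flags["newborn / infant"])  # 1 0
```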

To interpret, look at, say, the home décor model.  If a customer bought consumer electronics, that increases the probability of buying home décor by 29%.  If a customer bought newborn / infant items, that decreases the probability of buying home décor by 37%.  If a customer bought furniture, that increases the probability of buying home décor by 121%.  This has implications


(rows are the dependent-category models; columns are the independent categories, in the same order as the rows)

                        CE     WA     NB     JW    FURN    HD    ENT
CONSUMER ELECTRONICS   XXX   Insig  Insig   -23%    34%   26%    98%
WOMEN’S ACCESSORIES   Insig   XXX    39%    68%    22%   21%   Insig
NEWBORN, INFANT, ETC. Insig   43%    XXX   -11%   -21%  -31%    29%
JEWELRY, WATCHES      -29%    71%   -22%    XXX    12%   24%   -11%
FURNITURE              31%    18%   -17%     9%    XXX  115%    37%
HOME DÉCOR             29%    24%   -37%    21%   121%   XXX    31%
ENTERTAINMENT          85%   Insig   31%    -9%    41%   29%    XXX


especially for bundling and messaging.  That is, offering say home décor and furniture together makes great sense, but offering home décor and newborn / infant items does not make sense.


The above detailed a simple (and more powerful) way to do market basket analysis.  If given a choice, always go beyond mere descriptive techniques and apply predictive techniques.

See my MARKETING ANALYTICS for additional details.






How Do You Know if You’re “Analytic”?

Okay, since some of this blog is aimed at students of analytics, how do you know if YOU are analytic?  Sure, sure, you’ve been pushed by your parents into taking a lot of math and science, etc., and are now in school studying analytics–but deep down, sometimes at night, you wonder if it is really for you.  It’s not about how much money you might be able to make; you sometimes wonder if you should change your major to something fun and interesting, maybe music or art or politics or history.

Or, you’re already working IN analytics and are also questioning whether you’ve made the right decision.  Do you fit in?  Can you be successful?  You’re early in your career and it’s not too late.  How does the prospect of running SAS on dirty data and searching for insights with no time, for the next 30 years, sound?  If your heart skipped a beat, you should worry.


How do you know if you’re an analytic person?  You should love the simple joy that comes when a variable that should be significant is proven so in the data.  A satisfied look of wonder pervades your face when the world makes sense, replacing the constant, cynical, caveat-laden weariness we usually carry around.  That’s what got us into analytics in the first place, right?  People are confusing, full of irrational gray areas, but data is data; truth is truth.  When well-understood relationships make sense, it’s comforting.  When insights are found, it’s exciting.  Murder solved!  Puzzle completed!  And because it’s consumer behavior we are trying to predict, this helps us believe that maybe people are NOT so confusing.

So, look over your life.  Do you find enjoyment in black-and-white answers?  Do you naturally distrust any data / claims that you have not dug into yourself?  Do you like learning how things work, do you naturally and quickly see relationships (especially causal relationships), and are you constantly curious?  If the answers to these are mostly “Yes,” then you might be analytic.


When I was in elementary school I was the class clown.  (Can you believe it?)  I have a strong introvert streak but also have always found it necessary to make the joke, point out the funny thing, and teachers usually hated me, the class clown.  I didn’t eat paste or do funny dances, it was always verbal.

Anyway, in third grade we were learning long division.  The previous couple of weeks the teacher had been warning us that LONG DIVISION was a very big deal, difficult, complicated, and would require all our attention, and she would have to mentor us along.  (She would be in no mood for class clowning when we started.)

So, the first day arrived and she motivated us to appreciate the central issue of long division, remainders, by asking, “Now, how can you divide 5 evenly?  You can’t.  Thus–”

I immediately shouted, “Yes, you can: two-and-a-half and two-and-a-half.  See, evenly.”

She sent me to the office.  The one time I was NOT being the class clown–I was being analytic–got me in trouble.  In truth, it served me right.  The statistics in that class proved that 9 out of 10 times whatever I said was worthy of sending me to the office.


So, another issue.  The real trouble is that being analytic in corporate America is not enough to be successful in analytics.  This is because most analytic folks are a little quiet, maybe introverted.  We can pretend it’s because of left-brain domination, where we get our sense of logic and rationality.  But to be successful in analytics you will have to push yourself to find insights and present them to others.  You will often have to convince other people (sometimes those many levels above you, the ones holding the purse strings that carry a project forward).  So the key personality trait, as a test for analytic talent, is passionate curiosity.  That is, you are so excited by what you have found that you can easily overcome your natural shyness.  The love of discovery so drives you that you can’t keep your mouth shut; you tell the world you have found the truth, and it’s shouted from the rooftops.

Therefore, I would say you are analytic if you love finding relationships in data.  But you can only be successful in analytics if you are so thrilled by what you have found that you must socialize it to everyone you can.  Right now.  Make sense?