Marketers Find the Least-Wrong Answers Via Modeling

Why do marketers still build models when we have ample amounts of data everywhere? Because we will never have every piece of data about everything. We just don’t know what we don’t know.

Okay, then — we don’t get to know about everything, but what are the data that we possess telling us?

We build models to answer that question. Even scientists who wonder about the mysteries of the universe and multiverses use models for their research.

I have been emphasizing the importance of modeling in marketing through this column for a long time. If I may briefly summarize a few benefits here:

  • Models Fill in the Gaps, covering those annoying “unknowns.” We may not know for sure if someone has an affinity for luxury gift items, but we can say that “Yes, with data that we have, she is very likely to have such an affinity.” With a little help from the models, the “unknowns” turn into “potentials.”
  • Models Summarize Complex Data into simple-to-use “scores.” No one has time to dissect hundreds of data variables every time we make a decision. Model scores provide simple answers, such as “Someone likely to be a bargain-seeker.” Such a model may include 10 to 20 variables, but the users don’t need to worry about those details at the time of decision-making. Just find suitable offers for the targets, based on affinities and personas (which are just forms of models).
  • Models are Far More Accurate Than Human Intuition. Even smart people can’t imagine interactions among just two or three variables in their heads. Complex multivariate interaction detection is a job for a computer.
  • Models Provide Consistent Results. Human decision-makers may get lucky once in a while, but it is hard to keep that up against machines. Mathematics does not fluctuate much in terms of performance, provided it is fed consistent and accurate data.
  • Models Reveal Hidden Patterns in data. When faced with hundreds of data variables, humans often resort to what they are accustomed to (often fewer than four or five factors). Machines indiscriminately find new patterns, relentlessly looking for the most suitable answers.
  • Models Help Expand the Targeting Universe. If you want a broader target, just go after slightly lower-scoring targets. You can even measure the risk factors while in such an expansion mode. That is not possible with man-made rules (see the sketch after this list).
  • When Done Right, Models Save Time and Effort. Marketing automation gets simpler, too, as even machines can tell high and low scores apart easily. But the keywords here are “when done right.”
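
To make score-based targeting concrete, here is a minimal sketch in Python. It assumes a hypothetical propensity score has already been attached to each customer record; the field names, cut-offs, and values are illustrative only, not a prescription.

```python
# Minimal sketch: using a model score to pick a target universe, and lowering
# the cut-off to expand it. All field names and values are hypothetical.
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [101, 102, 103, 104, 105, 106],
    "bargain_seeker_score": [0.91, 0.72, 0.66, 0.51, 0.38, 0.12],
})

core_target = customers[customers["bargain_seeker_score"] >= 0.70]      # tight universe
expanded_target = customers[customers["bargain_seeker_score"] >= 0.50]  # broader universe

print(len(core_target), "core targets;", len(expanded_target), "after expansion")
```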

There are many benefits of modeling, even in the age of abundant data. The goal of any data application is to help in the decision-making process, not to aid in hoarding data and bragging about it. Do you want to get to accurate, consistent, and simple answers — fast? Don’t fight modeling; embrace it. Try it. And if it doesn’t work, try it another way, as even the worst model often beats man-made rules, easily.

But this time, I’m not writing this article just to promote the benefits of modeling again. Assuming that you embrace the idea already, let’s now talk about its limitations. With any technique, users must be fully aware of its downsides.

It Mimics Existing Patterns

By definition, models identify and mimic the patterns in the existing data. That means, if the environment changes drastically, all models built in the old world will be rendered useless.

For example, if there are significant changes in the supply chain in a retail business, product affinity models built for old lines of products won’t work anymore (even if products may look similar). More globally, if there were major disruptions, such as a market crash or proliferation of new technologies, none of the old assumptions would continue to be applicable.

The famous economics phrase Ceteris paribus — all other things being equal — governs conventional modeling. If you want your models to be far more adaptive, then consider total automation of modeling through machine learning. But I still suggest trying a few test models in an old-fashioned way, before getting into a full automation mode.

If the Target Is Off, Everything Is Off

If the target mark is hung on the wrong spot, no sharpshooter will be able to hit the real target. A missile without a proper guidance system is worse than not having one at all. Setting the right target for a model is the most critical and difficult part of the whole process, requiring not only technical knowledge, but also a deep understanding of the business at stake, the nature of the available data, and the deployment mechanism at the application stage.

This is why modeling is often called “half science, half art.” A model is only as accurate as the target definition of the model. (For further details on this complex subject, refer to “Art of Targeting”).

The Model Is Only as Good as the Input Data

No model can be saved if there are serious errors or inconsistencies in the data. It is not just about blatantly wrong data. If the nature of the data is not consistent between the model development sample and the practical pool of data (where the model will be applied and used), the model in question will be useless.

This is why the “Analytics Sandbox” is important. Such a sandbox environment is essential — not just for simplification of model development, but also for consistent application of models. Most mishaps happen before or after the model development stage, mostly due to data inconsistencies in terms of shapes and forms, and less due to sheer data errors (not that erroneous data is acceptable).

The consistency factor matters a lot: If some data variables are “consistently” off, they may still possess some predictive power. I would even go as far as stating that consistency matters more than sheer accuracy.

Accuracy Is a Relative Term

Users often forget this important fact, but model scores aren’t pinpoint accurate all of the time. Some models are sharper than others, too.

A model score is just the best estimate with the existing data. In other words, we should take model scores as the least-wrong answers in a given situation.

So, when I say a model is accurate, I mean that it is more accurate than human intuition based on a few basic data points.

Therefore, the user must always consider the risk of being wrong. Now, being wrong about “Who is more likely to respond to this 15% discount offer?” is a lot less grave than being wrong about “Who is more likely to be diabetic?”

In fact, if I personally faced such a situation, I would not even recommend building the latter model, as the cost of being wrong is simply too high. (People are very sensitive about their medical information.) Some things just should not be estimated.

Even with innocuous models, such as product affinities and user propensities, users should never treat the scores as facts. Don’t act like you “know” the target simply because some model scores are available to you. Always approach your target with a gentle nudge, as in: “I don’t know for sure if you would be interested in our new line of skin care products, but would you want to hear more about it?” Such a gentle approach always sounds friendlier than acting like you “know” something about them for sure. The latter just comes across as rude on the receiving end, and recipients of blunt messages may even find you downright creepy.

Users sometimes make bold moves with an illusion that data and analytics always provide the right answers. Maybe the worst fallacy in the modern age is the belief that anything a computer spits out is always correct.

Users Abuse Models

Last month, I shared seven ways users abuse models and ruin the results (refer to “Don’t Ruin Good Models by Abusing Them”). As an evangelist of modeling techniques, I always try to prevent abuse cases, but they still happen in the application stages. All good intentions of models go out the window if they are used for the wrong reasons or in the wrong settings.

I am not at all saying that anyone should back out of using models in their marketing practices because of the shortfalls that I listed here. Nonetheless, to be consistently successful, users must be aware of the limitations of models as well, especially if you are about to go to full marketing automation. With improper application of models, you may end up automating bad or wrong practices really fast. For the sake of customers on the receiving end — not just for the safety of your position in the marketing industry — please be more careful with this sharp-edged tool called modeling.

Sex and the Schoolboy: Predictive Modeling – Who’s Doing It? Who’s Doing It Right?

Forgive the borrowed interest, but predictive modeling is to marketers as sex is to schoolboys.

They’re all talking about it, but few are doing it. And among those who are, fewer are doing it right.

In customer relationship marketing (CRM), predictive modeling uses data to predict the likelihood of a customer taking a specific action. It’s a three-step process:

1. Examine the characteristics of the customers who took a desired action

2. Compare them against the characteristics of customers who didn’t take that action

3. Determine which characteristics are most predictive of the customer taking the action and the value or degree to which each variable is predictive
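
Here is a minimal sketch of those three steps in Python, using logistic regression as one common technique for step 3. The file name and field names are hypothetical placeholders, not a reference to any particular data set.

```python
# Steps 1 & 2: assemble one row per customer, with responders (1) and
# non-responders (0) and their characteristics side by side.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("campaign_history.csv")        # hypothetical file
X = df[["age", "income", "past_purchases"]]     # candidate characteristics
y = df["responded"]                             # 1 = took the action, 0 = did not

# Step 3: fit a model and inspect which characteristics carry predictive weight.
model = LogisticRegression(max_iter=1000).fit(X, y)
for name, coef in zip(X.columns, model.coef_[0]):
    print(f"{name}: {coef:+.3f}")               # sign and size hint at direction and strength
```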

Predictive modeling is useful in allocating CRM resources efficiently. If a model predicts that certain customers are less likely to respond to a specific offer, then fewer resources can be allocated to those customers, allowing more resources to be allocated to those who are more likely to respond.

Data Inputs
A predictive model will only be as good as the input data that’s used in the modeling process. You need the data that define the dependent variable; that is, the outcome the model is trying to predict (such as response to a particular offer). You’ll also need the data that define the independent variables, or the characteristics that will be predictive of the desired outcome (such as age, income, purchase history, etc.). Attitudinal and behavioral data may also be predictive, such as an expressed interest in weight loss, fitness, healthy eating, etc.

The more variables that are fed into the model at the beginning, the more likely the modeling process will identify relevant predictors. Modeling is an iterative process, and those variables that are not at all predictive will fall out in the early iterations, leaving those that are most predictive for more precise analysis in later iterations. The danger in not having enough independent variables to model is that the resultant model will only explain a portion of the desired outcome.

For example, a predictive model created to determine the factors affecting physician prescribing of a particular brand was inconclusive, because there weren’t enough independent variables to explain the outcome fully. In a standard regression analysis, the number of RXs written in a specific timeframe was set as the dependent variable. There were only three independent variables available: sales calls, physician samples and direct mail promotions to physicians. And while each of the three variables turned out to have a positive effect on prescriptions written, the “Multiple R-squared” value of the regression equation was a modest 0.44, meaning that these variables explained only 44 percent of the variance in RXs. The other 56 percent of the variance came from factors that were not included in the model inputs.
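
As a rough illustration of the analysis described above, here is a sketch using ordinary least squares in Python. The column names are hypothetical stand-ins for the three promotion variables and the prescription counts; it is not the original analysis.

```python
# Sketch of a regression with RXs written as the dependent variable and three
# promotion variables as the independent variables. Field names are hypothetical.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("physician_promotion.csv")     # hypothetical file
X = sm.add_constant(df[["sales_calls", "samples", "direct_mail"]])
y = df["rx_count"]

result = sm.OLS(y, X).fit()
print(result.params)     # direction and size of each variable's effect
print(result.rsquared)   # share of variance explained (0.44 in the example above)
```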

Sample Size
Larger samples will produce more robust models than smaller ones. Some modelers recommend a minimum data set of 10,000 records, 500 of those with the desired outcome. Others report acceptable results with as few as 100 records with the desired outcome. But in general, size matters.

Regardless, it is important to hold out a validation sample from the modeling process. That allows the model to be applied to the hold-out sample to validate its ability to predict the desired outcome.
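
As a minimal sketch of that hold-out practice, assuming the same hypothetical campaign data as above, one can simply split the records before modeling and score the untouched portion afterward:

```python
# Hold out a validation sample, build on the rest, then check how well the
# model ranks responders in the untouched portion. Data are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("campaign_history.csv")                       # hypothetical file
X, y = df[["age", "income", "past_purchases"]], df["responded"]

X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)
print("Validation AUC:", roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]))
```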

Important First Steps

1. Define Your Outcome. What do you want the model to do for your business? Predict likelihood to opt-in? Predict likelihood to respond to a particular offer? Your objective will drive the data set that you need to define the dependent variable. For example, if you’re looking to predict likelihood to respond to a particular offer, you’ll need to have prospects who responded and prospects who didn’t in order to discriminate between them.

2. Gather the Data to Model. This requires tapping into several data sources, including your CRM database, as well as external sources where you can get data appended (see below).

3. Set the Timeframe. Determine the time period for the data you will analyze. For example, if you’re looking to model likelihood to respond, the start and end points for the data should be far enough in the past that you have a sufficient sample of responders and non-responders.

4. Examine Variables Individually. Some variables will not be correlated with the outcome, and these can be eliminated prior to building the model.
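
A minimal sketch of that univariate screening step, assuming numeric candidate variables and an illustrative cut-off, might look like this:

```python
# Check each candidate variable against the outcome one at a time and drop the
# ones with essentially no relationship. File, columns, and threshold are hypothetical.
import pandas as pd

df = pd.read_csv("campaign_history.csv")        # hypothetical file
outcome = "responded"
candidates = [c for c in df.columns if c != outcome]   # assumes numeric columns

correlations = df[candidates].corrwith(df[outcome]).abs().sort_values(ascending=False)
keep = correlations[correlations >= 0.05].index.tolist()   # illustrative cut-off
print(correlations)
print("Variables worth carrying into the model:", keep)
```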

Data Sources
Independent variable data may include:

  • In-house database fields
  • Data overlays (demographics, HH income, lifestyle interests, presence of children, marital status, etc.) from a data provider such as Experian, Epsilon or Acxiom.

Don’t Try This at Home
While you can do regression analysis in Microsoft Excel, if you’re going to invest a lot of promotion budget in the outcome, you should definitely leave the number crunching to the professionals. Expert modelers know how to analyze modeling results and make adjustments where necessary.

Cheat Sheet: Is Your Database Marketing Ready?

Many data-related projects end up as big disappointments. And, in many cases, it is because they did not have any design philosophy behind them. Because many folks are more familiar with buildings and cars than geeky databases, allow me to use them as examples here.

Imagine someone started constructing a building without a clear purpose. What is it going to be? An office building or a residence? If residential, for how many people? For a family, or for 200 college kids? Are they going to just eat and sleep in there, or are they going to engage in other activities in it? What is the budget for development and ongoing maintenance?

If someone starts building a house without answering these basic questions, well, it is safe to say that the guy who commissioned such a project is not in the right state of mind. Then again, he may be a filthy rich rock star with some crazy ideas. But let us just say that is an exceptional case. Nonetheless, surprisingly, a great many database projects start out exactly this way.

Just like a house is not just a sum of bricks, mortar and metal, a database is not just a sum of data; there has to be a design philosophy behind it. And yet, many companies think that putting all available data in one place is good enough. Call it a movie without a director or a building without an architect; you know and I know that such a project cannot end well.

Even when a professional database designer gets involved, too often the project goes out of control—as the business requirement document ends up being a summary of everyone’s wish lists, without any prioritization or filtering. It is a case of a movie without a director. The goal becomes something like “a database that stores all conceivable marketing, accounting and payment activities, handling both prospecting and customer relationship management through all conceivable channels, including face-to-face sales and lead management for big accounts. And it should include both domestic and international activities, and the update has to be done in real time.”

Really. Someone in that organization must have attended a database marketing conference recently to get all of that listed. It might be simpler and cheaper to build a 2-ton truck that flies. But before we commission something like this from the get-go, shall we discuss why the truck has to fly, too? For one, if you want real-time updates, do you have a business case for it? (As in, someone in the field must make real-time decisions with real-time data.) Or do you just fancy a large object moving really fast?

Companies that primarily sell database tools often do not help the matter, either. Some promise that the tool sets will categorize all kinds of input data, based on some auto-generated meta-tables. (Really?) The tool will clean the data automatically. (Is it a self-cleaning oven?) The tool will establish key links (by what?), build models on its own (with what target data?), deploy campaigns (every Monday?), and conduct result analysis (with responses from all channels?).

All these capabilities sound really wonderful, but does that system set long- and short-term marketing goals for you, too? Does it understand the subtle nuances in human behaviors and intentions?

Sorry for being a skeptic here. But in such cases, I think someone watched “Star Trek” too much. I have never seen a company that does not regret spending seven figures on a tool set that was supposed to do everything. Do you wonder why? It is not because such activities cannot be automated, but because:

  1. Machines do not think for us (not quite yet); and
  2. Such a system is often very expensive, as it needs to cover all contingencies (the opposite of “goal-oriented” cheaper options).

So it becomes nearly impossible to justify the cost with incremental improvements in marketing efficiency. Even if the response rates double, all related marketing costs go down by a quarter, and revenue jumps up by 200 percent, there are not many companies that can easily justify that kind of spending.

Worse yet, imagine that you just paid 10 times more for some factory-made suit than you would have paid for a custom-made Italian suit. Since when is an automated, cookie-cutter answer more desirable than a custom-tailored one? Ever since computing and storage costs started to go down significantly, and even more so in this age of Big Data, with its “everything, all the time” mentality.

But let me ask you again: Do you really have a marketing database?

Let us just say that I am a car designer. A potential customer who has been doing a lot of research on the technology front presents me with a spec for a vehicle that is as big as a tractor-trailer and as quick as a passenger car. I guess that someone really needs to move lots of stuff, really fast. Now, let us assume that it will cost about $8 million or more to build a car like that, and that estimate is without the rocket booster (ah, my heart breaks). If my business model is to take a percentage out of that budget, I would say, “Yeah sure, we can build a car like that for you. When can we start?”

But let us stop for a moment and ask why the client would “need” (not “want”) a car like that in the first place. After some user interviews and prioritization, we may collectively conclude that a fleet of full-size vans can satisfy 98 percent of the business needs, saving about $7 million. If that client absolutely and positively has to get to that extra 2 percent to satisfy every possible contingency in his business and spend that money, well, that is his prerogative, is it not? But I have to ask the business questions first before initiating that inevitable long and winding journey without a roadmap.

Knowing exactly what the database is supposed to be doing must be the starting point. Not “let’s just gather everything in one place and hope to God that some user will figure something out eventually.” Also, let’s not forget that constantly adding new goals in any phase of the project will inevitably complicate the matter and increase the cost.

Conversely, repurposing a database designed for some other goal will cause lots of trouble down the line. Yeah, sure. Is it not possible to move 100 people from A to B with a 2-seater sports car, if you are willing to make lots of quick trips and get some speeding tickets along the way? Yes, but that would not be my first recommendation. Instead, here are some real possibilities.

Databases support many different types of activities. So let us name a few:

  • Order fulfillment
  • Inventory management and accounting
  • Contact management for sales
  • Dashboard and report generation
  • Queries and selections
  • Campaign management
  • Response analysis
  • Trend analysis
  • Predictive modeling and scoring
  • Etc., etc.

The list goes on, and some of the databases may be doing fine jobs in many areas already. But can we safely call them “marketing” databases? Or are marketers simply tapping into the central data depository somehow, just making do with lots of blood, sweat and tears?

As an exercise, let me ask a few questions to see if your organization has a functioning marketing database for CRM purposes (a small sketch of the kind of one-step answers I have in mind follows the list):

  • What is the average order size per year for customers with tenure of more than one year? —You may have all the transaction data, but maybe not on an individual level in order to know the average.
  • What is the number of active and dormant customers based on the last transaction date? —You will be surprised to find out that many companies do not know exactly how many customers they really have. Beep! 1 million-“ish” is not a good answer.
  • What is the average number of days between activities for each channel for each customer? —With basic transaction data summarized “properly,” this is not a difficult question to answer. But it’s very difficult if there are divisional “channel-centric” databases scattered all over.
  • What is the average number of touches through all channels that you employ before your customer reaches the projected value potential? —This is a hard one. Without all the transaction and contact history by all channels in a “closed-loop” structure, one cannot even begin to formulate an answer for this one. And the “value potential” is a result of statistical modeling, is it not?
  • What are typical gateway products, and how are they correlated to other product purchases? —This may sound like a product question, but without knowing each customer’s purchase history lined up properly with fully standardized product categories, it may take a while to figure this one out.
  • Are basic RFM data—such as dollars, transactions, dates and intervals—routinely being used in predictive models? —The answer is a firm “no,” if the statisticians are spending the majority of their time fixing the data; and “not even close,” if you are still just using RFM data for rudimentary filtering.
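
To show what “marketing ready” looks like in practice, here is a minimal sketch, assuming a hypothetical transaction table with one row per order; the file name, column names, and the 12-month activity rule are illustrative only.

```python
# Answering two of the questions above (average order size for tenured
# customers; active vs. dormant counts) from a simple per-customer summary.
import pandas as pd

tx = pd.read_csv("transactions.csv", parse_dates=["order_date"])   # hypothetical file
as_of = tx["order_date"].max()                                      # illustrative "as of" date

per_customer = tx.groupby("customer_id").agg(
    first_order=("order_date", "min"),
    last_order=("order_date", "max"),
    orders=("order_date", "count"),
    total_spend=("order_amount", "sum"),
)

# Average order size for customers with more than one year of tenure
tenured = per_customer[per_customer["first_order"] <= as_of - pd.DateOffset(years=1)]
print("Avg order size (tenure > 1 yr):", (tenured["total_spend"] / tenured["orders"]).mean())

# Active vs. dormant, based on the last transaction date (12-month rule, illustrative)
active = (per_customer["last_order"] >= as_of - pd.DateOffset(years=1)).sum()
print("Active:", int(active), "Dormant:", len(per_customer) - int(active))
```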

Now, if your answer is “Well, with some data summarization and inner/outer joins here and there—though we don’t have all transaction records from last year, and if we can get all the campaign histories from all seven vendors who managed our marketing campaigns, except for emails—maybe?”, then I am sorry to inform you that you do not have a marketing database, even if you can eventually get to the answer after some programmer spends two weeks drawing a 7-page flow chart.

Often, I get extra comments like “But we have a relational database!” Or, “We stored every transaction for the past 10 years in Hadoop and we can retrieve any one of them in less than a second!” To these comments, I would say “Congratulations, your car has four wheels, right?”

To answer the important marketing questions, the database should be organized in a “buyer-centric” format. Going back to the database philosophy question, the fundamental design of the database changes based on its main purpose, much like the way a sports sedan and an SUV that share the same wheelbase and engine end up shaped differently.

Marketing is about people. And, at the center of the marketing database, there have to be people. Every data element in the base should be “describing” those people.

Unfortunately, most relational databases are transaction-, channel- or product-centric, describing events and transactions—but not the people. Unstructured databases that are tuned primarily for massive storage and rapid retrieval may just have pieces of data all over the place, necessitating serious rearrangement to answer some of the most basic business questions.
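
To illustrate the difference, here is a minimal sketch of rearranging transaction-centric rows into a buyer-centric table, one row per person with every column describing that person. The file and column names (including the channel field) are hypothetical.

```python
# Rearranging a transaction-centric table into a buyer-centric one: basic RFM
# variables plus channel usage, ready for queries, reports, and model scoring.
import pandas as pd

tx = pd.read_csv("transactions.csv", parse_dates=["order_date"])   # hypothetical file
as_of = tx["order_date"].max()

buyers = tx.groupby("customer_id").agg(
    recency_days=("order_date", lambda d: (as_of - d.max()).days),
    frequency=("order_date", "count"),
    monetary=("order_amount", "sum"),
    channels_used=("channel", "nunique"),
)
print(buyers.head())    # one row per buyer, describing that buyer
```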

So, the question still stands. Is your database marketing ready? Because if it is, it would have taken you no time to answer the questions listed above and say: “Yeah, I got this. Anything else?”

Now, imagine the difference between marketers who get to the answers with a few clicks vs. the ones who have no clue where to begin, even when sitting on mounds of data. The difference between the two is not the size of the investment, but the design philosophy.

I just hope that you did not buy a sports car when you needed a truck.