Why Modeling Beats Rule-Based Segmentation

I cringe when I hear that “rule-based” segments are sitting on top of so-called state-of-the-art campaign engines. This is the year 2018 A.D. It’s the age of abundant data, with plenty of tools and options to harness its true power. And marketers are still making up rules? It’s time for marketers to embrace modeling.

I wonder what most of the rules marketers use are made of. Recency? Certainly, but how recent is recent enough?

Frequency? Sure, why not? The more the merrier, right? But in what timeframe? Are you counting transactions, orders or items? Or just some “events”?

Monetary? Hmm, that’s tricky. Are we using an individual-level lifetime total amount, value of the last transaction, average spending per transaction, average spending amount per year, or what? Don’t tell me you don’t even have individual-level summary data. No customer is just a reflection of her last transaction.
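Each of those “monetary” definitions produces a different number for the same customer. The minimal Python sketch below rolls transactions up to one individual-level summary per customer. It assumes a hypothetical list of (customer_id, order_date, amount) records, and the field names and as-of date are illustrative rather than any standard schema.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction records: (customer_id, order_date, amount)
transactions = [
    ("C1", date(2018, 1, 5), 120.0),
    ("C1", date(2018, 3, 2), 80.0),
    ("C2", date(2017, 11, 20), 300.0),
    ("C1", date(2018, 3, 28), 40.0),
]

def rfm_summary(rows, as_of):
    """Roll transaction-level rows up to one summary record per customer."""
    by_customer = defaultdict(list)
    for customer, day, amount in rows:
        by_customer[customer].append((day, amount))
    summary = {}
    for customer, txns in by_customer.items():
        txns.sort()  # chronological order, so the last element is the latest transaction
        amounts = [amount for _, amount in txns]
        summary[customer] = {
            "recency_days": (as_of - txns[-1][0]).days,
            "frequency": len(txns),
            "lifetime_total": sum(amounts),
            "last_amount": amounts[-1],
            "avg_per_txn": sum(amounts) / len(amounts),
        }
    return summary

summary = rfm_summary(transactions, as_of=date(2018, 4, 1))
```

For customer C1 in this toy data, the last transaction is 40 while the lifetime total is 240. That gap is the difference between “her last transaction” and the whole customer.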

Actually, if a company is using some RFM (Recency, Frequency, Monetary Value) data for targeting, that is not so bad. At least it’s looking at what actually happened in terms of monetary transactions, along with basic demographic data, not just clicks and page views.

I have been talking about “employing all available data” for targeting and customer insights for some time now. So allow me to pick a different bone today. Let’s forget the data part, and talk about the methodology. When machines can build models super-fast, aversion to modeling only limits the users. After all, I am not asking any marketers to get a degree in statistics. I am just asking them to consider modeling techniques, as this data industry has moved forward from the days when some basic RFM rule sets used to get a passing grade.

Let’s look at the specific reasons why marketers should consider modeling techniques more seriously and ditch rule-based segmentation.

Reason No. 1: Variable Selection

We are surrounded by data, as every move that anyone makes is now digitized. When you describe a buyer, you may need to evaluate hundreds, if not thousands, of data points. Even if you are using just a simple set of demographic data without any behavioral data, we are talking about over 100 variables to consider right out of the gate.

Let’s say you want to build a rule to find a good segment for the sale of luxury cruises. How would you pick the most predictive variables for that one purpose? Income and age? That is not a bad start, but it is like using just two colors out of a crayon box containing 80 colors.

Case in point: Do you really believe that the main difference between luxury cruisers and luxury car buyers is “income”? Guess what, those buyers are all rich. You must dig much deeper than that.

Marketers often choose variables that they can easily understand and visualize. Unfortunately, the goal of the targeting exercise should be effectiveness of targeting, not easy comprehension by the marketer.

We often find obscure variables in models. They may “seem” obscure, as a human being would never have instinctively picked them. But mathematics doesn’t care for our opinions. In modeling, variables are picked for their predictive power, nothing else. The bonus is that this is exactly how new patterns are discovered.

We hear tidbits such as “People who tend to watch more romantic comedies are more likely to rent cars over the weekend,” “Aggressive investors are less likely to visit family restaurants” or “High-value customers for a certain teenage apparel company are more likely to be seasonal buyers with high item counts per customer, but relatively lower transaction counts.”

These are contributing factors found through rigorous mathematical exercises, not someone’s imagination or intuition. But they should always make sense in the end (unless, of course, there were errors). Picking the right predictors is indeed the most important step in modeling.
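Reason No. 1 says variables earn their place through predictive power rather than familiarity. The sketch below is a deliberately simple stand-in for real variable selection, which typically uses multivariate methods such as stepwise or regularized regression. It ranks a few hypothetical candidate variables by the absolute value of their correlation with a 0/1 target. The variable names and values are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation; with a 0/1 target this equals the point-biserial correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def rank_by_predictive_power(candidates, target):
    """Sort candidate variables by |correlation| with the target, strongest first."""
    scored = [(name, pearson(values, target)) for name, values in candidates.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)

# Hypothetical data: 1 = responded to the offer, 0 = did not
target = [1, 1, 1, 0, 0, 0, 1, 0]
candidates = {
    "income": [90, 85, 70, 88, 60, 92, 75, 80],  # in $1,000s
    "romcom_views": [9, 8, 7, 2, 1, 3, 8, 2],
    "tenure_years": [1, 2, 1, 5, 6, 4, 2, 5],
}
ranking = rank_by_predictive_power(candidates, target)
```

In this toy data, income (the “obvious” pick) has zero correlation with the target because responders and non-responders share the same average income. The less intuitive variables rank on top. Ranking by absolute value also keeps negatively related variables such as tenure_years in play.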

Reason No. 2: Weight Factor

Let’s say that, by chance, a user stumbled upon a set of useful predictors of a certain customer behavior. Let’s go back to the last example, the teenage apparel company’s high-value customer model. In that one sentence, I listed: seasonality (expressed as the number of transactions by month, regardless of year), item count per customer (within a time limit, such as the past 36 months), and number of transactions per customer.
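These three variables can be derived from raw order data. The minimal sketch below assumes hypothetical (order_id, order_date, item_count) rows for one customer and a 36-month window. Seasonality is approximated as the share of transactions that fall in the customer’s busiest calendar month, counted regardless of year. That is one of several reasonable encodings, not necessarily the one used in the model described.

```python
from collections import Counter
from datetime import date

# Hypothetical order history for one customer: (order_id, order_date, item_count)
orders = [
    ("A1", date(2014, 12, 26), 9),   # falls outside a 36-month window
    ("A2", date(2016, 12, 27), 7),
    ("A3", date(2017, 7, 4), 1),
    ("A4", date(2017, 12, 26), 11),
]

def months_between(earlier, later):
    """Whole calendar months from earlier to later, ignoring the day of the month."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

def apparel_features(orders, as_of, window_months=36):
    recent = [o for o in orders if 0 <= months_between(o[1], as_of) < window_months]
    if not recent:
        return {"transactions": 0, "items": 0, "peak_month": None, "peak_month_share": 0.0}
    # Seasonality: transactions per calendar month, pooled across years
    by_month = Counter(order_date.month for _, order_date, _ in recent)
    peak_month, peak_count = by_month.most_common(1)[0]
    return {
        "transactions": len(recent),
        "items": sum(items for _, _, items in recent),
        "peak_month": peak_month,
        "peak_month_share": peak_count / len(recent),
    }

features = apparel_features(orders, as_of=date(2018, 6, 30))
```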

In real life, there would be a far greater number of variables that would pass the initial variable selection process. But for simplicity’s sake, let’s just review these three variables.

Now tell me, which of these three variables is the most important predictor in this high-value customer model? (Please don’t say they are all equally important.) Model scores are made of the selected variables multiplied by the weight of each, as not all predictors carry the same predictive power. Some may even be “negatively” correlated with the ideal behavior that we are going after. In this example alone, the number of items was positively related to high value, while the number of transactions was negatively related. When we investigated this “strange” correlation further, we found that most of the high-value customers had been trained by the marketer to wait for a big sale, and then buy lots of items in one transaction.
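The idea that a score is the selected variables multiplied by their weights can be written as a logistic scoring function. The weights below are invented for illustration, including a negative weight on transaction count to mirror the “wait for the big sale” pattern. A real model would estimate them from data, and their size depends on how each variable is scaled.

```python
import math

# Invented weights for illustration; a fitted model would estimate these from data.
INTERCEPT = -1.0
WEIGHTS = {
    "items": 0.15,            # more items per customer -> higher score
    "transactions": -0.40,    # more separate transactions -> lower score
    "peak_month_share": 1.2,  # concentrated, seasonal buying -> higher score
}

def model_score(features):
    """Weighted sum of the selected variables, mapped to a 0-1 probability by the logistic function."""
    z = INTERCEPT + sum(weight * features[name] for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

big_sale_buyer = {"items": 20, "transactions": 2, "peak_month_share": 1.0}
frequent_small_buyer = {"items": 20, "transactions": 10, "peak_month_share": 0.2}
score_a = model_score(big_sale_buyer)
score_b = model_score(frequent_small_buyer)
```

Both hypothetical customers bought the same 20 items. The one who bought them in two transactions scores far higher than the one who spread them over ten.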

The main trouble with the “rule-based” segmentation or targeting exercise is that human beings put arbitrary weight (or importance) on each variable, even if “the right” variables were picked — mostly by chance — in the first place.

The modeling process reveals the actual balance among all important predictors, with the sole purpose of maximizing predictive power. By contrast, I have never met a person who can “imagine” the dynamics of two or three variables, let alone 10 to 20 (the typical number of variables in a model).
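A bare-bones logistic regression fitted by batch gradient descent shows how the balance among predictors is discovered rather than imagined. Logistic regression is one common modeling technique, not necessarily the one the author has in mind. The simulated data come from assumed “true” weights (items +1.5, transactions -1.0, seasonality +0.5 on standardized features), so the fitted values can be checked against them.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Batch gradient descent on the average log-loss; returns (intercept, weights)."""
    n, k = len(X), len(X[0])
    b, w = 0.0, [0.0] * k
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * k
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi  # predicted minus actual
            grad_b += err
            for j in range(k):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
    return b, w

# Simulated customers with standardized features: [items, transactions, seasonality]
random.seed(7)
TRUE_B, TRUE_W = -1.0, [1.5, -1.0, 0.5]  # assumed "truth" for this demo only
X, y = [], []
for _ in range(1000):
    x = [random.gauss(0, 1) for _ in range(3)]
    p = sigmoid(TRUE_B + sum(wt * v for wt, v in zip(TRUE_W, x)))
    X.append(x)
    y.append(1 if random.random() < p else 0)

b, w = fit_logistic(X, y)
```

With 1,000 simulated customers, the fitted weights should land near the assumed ones, including the negative sign on transactions, without anyone guessing which variable matters most. In practice, a library implementation (for example, scikit-learn’s LogisticRegression) would replace the hand-written loop.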

Forget about the recent emergence of machine learning; with or without human statisticians, modeling techniques have been beating rudimentary rules by end-users for decades. If solely left to humans, the No. 1 predictor of any human behavior would be the income of the target. But that is just a reflection of human perception and a one-dimensional way of looking at a complex composition of human behavior. You don’t believe you can explain the difference between a Lexus buyer and a Mercedes buyer with just income, do you?

Reason No. 3: Banding

Much of the data consists of numbers and figures. The rest are called categorical variables (i.e., data that cannot be added or subtracted, such as product category or channel description).

Let’s assume that income (not my first pick, as you can see) is found to be predictive for mid- to high-end female accessory buyers. Surely, different ranges of income would behave differently in such models. If the income is too low, they won’t be able to afford such items. If it is too high, the buyer may have moved on to even more expensive handbags. So, the middle ground may seem to be the ideal target. The trouble is that now you have to describe that middle group in terms of actual dollars. Exactly where does that ideal range begin and end? To make it even more complicated, what about regional biases in buying power? Can one set of banding explain the whole thing? We’ve gone way past any intuitive grouping.
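Where the “ideal” income range begins and ends is an empirical question. The minimal sketch below assumes simulated data in which mid-range incomes respond more often. It cuts income into equal-frequency bands and compares response rates per band. Equal-frequency banding is a common first pass; modeling tools then merge or split bands based on the data instead of relying on hand-picked dollar cut-offs.

```python
import random

def equal_frequency_bands(values, n_bands):
    """Cut points that split the sorted values into bands of (nearly) equal size."""
    s = sorted(values)
    return [s[len(s) * k // n_bands] for k in range(1, n_bands)]

def band_index(value, cuts):
    """Number of cut points at or below the value = index of its band."""
    return sum(value >= c for c in cuts)

def response_by_band(incomes, bought, n_bands=5):
    cuts = equal_frequency_bands(incomes, n_bands)
    hits, counts = [0] * n_bands, [0] * n_bands
    for income, b in zip(incomes, bought):
        i = band_index(income, cuts)
        counts[i] += 1
        hits[i] += b
    return cuts, [h / c if c else 0.0 for h, c in zip(hits, counts)]

random.seed(42)
incomes = [random.uniform(20_000, 200_000) for _ in range(3000)]
# Assumed "truth" for the demo: mid-range incomes respond far more often
bought = [1 if random.random() < (0.6 if 80_000 <= x <= 130_000 else 0.1) else 0 for x in incomes]
cuts, rates = response_by_band(incomes, bought)
```

Under the assumed response pattern, the middle band (index 2) should show the highest rate, and the cut points come from the data rather than a marketer’s guess.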

Moving on to categorical variables, one of the most predictive variables in any B2B modeling is the SIC code. There are thousands of variations in that one field, and they are definitely not numbers (although they look like them). How would one go about putting them into ideal groups to describe the target (such as “loyal customers”)?

If you are selling expensive computer servers, one may put “Agricultural, Fishing and Mining” as a low priority group. Then, how about all those variations in huge groups, such as “Retail,” “Business Service” or “Manufacturing,” with hundreds of sub-categories? Let’s just say that I’ve never met a human being who went beyond the initial two-digit SIC code in their heads. Good luck creating an effective group with that one variable with rudimentary methods.

Grouping “values” that move together in terms of predictability is not simple. In fact, that is exactly why computers were invented. Don’t struggle with such jobs.
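One simple way to let the data group categorical values is to rank them by their observed target rate. The sketch below uses made-up 4-digit SIC codes and outcomes. It shrinks each code’s rate toward the overall rate so that rare codes are not over-trusted (prior_weight is an assumed tuning parameter), then splits the ranking into tiers. Production tools typically use tree-based or chi-square merging instead; this is only an illustration of the idea.

```python
from collections import defaultdict

def smoothed_rates(codes, target, prior_weight=20):
    """Per-code target rate, shrunk toward the overall rate in proportion to prior_weight."""
    overall = sum(target) / len(target)
    hits, counts = defaultdict(int), defaultdict(int)
    for code, t in zip(codes, target):
        hits[code] += t
        counts[code] += 1
    return {c: (hits[c] + prior_weight * overall) / (counts[c] + prior_weight) for c in counts}

def group_codes(rates, n_groups=3):
    """Rank codes by smoothed rate and split the ranking into n_groups tiers (0 = lowest)."""
    ranked = sorted(rates, key=rates.get)
    return {code: i * n_groups // len(ranked) for i, code in enumerate(ranked)}

# Made-up outcomes (1 = bought a server, 0 = did not) for a handful of SIC codes
history = (
    [("7372", 1)] * 8 + [("7372", 0)] * 2
    + [("7374", 1)] * 6 + [("7374", 0)] * 4
    + [("2834", 1)] * 5 + [("2834", 0)] * 5
    + [("5311", 1)] * 2 + [("5311", 0)] * 8
    + [("0111", 0)] * 5
    + [("5812", 0)] * 10
)
codes = [code for code, _ in history]
target = [t for _, t in history]
groups = group_codes(smoothed_rates(codes, target))
```

In this toy data, 2834 and 5311 end up in the same middle tier even though they sit in different two-digit divisions. That kind of grouping is hard to do in one’s head.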

These are just a few reasons why we must rely on advanced modeling techniques to navigate through complex data. The benefits of modeling are many (refer to “Why Model?”). Compared to our gut feelings, statistical models are much more accurate and consistent. They also reveal previously unseen patterns in data. Because they are summarized answers to specific questions, users do not have to consider hundreds of factors, but just one model score at a time. In the current marketing environment, when things move at light speed, who has time to consider hundreds of data points in real time? Machine learning, leading to full automation, is just a natural extension of modeling.

Each model score is a summary of hundreds of contributing factors. “Responsiveness to email campaigns for a European cruise vacation” is a complex question to answer, especially when we all go through daily data overload. But if the answer is in the form of a simple score (say, one through 10), any user who understands “high is good, low is bad” can make a sound decision at the time of campaign execution.
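Turning model probabilities into a one-through-10 score can be done by ranking. The minimal sketch below assumes scores are assigned by decile of the predicted probability within the scored list, with 10 for the top 10 percent and 1 for the bottom 10 percent.

```python
def decile_scores(probabilities):
    """Map model probabilities to 1-10 scores by rank within the list (10 = top decile)."""
    n = len(probabilities)
    order = sorted(range(n), key=lambda i: probabilities[i])  # indices, lowest probability first
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = rank * 10 // n + 1
    return scores

probs = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 0.0]
scores = decile_scores(probs)
```

Tied probabilities keep their original list order in the ranking. Because deciles are relative to the scored population, a 10 means “top tenth of this list,” not a fixed probability.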

Marketers already have ample amounts of data and advanced campaign tools. Running such machines with man-made segmentation rules from the last century is a real shame. No one is asking marketers to become seasoned data scientists; they just need to be more open to advanced techniques. With firm commitments, we can always hire experts, or in the near future, machines that will do the mathematical jobs for us. But marketers must move away from old-fashioned rule-based marketing first.

Author: Stephen H. Yu

Stephen H. Yu is a world-class database marketer. He has a proven track record in comprehensive strategic planning and tactical execution, effectively bridging the gap between the marketing and technology world with a balanced view obtained from more than 30 years of experience in best practices of database marketing. Currently, Yu is president and chief consultant at Willow Data Strategy. Previously, he was the head of analytics and insights at eClerx, and VP, Data Strategy & Analytics at Infogroup. Prior to that, Yu was the founding CTO of I-Behavior Inc., which pioneered the use of SKU-level behavioral data. “As a long-time data player with plenty of battle experiences, I would like to share my thoughts and knowledge that I obtained from being a bridge person between the marketing world and the technology world. In the end, data and analytics are just tools for decision-makers; let’s think about what we should be (or shouldn’t be) doing with them first. And the tools must be wielded properly to meet the goals, so let me share some useful tricks in database design, data refinement process and analytics.” Reach him at stephen.yu@willowdatastrategy.com.

4 thoughts on “Why Modeling Beats Rule-Based Segmentation”

  1. Thanks for the article, Mr. Yu. I think your conclusion, “any user who understands ‘high is good, low is bad’ can make a sound decision at the time of campaign execution”, is spot on.

    And yet I’m also reminded of an adage from the insurance industry, where modeling of customer behavior has been applied for decades: “All models are wrong. Some models are useful.” It’s a reminder that the point of a model is not description, but action. What is the model telling us about the future, rather than the past?

    Any good data scientist or database analyst would be judged highly competent in developing an analysis of recent behavior that reduced model error to virtually zero. But that explanation of the past is worthless if it doesn’t make a verifiable (or falsifiable) prediction. The real value of a model is applying history to future behavior of the target audience. Whether your prospect list has an average score of 2 or 8 should be a strong leading indicator of failure or success.

    1. Stephan, thank you for your support. Yes, modeling works because future behavior is a reflection of past behavior. To maximize the predictive power, we must set the target up to reflect our business goals. Otherwise, even the most mathematically elegant model will not serve the true purpose. That is why setting the right target is “the” most important, and most difficult, task in modeling. And such an exercise is not a mathematical function, but a business function.

      When people complain that models didn’t work, we often find out that everything except the modeling algorithm was wrong, starting with inadequate target definition and inappropriate, unrefined data, and ending with erroneous deployment and usage of models.

      Unfortunately, schools do not provide proper training for data manipulation and target definition. It is a common joke that the last time we all saw a perfect dataset and a pre-determined target definition was at school! In real life, there is no such thing.
