When we observe a certain phenomenon, we should never do so from just one angle. We’ve all heard the fable of the blind men and the elephant, in which each man touched just one part of the animal and exclaimed, “Hey, this creature must be like a snake!” or “No, it feels like a thick column!” or “I’m sure it is like a big wall!” We certainly don’t want to fall into that trap.
In the world of marketing, however, many people jump to conclusions based on limited information viewed from a single perspective. Some even fool themselves into thinking that they reached scientific conclusions simply because they used data mining techniques. Unfortunately, quoting numbers does not automatically make anyone more analytical, because numbers only mean something within their context. And with all these easy-to-use visualization tools, it is just as easy to misrepresent the figures.
When we try to predict the future, even the near future, things get more complicated still. It is hard enough to master the mathematical side of predictive analytics, but it gets harder when the data sources are seriously limited or, worse, skewed. When the data are contaminated by factors other than consumer behavior, we may end up predicting outcomes based on marketers’ actions, not on consumer behavior.
That is why procuring and employing multiple sources of data is so important in predictive analytics. Even when the mission is simply to observe what is happening in the world, having multiple perspectives is essential. Put simply, who would object to a bird’s-eye view when a high-speed car chase is on the TV news? It certainly enhances the picture. On the other hand, you would not feel the urgency on the ground without the camera mounted on a police car.
I frequently drive from New Jersey to New York City during rush hour. (I have my reasons.) I have been tracking the driving time, in minutes, between every major turn. Not that it helps much in reducing my overall commuting time, as there isn’t much I can do when sitting helplessly on a bridge. But I can predict my arrival time with reasonable accuracy. Now, armed with smartphone apps that collect such data from everyone using the same application (crowdsourcing at its best), we can predict the ETA to any destination with a margin of error of less than a minute. That is great when I’m already sitting in the car. But do such analytics help me decide whether I should have gotten in the car in the first place that morning? A navigator that tells me every turn I should make is great, but do all those data tell me whether going into the city on the first day of school in September is the right decision? Hardly. I need a different perspective for that type of decision.
Every type of data and analytics has its place, and none is almighty. When it comes to online activities, marketers literally track every breath you take and every move you make. So-called “analytical solution providers” are making fortunes collecting these data and analyzing them. Clickstream data are a major reason why data got so big; thanks to them, we started using the term “Big Data.” This complex world is very difficult to navigate, so marketers spend a great deal of time and resources figuring out where they stand. The weekly reports that come out of such data easily run hundreds of pages (figuratively), and before marketers manage to understand all those figures, a new set of reports lands in their laps (again, metaphorically). It is like having to stare at a car’s dashboard nonstop while driving at full speed. The cycle continues, and analysts end up in perpetual motion, pumping out reports.
I am not discounting the value of such reporting at all. When a rocket ship is launched, literally hundreds of people watch their screens simultaneously just to see how the process is going. However, if the rocket ship is in trouble, there isn’t much anyone can do by looking at the numbers other than say, “Uh-oh, based on these figures, we have a serious engine problem right now.” And such reporting certainly does not tell anyone whether the vehicle should have been launched at that particular moment with that preset destination. That kind of analytics is completely different from analyzing every turn while moving at full speed.
Marketers get lost because they search the numbers they are given for answers, when those metrics and reports were designed for some other purpose. At times, we need to change our perspective completely. For instance, looking at every click will not provide accurate sales projections at the person or product level. Once in a while such a projection may be correct, but it can easily be thrown off by a slight jolt in the system. It gets worse when there is no direct correlation between clicks and conversions, as such things depend heavily on business models and site design (i.e., the actions of marketers, not buyers).
As I have emphasized numerous times in this series, analytical questions must be formed based on business questions, not the other way around. But too often, marketers seek answers to their questions within the limited data and reports they get to see. It is not impossible to gauge the speed of your car from the way the fur of your dog blows as he sticks his head out the window, but I wouldn’t recommend that method when the goal is to estimate the time of arrival with a margin of error of less than a minute.
Not all analytics are the same, and different analytical objectives call for different types of data, big and small. To understand your surroundings, yes, you need some serious business intelligence with carefully designed dashboards, real-time or otherwise. To predict future outcomes, or to fill in the blanks (there are lots of unknown factors, even in the age of Big Data), we must change our perspective and harness different sets of data. To determine the overall destination, we need yet another type of analytics, at the macro level.
In the world of predictive analytics, predicting price elasticity, market trends or specific consumer behaviors all call for different types of data, techniques and specialists. Even within the realm of predicting consumer behavior, there are different levels of difficulty. At the risk of sounding too simplistic, I would say predicting “who” is relatively easier than predicting “what product.” Predicting “when” is harder than those two combined. You may be able to predict with some confidence “who” will be in the market for a “luxury vacation,” but predicting “when” that person will actually buy cruise ship tickets requires a different type of data, one that is really hard to obtain with any consistency. The hardest of all is predicting “why” people behave one way or another. Let’s just say marketers should take anyone who claims to be able to do that with a grain of salt. At that point, we would need to get into a deep discussion of causality versus correlation.
Even that relatively simple “who” part of prediction calls for some debate, with all kinds of data being pumped out every second. Some marketers employ data and toolsets based on availability and price alone, but let us step back for a second and look at it from a different perspective.
Hypothetically speaking, let’s assume that you, as a marketer, get to choose one superpower to predict who is most likely to buy your product at a mall, so that you can address your prospects properly (i.e., by delivering personalized messages). Your choices are:
- You get to install a camera on everyone’s shoulder at the entrance of the mall
- You get to have everyone’s past transaction history on an SKU level (who, when, for how much and for what product)
The choice behind Door No. 1 offers the equivalent of what we generally call clickstream data, which falls into the realm of Big Data. It records literally every move everyone makes, with a time stamp. The second choice is good old transaction data at the product level, which you may call small data, though these days there is nothing small about it; it is just smaller in comparison to No. 1. Now, if your goal is to design the mall to optimize traffic patterns for sales, you surely need to pick No. 1. If your goal is to predict who is more likely to buy your product, I would definitely go with No. 2. Yes, a shopper may be looking at shoes very frequently, but will she really make a purchase in that category? What does her personal transaction history say?
In reality, we may have to work with just No. 1, but if I had a choice in this hypothetical situation, I would opt for transaction data any time. In my co-op data business days, I looked through about 50 model documents per day for more than six years, and I have seen the predictive power of transaction data firsthand. If you can get accurate answers with smaller sets of data, why would you take a detour?
Of course, in real life I would like to have both, because more varieties of data (not just these two choices, but also demographic, geo-demographic, sentiment and attitudinal data) will help you zoom in on the target with greater accuracy, consistency and efficiency. In this example, if the potential customer is new to the mall, or has been dormant for a long time, you may have to work with just the cameras-on-shoulders data. But such a judgment should be made in the course of the analysis, not predetermined by marketers or IT folks before the analysis begins.
Not all datasets are created equal, and we need all kinds of data. Every dataset comes with tons of holes in it, and we need to fill those gaps with data from other sources, viewed from different angles. Too often, marketers go too deep down the rabbit hole simply because they have been digging it for a long time. But once in a while, we all need to stick our heads out of the hole and gain a different perspective.
Digging a hole in the wrong direction will not make anyone richer, and you will never see the end of it while you’re in it.