Data Atrophy

Not all data are created equal. There are one-dimensional demographic and firmographic data, and then there are more colorful behavioral data. The former are about what the targets look like, and the latter are about what they do: what they click, browse, purchase and say. On top of these, if we are lucky, we may have access to attitudinal data, which are about what the target is thinking. If we have all three types of data about customers and prospects, the prediction business will get to the next level (refer to “Big Data Must Get Smaller”). But the reality is that it is very difficult to know everything about anyone, and that is why analytics is really about making the best of what we know. Predictive modeling is useful not only because it predicts the future, but also because it fills gaps in data. And even in the age of abundant data, there are many holes, as we will never have a complete set of information (refer to “Why Model?”).

Among these data types, some are more useful for prediction than others. Behavioral data generally possess more predictive power than simple demographic data. But alas, they are harder to come by. It could be that the target is new to the environment, so she may not have left much data behind at all. Maybe she just looked around and didn’t buy anything yet. Or she is very privacy-conscious and diligent about erasing her behavioral trails, on the net or otherwise. Maybe she explicitly opted out of being traced at all, giving up much of the convenience of being known by the merchants. Then data coverage comes into the equation, and that is why analysts rely on demographic and geo-demographic data: they are readily available. Much of such data can easily be purchased and appended on a household or individual level, at least in the U.S. If we have some hint of the target’s identity, there are ways to merge disparate data sets together.
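To make the coverage point concrete, here is a minimal sketch of appending demographic data to behavioral records on a household level. The field names, IDs and values are all made up for illustration, and a plain dictionary lookup stands in for whatever matching process a real data vendor would use.

```python
# Hypothetical behavioral records keyed by household (values are made up).
behavioral = [
    {"household_id": "H1", "purchases_90d": 3},
    {"household_id": "H2", "purchases_90d": 0},
    {"household_id": "H3", "purchases_90d": 1},
]

# Hypothetical purchased demographic file; note that H2 is missing.
demographic = {
    "H1": {"income_band": "B", "home_owner": True},
    "H3": {"income_band": "D", "home_owner": False},
}


def append_demographics(rows, lookup, key="household_id"):
    """Left-join: keep every row, add demographic fields where a match exists."""
    appended = []
    for row in rows:
        match = lookup.get(row[key])
        # Flag unmatched rows instead of dropping them, so gaps stay visible.
        merged = {**row, **(match or {}), "demo_matched": match is not None}
        appended.append(merged)
    return appended


def coverage(rows):
    """Share of rows that received demographic data (0.0 for an empty list)."""
    return sum(r["demo_matched"] for r in rows) / len(rows) if rows else 0.0


rows = append_demographics(behavioral, demographic)
print(f"demographic coverage: {coverage(rows):.0%}")
```

Keeping the unmatched rows and reporting the match rate is a deliberate choice here: the holes in coverage are exactly what a model would later be asked to fill.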

What if we don’t get to know who is leaving the data trails? Again, it could be about the privacy concerns of the target, or the manner in which the data are collected. Some data collectors avoid personally identifiable information (PII), such as name, address or email, as they do not want to be seen as Big Brother. Even if collectors have access to such PII, they do not share it with outsiders, to maintain dominance and to avoid the data privacy issue altogether. And there are many instances where that “who” part is completely out of reach. Movement data would be an example of that.

Weaving multiple types of data together is often the main source of trouble when it comes to predictive analytics. I have been talking about the importance of a 360-degree view of a customer for proper personalization and attribution, but the main show-stopper there is often the inability to merge data sources with confidence, not the lack of technology or statistical skills. That would be the horizontal challenge when dealing with multiple types of data.

Then there is the time factor. Like living organisms, data get old and wither away, too. Let’s call it the “data atrophy” challenge. Data players must be mindful of it, as outdated information is often worse than having none at all in the decision-making or prediction business.

Now, not all data types deteriorate at the same rate. The shelf life of demographic data is far longer than that of behavioral data. For example, people’s income levels or housing sizes do not change overnight, while the usefulness of what we call “hotline” data evaporates much faster. If you get to know that someone is searching for a new car, how long will he be in the market? What if it is about a ticket or pay-per-view purchase for tonight’s ball game? Data that are extremely valuable this minute could be totally irrelevant within the next hour.
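One way to picture different rates of atrophy is to give each data type its own half-life and discount a data point by its age. The sketch below is only an illustration under that assumption: the exponential decay form and every half-life value are hypothetical choices, not measured figures, and real decay rates would have to be estimated from response data.

```python
from datetime import datetime, timedelta

# Hypothetical half-lives per data type (assumed values, for illustration only).
HALF_LIFE = {
    "demographic": timedelta(days=730),  # e.g., income level, housing size
    "behavioral": timedelta(days=30),    # e.g., browsing and purchase history
    "hotline": timedelta(hours=6),       # e.g., in-market signals
}


def freshness_weight(data_type, observed_at, now):
    """Return a weight in (0, 1]: 1.0 when brand new, 0.5 after one half-life."""
    # Treat timestamps in the future as brand new rather than "extra fresh".
    age = max(now - observed_at, timedelta(0))
    # Dividing two timedeltas yields a plain float (age measured in half-lives).
    return 0.5 ** (age / HALF_LIFE[data_type])


now = datetime(2024, 1, 1, 18, 0)
six_hours_ago = now - timedelta(hours=6)
for data_type in HALF_LIFE:
    weight = freshness_weight(data_type, six_hours_ago, now)
    print(f"{data_type:12s} weight after 6 hours: {weight:.3f}")
```

With these assumed numbers, a six-hour-old hotline signal has already lost half of its weight, while a demographic attribute of the same age is practically as good as new.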

Author: Stephen H. Yu

Stephen H. Yu is a world-class database marketer. He has a proven track record in comprehensive strategic planning and tactical execution, effectively bridging the gap between the marketing and technology world with a balanced view obtained from more than 30 years of experience in best practices of database marketing. Currently, Yu is president and chief consultant at Willow Data Strategy. Previously, he was the head of analytics and insights at eClerx, and VP, Data Strategy & Analytics at Infogroup. Prior to that, Yu was the founding CTO of I-Behavior Inc., which pioneered the use of SKU-level behavioral data. “As a long-time data player with plenty of battle experiences, I would like to share my thoughts and knowledge that I obtained from being a bridge person between the marketing world and the technology world. In the end, data and analytics are just tools for decision-makers; let’s think about what we should be (or shouldn’t be) doing with them first. And the tools must be wielded properly to meet the goals, so let me share some useful tricks in database design, data refinement process and analytics.” Reach him at stephen.yu@willowdatastrategy.com.
