Data Atrophy

Not all data are created equal. There are one-dimensional demographic and firmographic data, and then there are more colorful behavioral data. The former describe how the targets look; the latter describe what they do: what they click, browse, purchase and say.

However, is that really true? Do seemingly expired data really have no value for future prediction? Yes, “real-time” data is one of the most over-used terms in marketing, and yes, if marketers learn that someone is looking for a product that they sell, they need to react to that information right at that moment. In fact, most personalization efforts and related technologies are about such reactive endeavors. In such cases, there is not much need for modeling or statistical work. If you know someone is looking for a mid-size luxury car, lead him to it now. If the customer is following a certain path on the journey map, usher her accordingly. But “hot” data becomes history eventually, and historical data definitely has its place in the prediction business.

I learned from experience that the best predictor of future behavior is past behavior. People do not change that drastically. Most customers and prospects can be defined by what they have been doing. Those who shopped for luxury cars in the past will do so again sometime in the future. Folks who have been regularly buying golf equipment will do that again. Movie fans who prefer drama over action films won’t change their taste overnight. And statistical modeling will not only fill in the data gaps, as I explained earlier, but also keep data that are getting colder lukewarm, at least for some time. There is tremendous value in extending the usefulness of data, as so-called “hot” data describes only the small fraction of the target who left data trails in real time. If marketers want to make a sizable improvement in their personalization efforts, following such a small target is simply not enough. They must expand the universe to the maximum.
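How might one keep cooling data “lukewarm” in practice? One common technique is exponential time-decay weighting, where an event’s influence fades gradually with age instead of dropping to zero the moment it stops being “hot.” The following is a minimal sketch of that idea, purely my own illustration; the 90-day half-life is an arbitrary assumption, not a figure from any actual model:

```python
import math
from datetime import date

def decayed_weight(event_date: date, as_of: date, half_life_days: float = 90.0) -> float:
    """Weight an event by its age: 1.0 for today, 0.5 at the half-life,
    and gradually approaching 0 as the event recedes into history."""
    age_days = (as_of - event_date).days
    return 0.5 ** (age_days / half_life_days)

# A golf purchase made 90 days ago counts half as much as one made today.
today = date(2024, 6, 1)
w_recent = decayed_weight(date(2024, 6, 1), today)  # weight of a purchase today
w_old = decayed_weight(date(2024, 3, 3), today)     # weight of a 90-day-old purchase
```

The half-life would realistically differ by category: golf balls cool off faster than luxury cars.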

Nevertheless, such modeling work – to fill in the data gaps or keep the data warm – becomes effective only if we treat fresh and not-so-fresh information properly. That is why a simple time stamp must be converted into measurements of intervals between events. Instead of simply calling someone a golfer because his number of purchases to date in the golf equipment category is higher than average, data players must add time elements to the data menu, because he may not be a golfer anymore. How many weeks have elapsed since the last purchase in the golf category? What is the average number of days between transactions there? What is the average number of weeks between new product releases and actual purchases of those items? In other words, does he buy the newest items at a premium price, or does he wait until the price comes down a bit? Or is he a bargain seeker who only buys older models? How about such measurements by channel? If he doesn’t show up at stores anymore, maybe he has turned into an online buyer for consumable items, such as golf balls? None of these are what we would call hotline data, but they will be useful for targeting and personalization.
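The interval measurements described above are straightforward to derive from raw time stamps. Here is a minimal sketch of the conversion; the feature names are my own, chosen for illustration:

```python
from datetime import date
from statistics import mean

def interval_features(purchase_dates: list[date], as_of: date) -> dict:
    """Convert raw purchase time stamps into interval measurements:
    recency (weeks since last purchase) and cadence (average days
    between transactions)."""
    dates = sorted(purchase_dates)
    gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
    return {
        "weeks_since_last": (as_of - dates[-1]).days / 7,
        "avg_days_between": mean(gaps) if gaps else None,
        "num_purchases": len(dates),
    }

# Three golf-category purchases, evaluated as of June 1, 2024.
golf = [date(2023, 4, 1), date(2023, 7, 1), date(2023, 10, 1)]
feats = interval_features(golf, as_of=date(2024, 6, 1))
```

The same function could be run per channel (store vs. online) or per category to answer the channel-migration question raised above.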

Yes, if the buyer is specifically looking for a new driver of a certain brand, by all means, usher him there and close the sale. But let’s get ready for the next time he shows up (or for proactive promotion of other related items) by collecting, maintaining and transforming historical data as well. Once these predictors (i.e., descriptive variables in models) are in line, statistical modeling can assign “scores” for specific behaviors that marketers care about. Along the lines of this example, such model scores can be assigned for “cutting-edge buyers,” “bargain seekers,” “online buyers of repeat items,” “infrequent high-value customers,” “frequent small-item buyers,” etc. No, these personas won’t necessarily be “hot,” but they are a valuable summary of all types of old and new data, and they will be available for everyone all the time, as they are just model scores (refer to “No One Is One-Dimensional”).
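To make the idea of a model score concrete, here is a toy sketch of a “bargain seeker” score using a logistic function. The features and coefficients are entirely made up for illustration; in a real project they would be estimated by fitting a model on historical behavior:

```python
import math

def bargain_seeker_score(avg_discount_pct: float, days_after_release: float) -> float:
    """Toy logistic model: deeper average discounts and longer waits after a
    product release push the score toward 1.0. The coefficients below are
    hypothetical placeholders, not fitted values."""
    z = -3.0 + 0.08 * avg_discount_pct + 0.01 * days_after_release
    return 1.0 / (1.0 + math.exp(-z))

# A shopper who averages 30% discounts and buys ~4 months after release.
score = bargain_seeker_score(avg_discount_pct=30, days_after_release=120)
```

Every customer gets a score between 0 and 1 regardless of how fresh their data is, which is exactly why such scores remain usable long after the underlying events stop being “hot.”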

We live amid constant communication via multiple devices on any given day, but too many of the messages that consumers receive are utterly irrelevant and inadequate. Even messages based on real-time information can turn into repeated nuisances. (How many times can you, as a consumer, bear to see a constant barrage of ads for an item that you clicked and viewed once, a week ago?) And if there is no hotline data in play? We get to see the same generic messages through multiple channels.

That is why we must not treat data atrophy as just a problem, but embrace aging data as an opportunity for the future. Today’s data become historical data in a blink, but we still have a lot to mine there. And such mining is possible only if we arrange the data properly and let it age gracefully using statistical techniques. That is the way to personalize messages constantly for everyone, instead of reacting to real-time data only sporadically, for a fraction of your audience.

Author: Stephen H. Yu

Stephen H. Yu is a world-class database marketer. He has a proven track record in comprehensive strategic planning and tactical execution, effectively bridging the gap between the marketing and technology world with a balanced view obtained from more than 30 years of experience in best practices of database marketing. Currently, Yu is president and chief consultant at Willow Data Strategy. Previously, he was the head of analytics and insights at eClerx, and VP, Data Strategy & Analytics at Infogroup. Prior to that, Yu was the founding CTO of I-Behavior Inc., which pioneered the use of SKU-level behavioral data. “As a long-time data player with plenty of battle experiences, I would like to share my thoughts and knowledge that I obtained from being a bridge person between the marketing world and the technology world. In the end, data and analytics are just tools for decision-makers; let’s think about what we should be (or shouldn’t be) doing with them first. And the tools must be wielded properly to meet the goals, so let me share some useful tricks in database design, data refinement process and analytics.” Reach him at
