How Marketers Can Throw Away Data, Without Regrets

Yes, data is an asset. But not if the data doesn’t generate any value. (There is no sentimental value to data, unless we are talking about building a museum of old data.) So here’s how to throw away data.

Last month, I talked about data hoarders (refer to “Don’t Be a Data Hoarder”). This time, let me share some ideas about how to throw away data.

I have heard about people who specialize in cleaning other people’s closets and storage spaces. Looking at the result (a hoarder’s house turned into presentable living quarters), I am certain that they have their own set of rules and methodologies for deciding what to throw out, what goes together, and how to organize the items that are to be kept.

I recently had a relatable experience, as I sold a house and moved to a smaller place, all in the name of age-appropriate downsizing. We lived in the old home for 22 years, raising two children. We thought that our kids took much of their stuff when they moved out, but as you may have guessed already, no, we still had so much to sort through. After all, we are talking about accumulation of possessions by four individuals for 22 long years. Enough to invoke a philosophical question “Why do humans gather so much stuff during their short lifespans?” Maybe we all carry a bit of hoarder genes after all. Or we’re just too lazy to sort things through on a regular basis.

My rule was rather simple: If I haven’t touched an item for more than three years (two years for apparel), give it away or throw it out. One exception was for the things with high sentimental value; which, unfortunately, could lead into hoarding behavior all over again (as in “Oh, I can’t possibly throw out this ‘Best Daddy in the World’ mug, though it looks totally hideous.”). So, when I was in doubt, I chucked it.

But after all of this, I may have to move to an even smaller place to be able to claim a minimalist lifestyle. Or should I just hire a cleanup specialist? One thing is for sure though; the cleanup job should be done in phases.

Useless junk — i.e., things that generate no monetary or sentimental value — is a liability. Yes, data is an asset. But not if the data doesn’t generate any value. (There is no sentimental value to data, unless we are talking about building a museum of old data.)

So, how do we really clean the house? I’ve seen some harsh methods like “If the data is more than three years old, just dump it.” Unless the business model has gone through some drastic changes rendering the past data completely useless, I strongly recommend against such a crude tactic. If trend analysis or a churn prediction model is in the plan, you will definitely regret throwing away data just because they are old. Then again, as I wrote last month, no one should keep every piece of data since the beginning of time, either.

Like any other data-related activity, the cleanup job starts with goal-setting. How will you know what to keep, if you don’t even know what you are about to do? If you “do” know what is on the horizon, then follow your own plan. If you don’t, the No. 1 step would be a companywide need analysis, as different types of data are required for different tasks.

The Process of Ridding Yourself of Data

First, ask the users and analysts:

  • What is in the marketing plan?
  • What type of predictions would be required for such marketing goals? Be as specific as possible:
    • Forecasting and Time-Series Analysis — You will need to keep some “old” data for sure for these.
    • Product Affinity Models for Cross-sell/Upsell — You must keep “who bought what, for how much, when, and through what channel” data.
    • Attribution Analysis and Response Models — This type of analytics requires past promotion and response history data for at least a few calendar years.
    • Product Development and Planning — You would need SKU-level transaction data, but not from the beginning of time.
    • Etc.
  • What do you have? Do a full inventory and categorize the data by type, as you may have much more than you thought. Some examples are:
    • PII (Personally Identifiable Information): Name, Address, Email, Phone Number, Various IDs, etc. These are valuable connectors to other data sources such as Geo/Demographic Data.
    • Order/Transaction Data: Transaction Date, Amount, Payment Methods
    • Item/SKU-Level Data: Products, Price, Units
    • Promotion/Response History: Source, Channel, Offer, Creative, Drop/Wave, etc.
    • Life-to-Date/Past ‘X’ Months Summary Data: Not as good as detailed, event-level data, but summary data may be enough for trend analysis or forecasting.
    • Customer Status Flags: Active, Dormant, Delinquent, Canceled
    • Surveys/Product Registration: Attitudinal and Lifestyle Data
    • Customer Communication History Data: Call-center and web interaction data
    • Online Behavior: Open, Click-through, Page views, etc.
    • Social Media: Sentiment/Intentions
    • Etc.
  • What kind of data did you buy? Surprisingly large amounts of data are acquired from third-party data sources, and kept around indefinitely.
  • Where are they? On what platform, and how are they stored?
  • Who is accessing them? Through what channels and platforms? Via what methods or software? Search for them, as you may uncover data users in unexpected places. You do not want to throw things out without asking them.
  • Who is updating them? Data that are not regularly updated are most likely to be junk.
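
The inventory questions above can be captured as one small record per dataset. What follows is only a minimal sketch, not a prescribed tool: the `DatasetEntry` fields, the 180-day idle threshold, and the sample entries are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DatasetEntry:
    """One row of a hypothetical data inventory."""
    name: str
    data_type: str        # e.g. "Order/Transaction", "Promotion/Response History"
    source: str           # "first-party" or "third-party"
    platform: str         # where the data are stored
    last_updated: date    # when the data were last refreshed
    known_users: list[str] = field(default_factory=list)


def review_candidates(inventory, today, max_idle_days=180):
    """Return names of datasets that look stale AND have no known users.

    These are only candidates to tuck away after talking to analysts;
    nothing is deleted here.
    """
    flagged = []
    for entry in inventory:
        idle_days = (today - entry.last_updated).days
        if idle_days > max_idle_days and not entry.known_users:
            flagged.append(entry.name)
    return flagged


# Made-up sample inventory
inventory = [
    DatasetEntry("web_orders", "Order/Transaction", "first-party",
                 "warehouse", date(2024, 5, 30), ["crm_team"]),
    DatasetEntry("old_purchased_list", "Geo/Demographic", "third-party",
                 "file_share", date(2021, 1, 15)),
    DatasetEntry("legacy_survey", "Surveys/Product Registration", "first-party",
                 "file_share", date(2020, 3, 1), ["research"]),
]

print(review_candidates(inventory, today=date(2024, 6, 1)))
```

Note that the stale survey file is not flagged, because someone is still listed as using it. That mirrors the advice above: do not throw things out without asking the users.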

Taking Stock

Now, I’m not suggesting actually “deleting” data on a source level in the age of cheap storage. All I am saying is that not all data points are equally important, and some data can be easily tucked away. In short, if data don’t fit your goals, don’t bring them out to the front.

Essentially, this is the first step of the data refinement process. The emergence of the Data Lake concept is rooted here. Big Data was too big, so users wanted to put more useful data in more easily accessible places. Now, the trouble with the Data Lake is that the lake water is still not drinkable, requiring further refinement. However, like I admitted that I may have to move again to clean my stuff out further, the cleaning process should be done in phases, and the Data Lake may as well be the first station.

In contrast, the Analytics Sandbox that I often discussed in this series would be more of a data haven for analysts, where every variable is cleaned, standardized, categorized, consolidated, and summarized for advanced analytics and targeting (refer to “Chicken or the Egg? Data or Analytics?” and “It’s All about Ranking”). Basically, it’s data on silver platters for professional analysts, whether humans or machines.

At the end of such data refinement processes, the end-users will see data in the form of “answers to questions.” As in, scores that describe targets in a concise manner, like “Likelihood of being an early adopter,” or “Likelihood of being a bargain-seeker.” To get to that stage, useful data must flow through the pipeline constantly and smoothly. But not all data are required to do that (refer to “Data Must Flow, But Not All of Them”).

For the folks who just want to cut to the chase, allow me to share a cheat sheet.

Disclaimer: You should really plan to do some serious need analysis to select and purge data from your value chain. Nonetheless, you may be able to kick-start a majority of customer-related analytics, if you start with this basic list.

Because different business models call for a different data menu, I divided the list by major industry types. If your industry is not listed here, use your imagination along with a need-analysis.

Cheat Sheet

Merchandizing: Most retailers would fall into this category. Basically, you would provide products and services upon payment.

  • Who: Customer ID / PII
  • What: Product SKU / Category
  • When: Purchase Date
  • How Much: Total Paid, Net Price, Discount/Coupon, Tax, Shipping, Return
  • Channel/Device: Store, Web, App, etc.
  • Payment Method
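
To show how this basic list feeds the “Past ‘X’ Months Summary Data” mentioned earlier, here is a minimal sketch that rolls event-level transactions up to one row per customer. The tuple layout, the sample rows, and the rough 30-day month are assumptions for illustration, not a recommended schema.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event-level rows:
# (customer_id, sku_category, purchase_date, total_paid, channel)
transactions = [
    ("C001", "skincare", date(2024, 1, 10), 40.0, "web"),
    ("C001", "skincare", date(2024, 4, 2), 25.0, "store"),
    ("C002", "apparel", date(2022, 11, 20), 80.0, "app"),
]


def summarize(transactions, as_of, months=12):
    """Roll event-level rows up to one summary row per customer."""
    cutoff_days = months * 30  # rough month length, for illustration only
    summary = defaultdict(
        lambda: {"orders": 0, "total_paid": 0.0, "last_purchase": None}
    )
    for customer_id, _category, purchase_date, total_paid, _channel in transactions:
        if (as_of - purchase_date).days > cutoff_days:
            continue  # outside the "past X months" window
        row = summary[customer_id]
        row["orders"] += 1
        row["total_paid"] += total_paid
        if row["last_purchase"] is None or purchase_date > row["last_purchase"]:
            row["last_purchase"] = purchase_date
    return dict(summary)


result = summarize(transactions, as_of=date(2024, 6, 1))
print(result)
```

A summary like this can stay up front for everyday use, while the event-level detail is tucked away rather than deleted.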

Subscription: This business model is coming back with full force, as a new generation of shoppers prefers subscription over ownership. It gets a little more complicated, as shipment/delivery and payment may follow different cycles.

  • Who: Subscriber ID/PII
  • Brand/Title/Property
  • Dates: First Subscription, Renewal, Payment, Delinquent, Cancelation, Reactivation, etc.
  • Paid Amounts by Pay Period
  • Number of Payments/Turns
  • Payment Method
  • Auto Payment Status
  • Subscription Status
  • Number of Renewals
  • Subscription Terms
  • Acquisition Channel/Device
  • Acquisition Source

Hospitality: Most hotels and travel services fall under this category. This is even more complicated than the subscription model, as booking dates, travel dates, and the gaps between them all play important parts in prediction and personalization.

  • Who: Guest ID / PII
  • Brand/Property
  • Region
  • Booking Site/Source
  • Transaction Channel/Device
  • Booking Date/Time/Day of Week
  • Travel (Arrival) Date/Time
  • Travel Duration
  • Transaction Amount: Total Paid, Net Price, Discount, Coupon, Fees, Taxes
  • Number of Rooms/Parties
  • Room Class/Price Band
  • Payment Method
  • Corporate Discount Code
  • Special Requests
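
Since the gap between booking date and travel date matters in this model, here is a minimal sketch of deriving it from two fields on the list. The function name, the output keys, and the three-day “last minute” threshold are assumptions for illustration only.

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def booking_features(booked_at: datetime, arrival: datetime, nights: int) -> dict:
    """Derive a few simple features from one booking record."""
    # Gap between booking date and arrival date, in whole days
    lead_days = (arrival.date() - booked_at.date()).days
    return {
        "lead_days": lead_days,
        "booking_day_of_week": DAY_NAMES[booked_at.weekday()],
        "arrival_day_of_week": DAY_NAMES[arrival.weekday()],
        "travel_duration_nights": nights,
        "last_minute": lead_days <= 3,  # assumed threshold
    }


features = booking_features(
    booked_at=datetime(2024, 3, 1, 21, 30),
    arrival=datetime(2024, 3, 15, 15, 0),
    nights=2,
)
print(features)
```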

Promotion Data: On top of these basic lists of behavioral data, you would need promotion history to get into the “what worked” part of analytics, leading to real response models.

  • Promotion Channel
  • Source of Data/List
  • Offer Type
  • Creative Details
  • Segment/Model (for selection/targeting)
  • Drop/Contact Date
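
As a minimal sketch of the “what worked” question, the snippet below matches promotion history to responses and computes a response rate by channel and offer. It assumes every response can be tied to exactly one promotion through a hypothetical `promo_id`, which real attribution rarely allows so cleanly. The sample rows are made up.

```python
from collections import defaultdict

# Hypothetical promotion history: one row per customer contacted
promotions = [
    {"promo_id": "P1", "customer_id": "C001", "channel": "email", "offer": "10_pct_off"},
    {"promo_id": "P1", "customer_id": "C002", "channel": "email", "offer": "10_pct_off"},
    {"promo_id": "P2", "customer_id": "C001", "channel": "mail", "offer": "free_shipping"},
    {"promo_id": "P2", "customer_id": "C003", "channel": "mail", "offer": "free_shipping"},
]

# Hypothetical response history: (promo_id, customer_id) pairs that responded
responses = {("P1", "C001")}


def response_rates(promotions, responses):
    """Return the response rate keyed by (channel, offer)."""
    contacted = defaultdict(int)
    responded = defaultdict(int)
    for promo in promotions:
        key = (promo["channel"], promo["offer"])
        contacted[key] += 1
        if (promo["promo_id"], promo["customer_id"]) in responses:
            responded[key] += 1
    return {key: responded[key] / contacted[key] for key in contacted}


rates = response_rates(promotions, responses)
print(rates)
```

Without the promotion history rows, only the responders would be visible, and there would be no denominator for the rate.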

Summing It All Up

I am certain that you have much more data, and would need more data categories than the ones on this list. For one, promotion data would be much more complicated if you gathered all types of touch data from Google tags and your own mail and email promotion history from multiple vendors. Like I said, this is a cheat sheet, and at some point, you’d have to get deeper.

Plus, you will still have to agonize over how far back in time you would have to go for a proper data inventory. That really depends on your business, as the data cycle for big-ticket items like home furniture or automobiles is far longer than that for consumables and budget-price items.

When in doubt, start asking your analysts. If they are not sure (i.e., insisting that they must have “everything, all the time”), then call for outside help. Knowing what to keep, based on business objectives, is the first step of building an analytics roadmap, anyway.

No matter how overwhelming this cleanup job may seem, it is something that most organizations must go through — at some point. Otherwise, your own IT department may decide to throw away “old” data, unilaterally. That is more like a foreclosure situation, and you won’t even be able to finish necessary data summary work before some critical data are gone. So, plan for streamlining the data flow like you just sold a house and must move out by a certain date. Happy cleaning, and don’t forget to whistle while you work.

Don’t Be a Data Hoarder — Why Data Governance Matters in Marketing

They say data is an asset. I say it, too. If collected data are wielded properly, they can definitely lead to financial gains, either through a revenue increase or cost reduction. But that doesn’t mean that possessing large amounts of data guarantees large dollar figures for the collector. Data governance matters, because the operative words in my statement are “wielded properly,” as I have been emphasizing for years through this column.

Plus, collecting data also comes with risks. When sensitive data go into the wrong hands, it often leads to a direct financial burden for the data collector. In some countries, an assumed guardian of sensitive data may face legal charges for mishandling sensitive data. Even in the United States, which is known as the “freest” country for businesses when it comes to data usage, a data breach or clear abuse of data can lead to a publicity nightmare for the organization; or worse, large legal settlements after long and costly litigation. Even in the most innocuous cases, mistreatment of sensitive data may lead to serious damage to the brand image.

The phrase is not even cool in the business community anymore, but “Big Data” worked like a magic word only a few years ago. In my opinion, that word “big” in Big Data misled many organizations and decision-makers. It basically gave a wrong notion that “big” is indeed “good” in the data business.

What is “good,” in a pure business sense? Simply, more money. What was the popular definition of Big Data back then? Three Vs, as in volume, velocity and variety. So, if varieties of data in large volumes move around really fast, it will automatically be good for businesses? We know the answer by now, that a large amount of unstructured, unorganized and unrefined data could just be a burden to the holder, not to mention the security concerns listed earlier.

Unfortunately, with the popularity of Big Data and emergence of cloud computing, many organizations started to hoard data with a hope that collected data would turn into gold one day. Here, I am saying “hoarding” with all of the negative connotations that come with the word.

Hoarders are the people who are not able to throw away anything, even garbage. Data hoarders are the same way. Most datasets are huge because the collector does not know what to throw out. If you ask any hoarder why he keeps so many items in the house, the most common answer would be “because you never know when you need them.” Data hoarders keep every piece of data indefinitely for the same reason.

Only Keep Useful Data

But if you are playing with data for business purposes, you should know what pieces of data are useful for decision-making. The sponsor of any data activity must have clear objectives to begin with. Analysts would then find out what kind of data are necessary to meet those goals, through various statistical analyses and cumulative knowledge.

Actually, good analysts do know that not all data are created equal, and some are more useful than others. Why do you think that the notion of a Data Lake became popular following the Big Data hype? Further, I have been emphasizing the importance of an even more concise data environment (I call it an “Analytics Sandbox”), because the lake water in the Data Lake is still not drinkable. Data must get smaller through data refinement and analytics to be beneficial for decision-makers (refer to “Big Data Must Get Smaller”).

Nonetheless, organizations continue to hoard data, because no one wants to be responsible for purging data that may be useful someday. Government agencies may have some good reasons to maintain large amounts of data, because the cost of losing or misplacing data about some terrorist activities is too high. Even in that case, however, we should collectively be concerned if the most sensitive data about us — such as our biometrics data — reside in some government agency’s server somewhere, without clear and immediate purposes. In cities like London or Paris, cameras are on every street corner, linked to facial recognition algorithms. But we tolerate that because the benefit outweighs the risk (so we think). But that doesn’t mean that we don’t need to be concerned with data breach or abuse.

Hoarding Data Gives Brands the Temptation to Be Creepy

If the data are collected by businesses for their financial gains, then the subjects of such data collection (i.e., consumers) should question who gave them the right to collect data about every breath we take, every move we make and every claim we stake. It is one thing to retain data about mutual transactions, but it is quite another to collect data on our movement or whereabouts, unilaterally. In other words, it is one thing to be remembered (for better service and recommendation in the future), but it is another to be stalked (remember “Every Breath You Take” is a song about a stalker).

Have you heard a story about a stalker who successfully courted the subject as a result of stalking? Why do marketers think that they will sell more of their products by stalking their customers and prospects? Since when did being totally creepy – as in “I know where you are and what you’re doing right now” – become an acceptable marketing tactic? (Refer to “Don’t Do It Just Because You Can.”)

In fact, even if you do possess such data, in the interest of “not” being creepy, you must make your message more innocuous. For example, don’t act like you are offering an item because you “know” that the target looked around similar items recently. That kind of creepy approach may work once in a while, but let’s not call that a good sales tactic.

Instead, sellers should make gentle nudges. Don’t say “I know you are looking for this particular skin care item.” The response to that would be “Who the hell are you, and how do you know that?” Instead, do say “Would you be interested in our new product for people with sensitive skin?” The desirable response would be “Hey, I was just looking for something like that!”

The difference between creepy stalking and gentle nudging is huge from the receiving end.

Through many articles about personalization, I have been emphasizing the use of model-based personas, as they pack so much information in the form of answers to questions and cover the gap of missing data (as we’d never know everything about everyone). If I may add one more benefit of modeling, it converts data into probabilities. Raw data is about “I know she is looking for a particular high-end skin care item,” where coverage of such data is seriously limited, anyway. Conversely, model scores are about “Her score for high-end beauty products is 8 on a 10-point scale,” even if we may not even have concrete data about that specific interest.

Now, users who only have access to the model score — which is “dull” information, in comparison to “sharp” data about some verified behavior — would be less tempted to say “Oh, I know you did this.” Even for non-geeky types, the difference between “Is” and “Likely to be” is vast.
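
For readers wondering what a score of 8 out of 10 could look like mechanically, here is one minimal sketch. It assumes that a model has already produced a probability for each customer, and it uses a rank-based decile split, which is only one of many possible ways to build such a score. The function name and the sample probabilities are made up for illustration.

```python
def to_decile_scores(probabilities):
    """Map model probabilities to 1-10 scores by rank (10 = most likely).

    Users of the score see only "likely to be", never the raw behavior
    that fed the model.
    """
    n = len(probabilities)
    # Indices sorted from the lowest probability to the highest
    order = sorted(range(n), key=lambda i: probabilities[i])
    scores = [0] * n
    for rank, idx in enumerate(order):
        scores[idx] = rank * 10 // n + 1
    return scores


# Hypothetical model output for ten customers
probabilities = [0.02, 0.10, 0.15, 0.22, 0.30, 0.41, 0.55, 0.78, 0.64, 0.93]
print(to_decile_scores(probabilities))
```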

If converting sharp data into innocuous probability scores through modeling is too much for you to start with, then at least categorize the data, and expose data points to users that way. Yes, we are living in the world of SKU-level product suggestion (like Amazon does), but as a consumer, have you ever “liked” such blunt suggestions, anyway? Marketers do it because such personalization does better than not doing anything at all, but such a practice is hardly ideal for many reasons (being creepy is one of them; refer to “Personalization Is About the Person”).

The saddest part in all this is that most marketers don’t even know how to fully utilize what they collected. I’ve seen too many organizations that are still stuck with using a few popular data variables repeatedly, while hoarding data indiscriminately. Why risk all of those privacy and security concerns, not to mention the data maintenance cost, if that is the case?

Have a Goal for All of That Data

If analytics is part of the process, then the analysts will tell you with conviction that you don’t need all those data points for certain types of prediction. For instance, why risk losing a bunch of credit card numbers, when the credit card type or payment method is all you need to predict responses and propensities on a customer level?
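
To make the credit card example concrete, here is a minimal sketch that keeps only the card type and drops the number before the record reaches an analytics environment. The prefix rules are a simplified subset of commonly published issuer ranges, and the record layout and function names are assumptions for illustration.

```python
def card_type(card_number: str) -> str:
    """Classify a card by its leading digits (simplified prefix rules)."""
    digits = "".join(ch for ch in card_number if ch.isdigit())
    if len(digits) < 4:
        return "other"
    first2, first4 = int(digits[:2]), int(digits[:4])
    if digits[0] == "4":
        return "visa"
    if first2 in (34, 37):
        return "amex"
    if 51 <= first2 <= 55 or 2221 <= first4 <= 2720:
        return "mastercard"
    if first4 == 6011 or first2 == 65:
        return "discover"
    return "other"


def minimize_payment_record(record: dict) -> dict:
    """Return a copy that keeps the card type but not the card number."""
    slim = {k: v for k, v in record.items() if k != "card_number"}
    slim["payment_method"] = card_type(record["card_number"])
    return slim


# Well-known test card number, not a real account
order = {"customer_id": "C001", "total_paid": 65.0,
         "card_number": "4111 1111 1111 1111"}
print(minimize_payment_record(order))
```

The model still gets its payment-method variable, while the sensitive number never travels any further.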

Of course, the organization must first decide what types of models and predictions are necessary to meet their goals. But that is the beginning part of the whole analytics game, anyway. Analytics is not about answering to some wishful thinking of data hoarders; it should be a goal-oriented activity, with carefully selected and refined data for clear purposes.

A goal-oriented mindset is even more important in the age of machine learning and automation. Because we should never automate bad behaviors. Imagine a powerful marketing automation engine in the hands of data hoarders. Forget about organizational inefficiency. As a consumer, don’t you get a chill down your spine just imagining how creepy the outcome would be? Well, maybe we don’t really have to imagine it, as we all get bombarded with ineffective and not-so-personal offers every day.

Conclusion

So, marketers, have clear purposes in data activities, and do not become mindless data hoarders. If you do possess data, wield them properly with analytics. And while at it, purge pieces of data that do not fit your goals. That “you never know” attitude really doesn’t help anyone. And you are supposed to know your own goals and what data and methodologies will get you there.