4 Tips to Turn Bad Data Into Good Results

When it comes to making better, data-driven marketing decisions, the No. 1 excuse I hear from professionals for not doing so is that they have bad data. Often, they are right. However, this is rarely a black-and-white scenario. Most of the time, marketers still have ample data to make better (not perfect) decisions.

Many of my consulting engagements have resulted in sound strategic advice based on error-prone data sets. Below are four tips on how to work with bad data to yield valuable information.

  • Identify Corroborating Data: When you encounter “bad data,” there are often other sources you can use to corroborate what you are trying to measure. For example, I was working with a retailer that claimed to have unreliable inventory data. For a retailer, this is naturally a huge problem. However, we were able to leverage point-of-sale information to identify SKUs that were usually fast-moving but suddenly exhibited zero sales. While the inventory system indicated low (not depleted) stock, the sales pattern confirmed an inventory issue that was directly affecting revenue. Armed with this knowledge, we reset replenishment thresholds and triggers, which kept high-demand merchandise in stock. (A rough sketch of this kind of check follows the list.)
  • Investigate the Bad Rep: A data set sometimes earns a notorious reputation because of what I call “noisy outliers.” These are errors that attract significant attention but represent only a small proportion of a data set that is mostly correct. Once, we were working with household policy data for a personal lines insurer. There were several cases where policies were identified as belonging to separate households when they weren’t, and vice versa. A quick investigation found that a handful of issues (such as incorrect addresses, multiple addresses for the same household and policies sold by different agents) drove most of the householding errors. Once those issues were identified, we wrote code to correct them and produced a much cleaner data set. (A simple version of this correction is sketched below the list.)
  • Differentiate Between Zero and Null: Missing data can also prevent decision-makers from taking advantage of a valuable data set. The first step in such instances is to determine whether the values are really missing or whether they are in fact zero. This often takes some investigative work to understand how the value is generated and whether the system uses a zero or a blank to indicate no activity. (Remember, no activity is not the same as missing information.) Assuming a value is indeed missing, there are two immediate options. First, are there proxy values that can be used to generate the missing values? Sometimes the proxy is a combination of several variables and requires some experimentation. Second, can the business question still be answered by ignoring the missing data and working with the data you do have? In my experience, missing data is usually a hurdle, not a brick wall, on the way to a data-driven answer. (The third sketch below shows the zero-versus-null distinction in code.)
  • Use Random Error to Your Advantage: Finally, there will be times when bad data is either too time-consuming to fix or simply unfixable. If you are trying to measure differences among groups or time periods, however, the data may still be helpful. If you can safely assume the errors are random, it is possible that they will cancel each other out and that actual differences between groups can still be measured. For example, my team was working with Web traffic data from two recently merged brands, which meant two separate Web analytics platforms. Each system provided slightly different measurements and had visitor identification issues. However, there was no reason to believe one brand’s site had a bigger problem than the other, or that the problems were of a different nature. On the positive side, many of the segmentation factors were very similar. As a result, segment-level differences could be observed using data from both websites, and a combined segment-driven strategy could be employed, saving the combined company millions. (The final sketch below illustrates this kind of segment comparison.)
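
To make the first tip concrete, here is a minimal sketch in Python with pandas of the kind of cross-check we used. The table layout, column names (sku, week, units_sold, on_hand) and thresholds are hypothetical, not from the actual engagement; the idea is simply to flag SKUs that normally sell briskly but suddenly show zero sales while the inventory system still reports stock on hand.

```python
import pandas as pd

# Hypothetical inputs: weekly point-of-sale data and an inventory snapshot.
# Column names and thresholds are illustrative only.
pos = pd.DataFrame({
    "sku": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "week": [1, 2, 3, 4, 1, 2, 3, 4],
    "units_sold": [40, 38, 42, 0, 5, 6, 4, 5],
})
inventory = pd.DataFrame({"sku": ["A", "B"], "on_hand": [12, 30]})

# Typical weekly velocity, excluding the most recent week.
history = pos[pos["week"] < pos["week"].max()]
velocity = history.groupby("sku")["units_sold"].mean().rename("avg_weekly_sales")

# Sales in the most recent week.
latest = (pos[pos["week"] == pos["week"].max()]
          .set_index("sku")["units_sold"].rename("latest_sales"))

# Flag SKUs that usually move fast but suddenly sold nothing, even though
# the (possibly unreliable) inventory system still shows stock on hand.
report = pd.concat([velocity, latest], axis=1).join(inventory.set_index("sku"))
suspect = report[(report["avg_weekly_sales"] >= 20)
                 & (report["latest_sales"] == 0)
                 & (report["on_hand"] > 0)]
print(suspect)
```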
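
For the second tip, the fix boiled down to normalizing a few known trouble spots and re-keying the households. Below is a minimal sketch assuming a hypothetical policies table with name and address fields; the normalization rules are illustrative, not the insurer’s actual logic.

```python
import pandas as pd

# Hypothetical policy records; the inconsistent addresses are illustrative.
policies = pd.DataFrame({
    "policy_id": [101, 102, 103],
    "last_name": ["Smith", "Smith", "Jones"],
    "address": ["12 Oak St.", "12 Oak Street", "7 Elm Ave"],
})

def normalize_address(addr: str) -> str:
    """Crude normalization: lower-case, drop punctuation, standardize 'street'."""
    cleaned = addr.lower().replace(".", "").replace(",", "")
    return cleaned.replace("street", "st").replace("avenue", "ave")

# Re-key households on the normalized name-plus-address combination.
policies["household_key"] = (
    policies["last_name"].str.lower() + "|" +
    policies["address"].map(normalize_address)
)

# Policies 101 and 102 now fall into the same household.
print(policies.groupby("household_key")["policy_id"].apply(list))
```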
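
The third tip is mostly about being explicit with missing values. Here is a minimal sketch, again with hypothetical column names: a spend of 0 means “no activity” and is kept, while NaN means “not recorded” and is either imputed from a proxy or set aside, depending on the question being asked.

```python
import numpy as np
import pandas as pd

# Hypothetical campaign data: 0 means "no spend", NaN means "not recorded".
df = pd.DataFrame({
    "campaign": ["a", "b", "c", "d"],
    "email_spend": [500.0, 0.0, np.nan, 250.0],
    "total_spend": [2000.0, 800.0, 1200.0, 1000.0],
})

# Zeros are real observations; only NaN values are truly missing.
truly_missing = df["email_spend"].isna()

# Option 1: impute from a proxy -- here, the typical share of total spend,
# estimated from the rows where email_spend was actually recorded.
typical_share = (df.loc[~truly_missing, "email_spend"]
                 / df.loc[~truly_missing, "total_spend"]).mean()
df["email_spend_filled"] = df["email_spend"].fillna(df["total_spend"] * typical_share)

# Option 2: answer the question with the rows you do have.
print("Share of rows with recorded spend:", (~truly_missing).mean())
print(df)
```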
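
Finally, for the fourth tip, the point is that if each platform’s measurement error is random rather than biased toward one brand, segment-to-segment comparisons stay meaningful even when the absolute numbers are off. Here is a minimal sketch with made-up segment-level figures; the segment names and metrics are hypothetical.

```python
import pandas as pd

# Hypothetical segment-level summaries from two web analytics platforms.
# The absolute visit counts disagree, but we assume the measurement errors
# are random rather than systematically worse for one brand.
brand_a = pd.DataFrame({
    "segment": ["bargain hunters", "loyalists", "researchers"],
    "visits": [12000, 8000, 5000],
    "conversions": [240, 400, 50],
})
brand_b = pd.DataFrame({
    "segment": ["bargain hunters", "loyalists", "researchers"],
    "visits": [15000, 9500, 7000],
    "conversions": [280, 520, 60],
})

# Pool both sources and compare segments on a rate, which is more robust
# to random measurement noise than raw counts from either platform alone.
combined = (pd.concat([brand_a, brand_b])
            .groupby("segment", as_index=False)[["visits", "conversions"]]
            .sum())
combined["conversion_rate"] = combined["conversions"] / combined["visits"]
print(combined.sort_values("conversion_rate", ascending=False))
```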

The tips above are not exhaustive and every situation is unique; however, my experience is that most companies give up on bad data sets too quickly, especially when making important business decisions. The tips outlined above are a good starting point if you want to mine gold out of a bad data set. That said, I am also a believer in not being hostage to existing data. In many cases now, more relevant data can be generated in a few weeks, especially in digital marketing. Just something to think about.