It is often quoted that 2.5 quintillion bytes of data are collected each day. That surely sounds like a big number, considering 1 quintillion bytes (or one exabyte, if that sounds fancier) equal 1 billion gigabytes. Looking back only about 20 years, I remember my beloved 386-based desktop computer had a hard drive that could barely hold 300 megabytes, which was considered quite large in those ancient days. Now, my phone can hold about 65 gigabytes; which, by the way, means nothing to me. I just know that figure equates to about 6,000 songs, plus all my personal information, with room to spare for hundreds of photos and videos. So how do I fathom the size of 2.5 quintillion bytes? I don’t. I give up. I’d rather count the number of stars in the universe. And I have been in the database business for more than 25 years.
But I don’t feel bad about that. If a pile of data requires a computer to process it, then it is already too “big” for our brains. In the age of “Big Data,” size matters, but emphasizing the size element is missing the point. People want to understand the data in their own terms and want to use them in decision-making processes. Throwing the raw data around to people without math or computing skills is like galleries handing out paint and brushes to people who want paintings on the wall. Worse yet, continuing to point out how “big” the Big Data world is to them is like quoting the number of rice grains on this planet in front of a hungry man, when he doesn’t even care how many grains are in one bowl. He just wants to eat a bowl of “cooked” rice, and right this moment.
To be a successful data player, one must be the master of the following three steps:
- Collection and storage;
- Refinement; and
- Delivery.
Collection and storage are obviously important in the age of Big Data. However, that in itself shouldn’t be the goal. I hear lots of bragging about how much data can be collected and stored, and how fast the data can be retrieved.
Great, you can retrieve any transaction detail going back 20 years in less than 0.5 seconds. Congratulations. But can you now tell me who is more likely to be a loyal customer for the next five years, with annual spending potential of more than $250? Or who is more likely to quit using the service in the next 60 days? Who is more likely to be on a cruise ship leaving the dock on the East Coast heading for Europe between Thanksgiving and Christmas, with onboard spending potential greater than $300? Who is more likely to respond to emails with free shipping offers? Where should I open my next store selling fancy children’s products? What do my customers look like, and where do they go between 6 and 9 p.m.?
Answers to these types of questions do not come from the raw data, but they should be derived from the data through the data refinement process. And that is the hard part. Asking the right questions, expressing the goals in a mathematical format, throwing out data that don’t fit the question, merging data from a diverse array of sources, summarizing the data into meaningful levels, filling in the blanks (there will be plenty—even these days), and running statistical models to come up with scores that look like an answer to the question are all parts of the data refinement process. It is a lot like manufacturing gold watches, where mining gold is just an important first step. But a piece of gold won’t tell you what time it is.
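To make the refinement steps above less abstract, here is a toy sketch in Python: summarizing raw transactions to the customer level, merging in a second source, filling in a blank, and producing an answer-shaped score. Every field name, the imputation value, and the scoring rule are hypothetical inventions for illustration, not a real model.

```python
# Toy illustration of data refinement: summarize, merge, impute, score.
# All names and weights here are made up for the sake of the example.

transactions = [  # raw detail records from one source
    {"customer": "A", "spend": 120.0},
    {"customer": "A", "spend": 80.0},
    {"customer": "B", "spend": 40.0},
]
profiles = {  # data from a second source, with a blank to fill
    "A": {"tenure_years": 5},
    "B": {},  # tenure unknown: one of the "blanks" mentioned above
}

# Step 1: summarize transaction detail to a meaningful level (per customer).
summary = {}
for t in transactions:
    summary[t["customer"]] = summary.get(t["customer"], 0.0) + t["spend"]

# Step 2: merge sources, fill in the blanks, and score with a toy "model".
AVG_TENURE = 3  # stand-in for a real imputation strategy
scores = {}
for cust, total_spend in summary.items():
    tenure = profiles.get(cust, {}).get("tenure_years", AVG_TENURE)
    # Hypothetical loyalty score weighting spend and tenure.
    scores[cust] = 0.01 * total_spend + 0.5 * tenure

print(scores)  # each customer now has a number shaped like an answer
```

The point is the shape of the work, not the arithmetic: raw rows go in, and a score that addresses a business question comes out. A real refinement pipeline would replace the hard-coded weights with a fitted statistical model.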
The final step is to deliver that answer—which, by now, should be in a user-friendly format—to the user at the right time in the right format. Often, data-related products emphasize only this part, as it is the part closest to the users. After all, it provides an illusion that the user is in total control, being able to touch the data so nicely displayed on the screen. Such tool sets may produce impressive-looking reports and dazzling graphics. But, lest we forget, they are only representations of the data refinement processes. In addition, no tool set will ever do the thinking part for anyone. I’ve seen so many missed opportunities where decision-makers invested obscene amounts of money in fancy tool sets, believing the tools would conduct all the logical and data refinement work for them, automatically. That is like believing that purchasing a top-of-the-line Fender Stratocaster will guarantee that you will play like Eric Clapton in the near future. Yes, the tool sets are important as delivery mechanisms for refined data, but none of them replaces the refinement part. Relying on them alone would be like skipping guitar practice after spending $3,000 on a guitar.
Big Data business should be about providing answers to questions. It should be about humans, who are the subjects of data collection and, in turn, the ultimate beneficiaries of information. It’s not about IT or tool sets that come and go like hit songs. Rather, it should be about inserting advanced use of data into everyday decision-making processes by all kinds of people, not just the ones with statistics degrees. The goal of data players must be to make it simple—not bigger and more complex.
I boldly predict that missing these points will make “Big Data” a dirty word in the next three years. Emphasizing the size element alone will lead to unbalanced investments, which will then lead to disappointing results with not much to show for them in this cruel age of ROI. That is a sure way to kill the buzz. Not that I am that fond of the expression “Big Data”; though, I admit, one benefit has been that I don’t have to explain what I do for a living for 10 minutes anymore. Nonetheless, all the Big Data folks may need an exit plan if we are indeed heading for the days when it will be yet another disappointing buzzword. So let’s do this one right, and start thinking about refining the data first and foremost.
Collection and storage are just so last year.