I often hear statements like “Our client has a Tableau problem.” Or, it is something about Hadoop or data platforms, as in “We have an issue with Hadoop.” What did it do, use offensive language? I wonder what the real issue is.
In any case, such general statements don’t help much. I guess a medical doctor feels the same way when she hears that her patient has a headache. What does that even mean, headache? What kind of headache? Persistent or sporadic? Throbbing or sharp pain? All over, or one-sided? Or, do you just want to avoid conversations with your spouse?
Symptoms are not always related to root causes. Why would marketers think they have a problem with Tableau? Isn’t that a reporting and display tool? Unless one doesn’t like the way a bubble chart comes out, nothing really is a Tableau problem.
More often than not, reporting issues trace back to the data. What could be the major issues with the report? Inaccuracy, inconsistency or just plain suckiness? If the data on the report don’t make any sense, we must dig deeper. And let’s not forget that reporting tools are not even designed to handle heavy-duty data manipulations. But if the report doesn’t make any sense or is hard to understand, well, then, let’s blame the designer of such a report, not the toolset.
For the record, I do not represent analytical toolset companies like SAS, SPSS or Tableau. Maybe they should share some blame, because they must have sold the toolsets as almighty data mining tools that just do it all. But I am addressing the issue this way because, at least for now, forming proper questions, defining problem statements, data modeling (for analytics), report design and, most importantly, deriving insights out of the reports solidly remain human functions.
Let’s break it down further. When faced with a large amount of unrefined, unstructured and uncategorized data, we must indeed fix the data first. Let’s not even think about blaming the data storage platforms like Hadoop, MongoDB or Teradata here. That would be like blaming rice storage facilities for not being able to refine rice for human consumption. In other words, we should not put too much of a burden on the data collection and storage systems when it comes to data refinement.
Data refinement should be dealt with as a separate entity altogether, sitting between data collection (such as Hadoop) and data delivery (such as Tableau), with each stage requiring different skillsets and expertise. Such data refinement work includes:
- Data Hygiene and Editing: As no data source is immaculate. In fact, many analysts waste their valuable time fixing dirty data (before they can even get to the steps listed below).
- Data Categorization and Tagging: As uncategorized freeform data must be put into buckets and properly tagged for advanced analytics (refer to “Free Form Data Are Not Exactly Free”).
- Data Consolidation: As disparate data sources must be “merged” (to create a “360-degree view of the customer” around a person, for example), or “concatenated” (to increase coverage by adding similar types of data).
- Data Summarization and Variable Creation: To transform data to describe different levels (transactions, emails, customers, companies, etc.), as in converting transaction- or event-level data into “descriptors of individual customers” (refer to “Beyond RFM Data”).
- Missing Value Treatment: As no data will ever be fully complete, we need to fill in the gaps with either statistical models or business rules (refer to “Missing Data Can Be Meaningful”).
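To make the refinement steps above concrete, here is a minimal sketch in Python with pandas. Everything in it — the column names (cust_id, txn_date, amount, channel, region), the tiny inline dataset and the fill-with-median rule — is hypothetical, chosen only to illustrate hygiene, missing-value treatment, consolidation and summarization in one pass; a real pipeline would be far larger and driven by actual business rules.

```python
import io
import pandas as pd

# Raw transaction-level data, deliberately dirty: inconsistent channel
# labels and a missing amount. (Entirely made-up sample data.)
raw = pd.read_csv(io.StringIO(
    "cust_id,txn_date,amount,channel\n"
    "1,2023-01-05,120.00,WEB\n"
    "1,2023-02-10,,web\n"
    "2,2023-01-20,45.50,Store\n"
    "2,2023-03-02,60.00,store\n"
))

# Hygiene and editing: normalize labels, parse dates.
raw["channel"] = raw["channel"].str.strip().str.lower()
raw["txn_date"] = pd.to_datetime(raw["txn_date"])

# Missing value treatment: a simple business rule here — fill the gap
# with that customer's own median amount.
raw["amount"] = raw.groupby("cust_id")["amount"].transform(
    lambda s: s.fillna(s.median())
)

# Consolidation: merge in a (hypothetical) customer master file to move
# toward a single view of the customer.
master = pd.DataFrame({"cust_id": [1, 2], "region": ["east", "west"]})
df = raw.merge(master, on="cust_id", how="left")

# Summarization and variable creation: roll transaction-level rows up
# into one row of descriptors per customer.
per_cust = df.groupby("cust_id").agg(
    total_spend=("amount", "sum"),
    txn_count=("amount", "size"),
    last_txn=("txn_date", "max"),
)
print(per_cust)
```

Note that each step feeds the next: you cannot summarize sensibly until the labels are clean and the gaps are filled, which is exactly why these belong to a distinct refinement stage rather than to the reporting tool.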
If the salesperson who sold you the reporting toolset promised that the product would do all of these things, well, just ignore him. Even in the age of AI, these steps must be performed by separate machines (or teams) trained for specific tasks. Simply, machines are not that smart yet; AI trained for “recognition” won’t be able to “predict” and fill in the blanks for you. By the same token, human analysts should not be expected to do all of this by hand, either.
Nonetheless, the steps listed here must be completed before the reporting or any other analytical work even begins. We can even say that the reporting step is the simplest one of all. But only if the reports are designed properly first. And that is the catch.
No amount of pretty charts can be meaningful if there is no story behind them. That would be like watching a movie filled with so-called state-of-the-art special effects but no character development or viable storyline. That may work as a trailer, but that’s about it. Now, if you are an analyst having to present findings to a client or your boss, you don’t want to be the one who loses steam five minutes after the meeting begins. A 40-page PowerPoint deck? So what? What does all of that mean? What are we supposed to do about it?