Marketing and IT; Cats and Dogs

Cats and dogs do not get along unless they grew up together from birth. That is because cats and dogs have rather fundamental communication problems with each other. A dog will wag his tail in an upward position when he wants to play. To a cat, though, an upward tail is a sure sign of hostility, as in “What’s up, dawg?!” In fact, if you observe an angry or nervous cat, you will see that everything is up: tail, hair, toes, even her spine. So imagine the dog’s confusion in this situation, where he just sent a friendly signal that he wants to play with the cat, and what he gets back are loud hisses and scary evil eyes—but along with an upward tail that “looks” like a peace sign to him. Yeah, I admit that I am a bona-fide dog person, so I looked at this from his perspective first. But I sympathize with the cat, too. From her point of view, the dog started to mess with her, disrupting an afternoon slumber in her favorite sunny spot by wagging his stupid tail. Encounters like this cannot end well. Thank goodness that we Homo sapiens lost our tails during our evolutionary journey, as that would have been one more thing that clueless guys would have to decode regarding the mood of our female companions. Imagine a conversation like “How could you not see that I didn’t mean it? My tail was pointing at the ground when I said that!” Then a guy would say, “Oh jeez, because I was looking at your lips moving up and down when you were saying something?”

Of course I am generalizing for comedic effect here, but I see communication breakdowns like this all the time in business environments, especially between the marketing and IT teams. You think men are from Mars and women are from Venus? I think IT folks are from Vulcan and marketing people are from Betazed (if you didn’t get this, find a Trekkie around you and ask).

Now that we are living in the age of Big Data, where marketing messages must be custom-tailored based on data, we really need to find a way to narrow the gap between the marketing and the IT worlds. I wouldn’t dare to say which side is more like a dog or a cat, as I will surely offend someone. But I think even non-Trekkies would agree that it could be terribly frustrating to talk to a Vulcan who thinks that every sentence must be logically impeccable, or a Betazoid who thinks that someone’s emotional state is the way it is just because she read it that way. How do they meet in the middle? They need a translator—generally a “human” captain of a starship—between the two worlds, and that translator had better speak both languages fluently and understand both cultures without any preconceived notions.

Similarly, we need translators between the IT world and the marketing world, too. Call such translators “data scientists” if you want (refer to “How to Be a Good Data Scientist”). Or, at times, a data strategist or a consultant like myself plays that role. Call us “bats” caught between the beasts and the birds in Aesop’s fable, as we need to be marginal people who don’t belong 100 percent to any one world. At times, it is a lonely place, as we are understood by none, and often we are blamed for representing “the other side.” It is hard enough to be an expert in data and analytics, and now we have to master the art of diplomacy as well. But that is the reality, and I have seen plenty of evidence as to why people whose main job is to harness meaning out of data must act as translators, too.

IT is a very special function in modern organizations, regardless of their business models. Systems must run smoothly without errors, and all employees and outside collaborators must constantly be connected through all imaginable devices and operating systems. Data must be securely stored and backed up regularly, and permission to access them must be granted through complex rules tied to job levels and functions. Then there are constant requests to install and maintain new and strange software and technologies, which must be patched and updated diligently. And God forbid anything fails to work, even for a few seconds on a weekend; all hell will break loose. Simply put, the end-users—many of them in positions of dealing with customers and clients directly—do not care about IT when things run smoothly, as they take it all for granted. But when they don’t, you know the consequences. Thankless job? You bet. It is like a utility company never getting praise when the lights are on, but everyone yelling and screaming when the service is disrupted, even by natural causes.

On the other side of the world are the marketers, salespeople and account executives who deal with customers, clients and their bosses, and who often treat IT like their servants, not partners, when things do not “seem” to work properly or when “their” sales projections are not met. The craziest part is that most customers, clients and bosses state their goals and complaints in the most ambiguous terms, as in “This ad doesn’t look slick enough,” “This copy doesn’t talk to me,” “This app doesn’t stick” or “We need to find the right audience.” What the IT folks often do not grasp is that (1) it really stinks to get yelled at by customers and clients for any reason, and (2) not all business goals are easily translatable into logical statements. And all this is when every data element and system is functioning within normal parameters.

Without a proper translator, marketers often self-prescribe solutions that call for data work and analytics. Often, they think that all their problems will go away if they have unlimited access to every piece of data ever collected. So they ask for exactly that. IT will respond that such a request would put a terrible burden on the system, which has to support not just marketing but also other operations. Eventually they may meet in the middle, and the marketers will have access to more data than was ever possible in the past. Then the marketers realize that their business issues do not go away just because they have more data in their hands. In fact, their job seems to have gotten even more complicated. They think that it is because the data elements are too difficult to understand, and they start blaming the data dictionary, or lack thereof. They start using words like Data Governance and Quality Control, which may sound almost offensive to diligent IT personnel. IT will respond that they showed every useful bit of data they are allowed to share without breaking the security protocol, and that the data dictionaries are all up to date. Marketers say the data dictionaries are hard to understand, and that they are filled with too many similar variables and seemingly conflicting information. IT now says they need yet another tool set to properly implement and deploy data governance protocols. Heck, I have seen cases where some heads of IT went for a complete re-platforming of their systems, as if that would answer all the marketing questions. Now, does this sound familiar so far? Does it sound like your own experience, or like something out of a “Dilbert” comic strip? It is because you are not alone in all this.

Allow me to be a little more specific with an example. Marketers often talk about “High-Value Customers.” To people who deal with 1s and 0s, that means less than nothing. What does that even mean? Because “high-value customers” could be any of the following (each of which, as the sketch after this list shows, is a different logical condition):

  • High-dollar spenders—But what if they do not purchase often?
  • Frequent shoppers—But what if they don’t spend much at all?
  • Recent customers—Oh, those coveted “hotline” names … but will they stay that way, even for another few months?
  • Tenured customers—But are they loyal to your business, now?
  • Customers with high loyalty points—Or are they just racking up points, willing to do anything to accumulate more?
  • High activity—Such as point redemptions and other non-monetary activities, but what if all those activities do not generate profit?
  • Profitable customers—The nice ones who don’t need much hand-holding. And where do we get the “cost” side of the equation on a personal level?
  • Customers who purchase extra items—Such as cruisers who drink a lot on board or diners who order many special items, as suggested.
  • Etc., etc …
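
To make the ambiguity concrete, here is a minimal Python sketch, assuming a hypothetical customer-level summary table; the column names and cutoffs are placeholders invented for illustration, not anyone’s actual schema or business rules. Each “definition” of a high-value customer becomes a different boolean filter, and the filters rarely agree with one another.

```python
import pandas as pd

# Hypothetical customer-level summary table; columns and values are invented.
customers = pd.DataFrame({
    "customer_id":            [1, 2, 3, 4],
    "total_spend":            [5200.0, 180.0, 950.0, 40.0],  # lifetime dollars
    "order_count":            [3, 24, 11, 2],                # lifetime orders
    "days_since_last_order":  [400, 12, 95, 30],
    "tenure_years":           [6.0, 0.5, 3.2, 0.1],
    "loyalty_points":         [1200, 9800, 300, 50],
})

# Each "definition" of a high-value customer is a different logical filter.
# The cutoffs below are placeholders; in practice they come from the business.
definitions = {
    "high_dollar_spender": customers["total_spend"] >= 1000,
    "frequent_shopper":    customers["order_count"] >= 10,
    "recent_customer":     customers["days_since_last_order"] <= 90,
    "tenured_customer":    customers["tenure_years"] >= 3,
    "point_collector":     customers["loyalty_points"] >= 5000,
}

flags = customers[["customer_id"]].assign(**definitions)
print(flags)

# Very few customers satisfy every definition at once, which is exactly why
# "high-value" must be pinned down before any query is written.
print("meets every definition:",
      flags.drop(columns="customer_id").all(axis=1).sum())
```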

Now it gets more complex, as these definitions must be represented in numbers and figures, and depending on the industry (retailers, airlines, hotels, cruise ships, credit cards, investments, utilities, non-profits or business services), the variables employed to define seemingly straightforward “high-value customers” would be vastly different. But let’s say that we pick an airline as an example. Let me ask you this: How frequent is frequent enough for anyone to be called a frequent flyer?

Let’s just assume that we are going through an exercise of defining a frequent flyer for an airline company, not for any other travel-related business or even a travel agency (which would deal with lots of non-flyers). Assuming that we have access to all necessary data, we may consider using:

1. Number of Miles—But for how many years? If we go back too far, shouldn’t we also examine whether the customer is still active with the airline in question? And what does “active” mean to you?

2. Dollars Spent—Again for how long? In what currency? Converted into U.S. dollars at what point in time?

3. Number of Full-Price Ticket Purchases—OK, do we get to see all the ticket codes that define full price? What about customers who purchased tickets through partners and agencies vs. direct buyers through the airline’s website? Do they share a common coding system?

4. Days Between Travel—What date shall we use? Booking date, payment date or travel date? What time zones should we use for consistency? If UTC/GMT is to be used, how will we know who is booking trips during business hours vs. evening hours in their own time zone?

After considerable hours of debate, let’s say that we reached a conclusion that all involved parties could live with. Then we find out that the databases from the IT department are all at the “event” level (such as clicks, views, bookings, payments, boardings, redemptions, etc.), and we would have to realign and summarize the data—in terms of miles, dollars and trips—at the individual customer level to create a definition of “frequent flyers.”
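
As a rough illustration of that realignment, here is a minimal Python sketch, assuming a hypothetical event-level table; the fields, event types and the two-trip threshold are all invented for this example. The point is simply that the customer-level summary has to exist before the phrase “frequent flyer” can even be expressed as a rule.

```python
import pandas as pd

# Hypothetical event-level records, the way IT systems usually store them:
# one row per booking or boarding event, not one row per customer.
events = pd.DataFrame({
    "customer_id": [101, 101, 101, 202, 202, 303],
    "event_type":  ["booking", "boarding", "boarding",
                    "booking", "boarding", "booking"],
    "travel_date": pd.to_datetime(["2024-01-05", "2024-01-05", "2024-03-20",
                                   "2024-02-11", "2024-02-11", "2024-04-02"]),
    "miles":   [0, 2400, 3100, 0, 800, 0],
    "dollars": [420.0, 0.0, 0.0, 210.0, 0.0, 95.0],
})

# Realign the event-level data to the customer level: total miles, total
# dollars, number of completed trips and the most recent travel date.
per_customer = (
    events.groupby("customer_id")
          .agg(total_miles=("miles", "sum"),
               total_dollars=("dollars", "sum"),
               trips=("event_type", lambda s: (s == "boarding").sum()),
               last_travel=("travel_date", "max"))
          .reset_index()
)

# Only now can a "frequent flyer" rule be expressed at all; the two-trip
# threshold below is a placeholder, not a recommendation.
per_customer["frequent_flyer"] = per_customer["trips"] >= 2
print(per_customer)
```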

In other words, we would need to see the data from a customer-centric point of view just to begin the discussion about frequent flyers, not to mention how to communicate with each customer in the future. Now, is that a job for IT or marketing? Who will put the bell on the cat’s neck? (Hint: Not the dog.) Well, it depends. But this definitely is not a traditional IT function, nor is it a standalone analytical project. It is something in between, requiring a translator.

Customer-Centric Database, Revisited
I have been emphasizing the importance of a customer-centric view throughout this series, and I also shared some details regarding databases designed for marketing functions (refer to: “Cheat Sheet: Is Your Database Marketing Ready?”). But allow me to reiterate this point.

In the age of abundant and ubiquitous data, omnichannel marketing communication—optimized based on customers’ past transaction history, product preferences, and demographic and behavioral personas—should be an effortless routine. The reality is far from it for many organizations, as it is very common that much of the vital information is locked in silos without being properly consolidated or governed by a standard set of business rules. It is not that creating such a marketing-oriented database (or data-mart) is solely the IT department’s responsibility, but having a dedicated information source for efficient personalization should be an organizational priority these days.

Most databases nowadays are optimized for data collection, storage and rapid retrieval, and such design in general does not provide a customer-centric view—which is essential for any type of personalized communication via all conceivable channels and devices of the present and future. Using brand-, division-, product-, channel- or device-centric datasets is often the biggest obstacle in the journey to an optimal customer experience, as those describe events and transactions, not individuals. Further, bits and pieces of information must be transformed into answers to questions through advanced analytics, including statistical models.

In short, all analytical efforts must be geared toward meeting business objectives, and databases must be optimized for analytics (refer to “Chicken or the Egg? Data or Analytics?”). Unfortunately, the situation is completely reversed in many organizations, where analytical maneuvering is limited due to inadequate source data, and decision-making processes are dictated by limitations of available analytics. Visible symptoms of such cases are, to list a few, elongated project cycle time, decreasing response rates, ineffective customer communication, saturation of data sources due to overexposure, and—as I was emphasizing in this article—communication breakdown among divisions and team members. I can even go as far as to say that the lack of a properly designed analytical environment is the No. 1 cause of miscommunications between IT and marketing.

Without a doubt, key pieces of data must reside in the centralized data depository—generally governed by IT—for effective marketing. But that is only the beginning and still is just a part of the data collection process. Collected data must be consolidated around the solid definition of a “customer,” and all product-, transaction-, event- and channel-level information should be transformed into descriptors of customers, via data standardization, categorization, transformation and summarization. Then the data may be further enhanced via third-party data acquisition and statistical modeling, using all available data. In other words, raw data must be refined through these steps to be useful in marketing and other customer interactions, online or offline (refer to “‘Big Data’ Is Like Mining Gold for a Watch—Gold Can’t Tell Time“). It does not matter how well the original transaction- or event-level data are stored in the main database without visible errors, or what kind of state-of-the-art communication tool sets a company is equipped with. Trying to use raw data for a near real-time personalization engine is like putting unrefined oil into a high-performance sports car.

This whole data refinement process may sound like a daunting task, but it is not nearly as painful as analytical efforts to derive meaning out of unstructured, unconsolidated and uncategorized data that are scattered all over the organization. A customer-centric marketing database (call it a data-mart if “database” sounds too much like it should solely belong to IT), created with standard business rules and uniform variable sets, would in turn provide an “analytics-ready” environment, where statistical modeling and other advanced analytics efforts would gain tremendous momentum. In the end, the decision-making process would become much more efficient, as analytics would provide answers to questions, not just bits and pieces of fragmented data, to the ultimate beneficiaries of data. And answers to questions do not require an enormous data dictionary, either; fast-acting marketing machines do not have time to look up dictionaries, anyway.

Data Roadmap—Phased Approach
For the effort to build a consolidated marketing data platform that is analytics-ready (hence, marketing-ready), I always recommend a phased approach, as (1) the inevitable complexity of a data consolidation project will be contained and managed more efficiently in carefully defined phases, and (2) each phase will require different types of expertise, tool sets and technologies. Nevertheless, the overall project must be managed by an internal champion, along with a group of experts who possess long-term vision and tactical knowledge in both database and analytics technologies. That means this effort must reside above IT and marketing, and it should be seen as a strategic effort for the entire organization. If the company has already hired a Chief Data Officer, I would say that this should be one of the top priorities for that position. If not, outsourcing would be a good option, as an impartial decision-maker, who would play the role of a referee, may have to come from the outside.

The following are the major steps:

  1. Formulate Questions: “All of the above” is not a good way to start a complex project. In order to come up with the most effective way to build a centralized data depository, we first need to understand what questions must be answered by it. Too many database projects call for cars that must fly, as well.
  2. Data Inventory: Every organization has more data than it expected, and not all goldmines are in plain sight. All the gatekeepers of existing databases should be interviewed, and any data that could be valuable for customer descriptions or behavioral predictions should be considered, starting with product, transaction, promotion and response data, stemming from all divisions and marketing channels.
  3. Data Hygiene and Standardization: All available data fields should be examined and cleaned up, and some data may be discarded or modified. Free-form fields deserve special attention, as categorization and tagging are key steps to opening up new intelligence.
  4. Customer Definition: Any existing Customer ID systems (such as loyalty program IDs, account numbers, etc.) will be examined. They may be further enhanced with available PII (personally identifiable information), as there could be inconsistencies among different systems, and customers often move or use multiple email addresses, creating duplicate identities. A consistent and reliable Customer ID system becomes the backbone of a customer-centric database.
  5. Data Consolidation: Data from different silos and divisions will be merged together based on the master Customer ID. A customer-centric database begins to take shape here. The database update process should be thoroughly tested, as “incremental” updates are often found to be more difficult than the initial build. The job is simply not done until after a few successful iterations of updates.
  6. Data Transformation: Once a solid Customer ID system is in place, all transaction- and event-level data will be transformed into “descriptors” of individual customers, via summarization by categories and creation of analytical variables. For example, all product information will be aligned for each customer, and transaction data will be converted into personal-level monetary summaries and activities, in both static and time-series formats. Promotion and response history data will go through similar processes, yielding individual-level ROI metrics by channel and time period. This is the single most critical step of all, requiring deep knowledge in business, data and analytics, as the stage is being set for all future analytics and reporting efforts. Due to the variety and uniqueness of business goals in different industries, a one-size-fits-all approach will not work, either.
  7. Analytical Projects: Test projects will be selected, and the entire process will be run on the new platform. Ad-hoc reporting and complex modeling projects will be conducted, and the results will be graded on timing, accuracy, consistency and user-friendliness. An iterative approach is required, as it is impossible to foresee all possible user requests and related complexities upfront. A database should be treated as a living, breathing organism, not something rigid and inflexible. Marketers will “break in” the database as they use it more routinely.
  8. Applying the Knowledge: The outcomes of analytical projects will be applied to the entire customer base, and live campaigns will be run based on them. Often, major breakdowns happen at the large-scale deployment stage, especially when dealing with millions of customers and complex mathematical formulae at the same time. A model-ready database will definitely minimize the risk (hence the term “in-database scoring”; see the sketch after this list), but the process will still require some fine-tuning. To proliferate gained knowledge throughout the organization, some model scores—which pack deep intelligence in small sizes—may be transferred back to the main databases managed by IT. Imagine model scores driving operational decisions—live, on the ground.
  9. Result Analysis: Good marketing intelligence engines must be equipped with feedback mechanisms, effectively closing the “loop” where each iteration of marketing efforts improves its effectiveness with accumulated knowledge on a customer level. It is very unfortunate that many marketers just move through the tracks set up by their predecessors, mainly because existing database environments are not even equipped to link necessary data elements on a customer level. Too many back-end analyses are just event-, offer- or channel-driven, not customer-centric. Can you easily tell which customer is over-, under- or adequately promoted, based on a personal-level promotion-and-response ratio? With a customer-centric view established, you can.
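
As a rough sketch of what “in-database scoring” can look like in practice, here is a small Python example; the descriptor names and coefficients are hypothetical stand-ins for an already-fitted response model, not anything prescribed by the steps above. The point is that a handful of coefficients, applied to customer-level descriptors, collapses all that history into one compact score per customer.

```python
import numpy as np
import pandas as pd

# Customer-level descriptors produced by the transformation step (step 6).
# Names and values are illustrative only.
customers = pd.DataFrame({
    "customer_id":     [1, 2, 3],
    "spend_last_12m":  [1200.0, 80.0, 450.0],
    "trips_last_12m":  [6, 1, 3],
    "days_since_last": [20, 300, 75],
})

# Hypothetical coefficients from an already-fitted logistic response model.
# In step 8 these few numbers -- not the raw history -- travel back to the
# operational database, which is the idea behind "in-database scoring."
coefficients = {"intercept": -2.0,
                "spend_last_12m": 0.0015,
                "trips_last_12m": 0.35,
                "days_since_last": -0.01}

linear = (coefficients["intercept"]
          + customers["spend_last_12m"] * coefficients["spend_last_12m"]
          + customers["trips_last_12m"] * coefficients["trips_last_12m"]
          + customers["days_since_last"] * coefficients["days_since_last"])

# One compact score per customer, ready to drive a live campaign decision.
customers["response_score"] = 1 / (1 + np.exp(-linear))
print(customers[["customer_id", "response_score"]])
```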

These are just high-level summaries of key steps, and each step should be managed as independent projects within a large-scale initiative with common goals. Some steps may run concurrently to reduce the overall timeline, and tactical knowledge in all required technologies and tool sets is the key for the successful implementation of centralized marketing intelligence.

Who Will Do the Work?
Then, who will be in charge of all this and who will actually do the work? As I mentioned earlier, a job of this magnitude requires a champion, and a CDO may be a good fit. But each of these steps will require different skill sets, so some outsourcing may be inevitable (more on how to pick an outsourcing partner in future articles).

But one thing that should not happen is the IT team or the analytics team solely dictating the whole process. Creating a central depository of marketing intelligence is something that sits between IT and marketing, and the decisions must be made with business goals in mind, not just the limitations and challenges that IT faces. If the CDO or the champion of this type of initiative starts representing IT issues before overall business goals, then the project is doomed from the beginning. Again, it is not about touching the core database of the company, but about realigning existing data assets to create new intelligence. Raw data (no matter how clean they are at the collection stage) are like unrefined raw materials to the users. What the decision-makers need are simple answers to their questions, not hundreds of data pieces.

From the user’s point of view, data should be:

  • Easy to understand and use (intuitive to non-mathematicians)
  • Bite-size (i.e., small answers, not mounds of raw data)
  • Useful and effective (consistently accurate)
  • Broad (answers should be available most of the time, not just “sometimes”)
  • Readily available (data should be easily accessible via users’ favorite devices/channels)

And getting to this point is the job of a translator who sits in between marketing and IT. Call them data scientists or data strategists, if you like. But they do not belong to just marketing or IT, even though they have to understand both sides really well. Do not be rigid, insisting that all pilots must belong to the Air Force; some pilots do belong to the Navy.

Lastly, let me add this at the risk of sounding like I am siding with technologists. Marketers, please don’t be bad patients. Don’t be that bad patient who shows up at a doctor’s office with a specific prescription, as in “Don’t ask me why, but just give me these pills, now.” I’ve even met an executive who wanted a neural-net model for his business without telling me why. I just said to myself, “Hmm, he must have been to one of those analytics conferences recently.” Then after listening to his “business” issues, I prescribed an entirely different solution package.

So, instead of blurting out requests for pieces of data variables or queries using cool-sounding, semi-technical terms, state the business issues and challenges that you are facing as clearly as possible. IT and analytics specialists will prescribe the right solution for you if they understand the ultimate goals better. Too often, requesters determine the solutions they want without any understanding of underlying technical issues. Don’t forget that the end-users of any technology are only exposed to symptoms, not the causes.

And if Mr. Spock doesn’t seem to understand your issues and keeps saying that your statements are illogical, then call in a translator, even if you have to hire him for just one day. I know this all too well, because after all, this one phrase summarizes my entire career: “A bridge person between the marketing world and the IT world.” Although it ain’t easy to live a life as a marginal person.

Smart Data – Not Big Data

As a concerned data professional, I am already plotting an exit strategy from this Big Data hype. Because, like any bubble, it will surely burst. That inevitable doomsday could be a couple of years away, but I can feel it coming. At the risk of sounding too much like Yoda the Jedi Grand Master: all hype leads to over-investment, all over-investment leads to disappointment, and all disappointment leads to blame. Yes, in a few years, lots of blame will go around, and lots of heads will roll.

So, why would I stay on the troubled side? Well, because, for now, this Big Data thing is creating lots of opportunities, too. I am writing this on my way back from Seoul, Korea, where I presented this Big Data idea nine times in just two short weeks, trotting from large venues to small gatherings. Just a few years back, I used to have a hard time explaining what I do for a living. Now, I just have to say “Hey, I do this Big Data thing,” and the doors start to open. In my experience, this is the best “Open Sesame” moment for all data specialists. But it will last only if we play it right.

Nonetheless, I also know that I will somehow continue to make a living setting data strategies, fixing bad data, designing databases and leading analytical activities, even after the hype cools down. Just with a different title, under a different banner. I’ve seen buzzwords come and go, and this data business has been carried on by the people who cut through each hype (and the gargantuan amount of BS along with it) and create real revenue-generating opportunities. At the end of the day (I apologize for using this cliché), it is all about the bottom line, whether it comes from a revenue increase or a cost reduction. It is never about the buzzwords that may have created the business opportunities in the first place; it has always been more about the substance that turned those opportunities into money-making machines. And substance needs no fancy title or buzzwords attached to it.

Have you heard Google or Amazon calling themselves “Big Data” companies? They are the ones with sick amounts of data, but they also know that it is not about the sheer amount of data; it is all about the user experience. “Wannabes” who are not able to understand the core values often hang onto buzzwords and hype, as if Big Data, Cloud Computing or the coding language du jour will come and save the day. But they are just words.

Even the name “Big Data” is all wrong, as it implies that bigger is always better. The 3 Vs of Big Data—volume, velocity and variety—are also misleading. That could be a meaningful distinction for existing data players, but for decision-makers, it gives the notion that size and speed are the ultimate quest. For the users, though, small is better. They don’t have time to analyze big sets of data. They need small answers in fun-size packages. Plus, why are big and fast new? Since the invention of modern computers, has there been any year when processing speed did not get faster and storage capacity did not get bigger?

Lest we forget, it is the software industry that came up with this Big Data thing. It was created as a marketing tagline. We should have read it as, “Yes, we can now process really large amounts of data, too,” not as, “Big Data will make all your dreams come true.” If you are in the business of selling toolsets, of course, that is how you present your product. If guitar companies keep emphasizing how hard it is to be a decent guitar player, would that help their businesses? It is a lot more effective to say, “Hey, this is the same guitar that your guitar hero plays!” But you don’t become Jeff Beck just because you bought a white Fender Stratocaster with a rosewood neck. The real hard work begins “after” you purchase a decent guitar. However, this obvious connection is often lost in the data business. Toolsets never provide solutions on their own. They may make your life easier, but you’d still have to formulate the question in a logical fashion, and still have to make decisions based on provided data. And harnessing meanings out of mounds of data requires training of your mind, much like the way musicians practice incessantly.

So, before business people even consider venturing into this Big Data hype, they should ask themselves “Why data?” What are the burning questions that you are trying to answer with the data? If you can’t answer this simple question, then don’t jump into it. Forget about it. Don’t get into it just because everyone else seems to be getting into it. Yeah, it’s a big party, but why are you going there? Besides, if you formulate the question properly, you will often find that you don’t need Big Data all the time. In fact, Big Data can be a terrible detour if your question can be answered by “small” data. But that happens all the time, because people approach their business questions through the processes set by the toolsets. Big Data should be about the business, not about the IT or the data.

Smart Data, Not Big Data
So, how do we get over this hype? All too often, perception rules, and a replacement word becomes necessary to summarize the essence of the concept for the general public. In my opinion, “Big Data” should have been “Smart Data.” Piles of unorganized dumb data aren’t worth a damn thing. Imagine a warehouse full of boxes with no labels, collecting dust since 1943. Would you be impressed with the sheer size of the warehouse? Great, the ark that Indiana Jones procured (or did he?) may be stored in there somewhere. But if no one knows where it is—or even if it can be located, if no one knows what to do with it—who cares?

Then, how do data get smarter? Smart data are bite-sized answers to questions. A thousand variables could have been considered to provide the weather forecast that calls for a “70 percent chance of scattered showers in the afternoon,” but that one line that we hear is the smart piece of data. Not the list of all the variables that went into the formula that created that answer. Emphasizing the raw data would be like giving paints and brushes to a person who wants a picture on the wall. As in, “Hey, here are all the ingredients, so why don’t you paint the picture and hang it on the wall?” Unfortunately, that is how the Big Data movement looks now. And too often, even the ingredients aren’t all that great.

I visit many companies only to find that the databases in question are just messy piles of unorganized and unstructured data. And please do not assume that such disarray is good for my business. I’d rather spend my time harnessing meaning out of data and creating value, not taking care of someone else’s mess all the time. Really smart data are small, concise, clean and organized. Big Data should only be seen in “Behind the Scenes” types of documentaries for maniacs, not by everyday decision-makers.

I have been saying for some time that Big Data must get smaller (refer to “Big Data Must Get Smaller”), and I will repeat it until it becomes a movement of its own. The Big Data movement must be about:

  1. Cutting down the noise
  2. Providing the answers

There is too much noise in the data, and cutting it out is the first step toward making the data smaller and smarter. The trouble is that the definition of “noise” is not static. The rock music that I grew up with was certainly noise to my parents’ generation. In turn, some of the music that my kids listen to is pure noise to me. Likewise, “product color,” which is essential for a database designed for an inventory management system, may or may not be noise if the goal is to sell more apparel items. In such cases, more important variables could be style, brand, price range, target gender, etc., and color could be just peripheral information at best, or even noise (as in, “Uh, she isn’t going to buy just red shoes all the time, is she?”). How do we then determine the difference? First, set clear goals (as in, “Why are we playing with the data to begin with?”), define those goals using logical expressions, and let mathematics take care of the rest. Now you can drop the noise with conviction (even if it may look important to human minds).

If we continue down that mathematical path, we reach the second part, which is “providing answers to the question.” And the smart answers come in the form of yes/no answers, probability figures or some type of score. As in the weather forecast example, the question would be “chance of rain on a certain day” and the answer would be “70 percent.” Statistical modeling is not easy or simple, but it is the essential part of making the data smarter, as models are the most effective way to summarize complex and abundant data into compact forms (refer to “Why Model?”).

Most people do not have degrees in mathematics or statistics, but they all know what to do with a piece of information such as “70 percent chance of rain” on the day of a company outing. Some may complain that it is not a definite yes/no answer, but all would agree that providing information in this form is more humane than dumping all the raw data onto users. Salespeople are not necessarily mathematicians, but they would certainly appreciate scores attached to each lead, as in “more or less likely to close.” No, that is not a definite answer, but now salespeople can start calling the leads in order of relative importance to them.
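
A tiny Python sketch of that lead-scoring idea, with made-up lead IDs and probabilities standing in for real model output:

```python
import pandas as pd

# Hypothetical model output: one probability per lead, the "70 percent chance
# of rain" style of answer rather than the raw data behind it.
leads = pd.DataFrame({
    "lead_id":           ["A-17", "B-02", "C-88", "D-41"],
    "close_probability": [0.71, 0.18, 0.55, 0.34],
})

# Salespeople do not need the model internals -- just the order in which
# to pick up the phone.
call_list = leads.sort_values("close_probability", ascending=False)
print(call_list.to_string(index=False))
```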

So, all the Big Data players and data scientists must try to “humanize” the data, instead of bragging about the size of the data, making things more complex, and providing irrelevant pieces of raw data to users. Make things simpler, not more complex. Some may think that complexity is their job security, but I strongly disagree. That is a sure way to bring this Big Data movement down to the ground. We are already living in a complex world, and we certainly do not need more complications around us (more on “How to Be a Good Data Scientist” in a future article).

It’s About the Users, Too
On the flip side, the decision-makers must change their attitude about the data, as well.

1. Define the goals first: The main theme of this series has been that the Big Data movement is about the business, not IT or data. But I’ve seen too many business folks who so willingly take a hands-off approach to data. They just fund the database; do not define clear business goals to developers; and hope to God that someday, somehow, some genius will show up and clear up the mess for them. Guess what? That cavalry is never coming if you are not even praying properly. If you do not know what problems you want to solve with data, don’t even get started; you will get nowhere, really slowly, bleeding lots of money and time along the way.

2. Take the data seriously: You don’t have to be a scientist to have a scientific mind. It is not ideal if someone blindly subscribes to anything computers spew out (there is a lot of inaccurate information in databases; refer to “Not All Databases Are Created Equal”). But too many people do not take data seriously and continue to follow their gut feelings. Even if the customer profile coming out of a serious analysis does not match your preconceived notions, do not blindly reject it; instead, treat it as a newly found gold mine. Gut feelings are even more overrated than Big Data.

3. Be logical: Illogical questions do not lead anywhere. There is no toolset that reads minds—at least not yet. Even if we get such amazing computers—as seen on “Star Trek” or in other science fiction movies—you would still have to ask questions in a logical fashion for them to be effective. I am not asking decision-makers to learn how to code (or to be like Mr. Spock or his loyal follower, Dr. Sheldon Cooper), but to have some basic understanding of logical expressions and to try to learn how analysts communicate with computers. This is not a data geek vs. non-geek world anymore; we all have to be a little geekier. Knowing Boolean expressions may not be as cool as being able to throw a curveball, but it is necessary to survive in the age of information overload.

4. Shoot for small successes: Start with a small proof of concept before fully investing in large data initiatives. Even with a small project, one gets to touch all the necessary steps to finish the job. Understanding the flow of information is as important as each specific step, as most breakdowns occur between steps, due to a lack of proper connections. There was the Gemini program before the Apollo missions. Learn how to dock spaceships in space before plotting the course to the moon. Often, over-investments are committed when the discussion is led by IT. Outsource even major components in the beginning, as the initial goal should be mastering the flow of things.

5. Be buyer-centric: No customer is bound by the channel of the marketer’s choice, and yet many businesses act exactly that way. No one is an online person just because she did not refuse your email promotions yet (refer to “The Future of Online is Offline”). No buyer is just one-dimensional. So get out of brand-, division-, product- or channel-centric mindsets. Even well-designed, buyer-centric marketing databases become ineffective if users are trapped in their channel- or division-centric attitudes, as in “These email promotions must flow!” or “I own this product line!” The more data we collect, the more chances marketers gain to impress their customers and prospects. Do not waste those opportunities by imposing your own myopic views on them. The Big Data movement is not there to fortify marketers’ bad habits. Thanks to the size of the data and the speed of machines, we are now capable of disappointing a lot of people really fast.

What Did This Hype Change?
So, what did this Big Data hype change? First off, it changed people’s attitudes about the data. Some are no longer afraid of large amounts of information being thrown at them, and some actually started using them in their decision-making processes. Many realized that we are surrounded by numbers everywhere, not just in marketing, but also in politics, media, national security, health care and the criminal justice system.

Conversely, some people became more afraid—often with good reason. But even more often, people react out of pure fear that their personal information is being actively exploited without their consent. While data geeks are rejoicing in the age of open source and cloud computing, many more are looking at this hype with deep suspicion, and they boldly reject storing any personal data in those obscure “clouds.” Some people don’t even sign up for E-ZPass and voluntarily stay in the long lanes to pay tolls the old, untraceable way.

Nevertheless, not all is lost in this hype. The data got really big, and types of data that were previously unavailable, such as mobile and social data, became available to many marketers. Focus groups are now the size of the Twitter following of a company or a subject matter. The collection of POS (point-of-sale) data has been steadily increasing, and some data players became virtuosi in using such fresh and abundant data to impress their customers (though some crossed that “creepy” line inadvertently). Different types of data are being used together now, and such merging activities will compound the predictive power even further. Analysts are dealing with less missing data, though no dataset will ever be totally complete. Developers in open source environments are now able to move really fast with new toolsets that run on just about any device. Simply put, things that took our forefathers of direct marketing six months to complete can now be done in a few hours, and in the near future, maybe within a few seconds.

And that may be a good thing and a bad thing. If we do this right, without creating too many angry consumers and without burning holes in our budgets, we are in a position to achieve a great many things in terms of predicting the future and making everyone’s lives a little more convenient. If we screw it up badly, we will end up creating lots of angry customers by abusing sensitive data and, at the same time, wasting a whole lot of investors’ money. Then this Big Data thing will go down in history as a great money-eating hype.

We should never do things just because we can; data is a powerful tool that can hurt real people. Do not even get into it if you don’t have a clear goal in terms of what to do with the data; it is not some piece of furniture that you buy just because your neighbor bought it. Living with data is a lifestyle change, and it requires a long-term commitment; it is not some fad that you try once and give up. It is a continuous loop where people’s responses to marketer’s data-based activities create even more data to be analyzed. And that is the only way it keeps getting better.

There Is No Big Data
And all that has nothing to do with “Big.” If done right, small data can do plenty. And in fact, most companies’ transaction data for the past few years would easily fit in an iPhone. It is about what to do with the data, and that goal must be set from a business point of view. This is not just a new playground for data geeks, who may care more for new hip technologies that sound cool in their little circle.

I recently went to Brazil to speak at a data conference called QIBRAS, and I was pleasantly surprised that the main theme of it was the quality of the data, not the size of the data. Well, at least somewhere in the world, people are approaching this whole thing without the “Big” hype. And if you look around, you will not find any successful data players calling this thing “Big Data.” They just deal with small and large data as part of their businesses. There is no buzzword, fanfare or a big banner there. Because when something is just part of your everyday business, you don’t even care what you call it. You just do. And to those masters of data, there is no Big Data. If Google all of a sudden starts calling itself a Big Data company, it would be so uncool, as that word would seriously limit it. Think about that.

‘Big Data’ Is Like Mining Gold for a Watch – Gold Can’t Tell Time

It is often quoted that 2.5 quintillion bytes of data are collected each day. That surely sounds like a big number, considering that 1 quintillion bytes (or an exabyte, if that sounds fancier) equal 1 billion gigabytes. Looking back only about 20 years, I remember that my beloved 386-based desktop computer had a hard drive that could barely hold 300 megabytes, which was considered quite large in those ancient days. Now, my phone can hold about 65 gigabytes, which, by the way, means nothing to me. I just know that figure equates to about 6,000 songs, plus all my personal information, with room to spare for hundreds of photos and videos. So how do I fathom the size of 2.5 quintillion bytes? I don’t. I give up. I’d rather count the number of stars in the universe. And I have been in the database business for more than 25 years.

But I don’t feel bad about that. If a pile of data requires a computer to process it, then it is already too “big” for our brains. In the age of “Big Data,” size matters, but emphasizing the size element is missing the point. People want to understand the data in their own terms and want to use them in decision-making processes. Throwing the raw data around to people without math or computing skills is like galleries handing out paint and brushes to people who want paintings on the wall. Worse yet, continuing to point out how “big” the Big Data world is to them is like quoting the number of rice grains on this planet in front of a hungry man, when he doesn’t even care how many grains are in one bowl. He just wants to eat a bowl of “cooked” rice, and right this moment.

To be a successful data player, one must be the master of the following three steps:

  • Collection;
  • Refinement; and
  • Delivery.

Collection and storage are obviously important in the age of Big Data. However, that in itself shouldn’t be the goal. I hear lots of bragging about how much data can be collected and stored, and how fast the data can be retrieved.

Great, you can retrieve any transaction detail going back 20 years in less than 0.5 seconds. Congratulations. But can you now tell me which customers are more likely to be loyal for the next five years, with annual spending potential of more than $250? Or who is more likely to quit using the service in the next 60 days? Who is more likely to be on a cruise ship leaving a dock on the East Coast and heading for Europe between Thanksgiving and Christmas, with onboard spending potential greater than $300? Who is more likely to respond to emails with free shipping offers? Where should I open my next store selling fancy children’s products? What do my customers look like, and where do they go between 6 and 9 p.m.?

Answers to these types of questions do not come from the raw data, but they should be derived from the data through the data refinement process. And that is the hard part. Asking the right questions, expressing the goals in a mathematical format, throwing out data that don’t fit the question, merging data from a diverse array of sources, summarizing the data into meaningful levels, filling in the blanks (there will be plenty—even these days), and running statistical models to come up with scores that look like an answer to the question are all parts of the data refinement process. It is a lot like manufacturing gold watches, where mining gold is just an important first step. But a piece of gold won’t tell you what time it is.
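
To make those refinement steps a bit more tangible, here is a compressed Python sketch using toy data; every table, column and cutoff is a stand-in, and the simple rule at the end stands in for the statistical model a real project would fit.

```python
import pandas as pd

# Raw, event-level source data -- one silo of many; all values are invented.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3],
    "amount":      [120.0, 80.0, None, 40.0, 300.0],
    "channel":     ["web", "store", "web", "web", "store"],
})
demographics = pd.DataFrame({   # a second silo to be merged in
    "customer_id": [1, 2, 3],
    "region":      ["east", "west", "east"],
})

# 1. Throw out data that don't fit the question (say, the question is about web behavior).
web_only = transactions[transactions["channel"] == "web"]

# 2. Fill in the blanks -- there will be plenty.
web_only = web_only.assign(amount=web_only["amount"].fillna(web_only["amount"].median()))

# 3. Summarize to a meaningful level (the customer) and merge the silos.
summary = (web_only.groupby("customer_id")["amount"].agg(["sum", "count"])
                   .rename(columns={"sum": "web_dollars", "count": "web_orders"})
                   .reset_index()
                   .merge(demographics, on="customer_id", how="left"))

# 4. Reduce it to something that looks like an answer. A real project would fit
#    a statistical model here; a simple rule stands in for one in this sketch.
summary["likely_web_buyer"] = (summary["web_dollars"] > 100) & (summary["web_orders"] >= 1)
print(summary)
```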

The final step is to deliver that answer—which, by now, should be in a user-friendly format—to the user at the right time and in the right format. Often, data-related products emphasize only this part, as it is the most intimate one to the users. After all, it provides an illusion that the user is in total control, being able to touch the data so nicely displayed on the screen. Such tool sets may produce impressive-looking reports and dazzling graphics. But, lest we forget, they are only representations of the data refinement processes. In addition, no tool set will ever do the thinking part for anyone. I’ve seen so many missed opportunities where decision-makers invested obscene amounts of money in fancy tool sets, believing they would conduct all the logical and data refinement work for them, automatically. That is like believing that purchasing a top-of-the-line Fender Stratocaster guarantees that you will play like Eric Clapton in the near future. Yes, the tool sets are important as delivery mechanisms for refined data, but none of them replace the refinement part. Skipping it would be like skipping guitar practice after spending $3,000 on a guitar.

Big Data business should be about providing answers to questions. It should be about humans who are the subjects of data collection and, in turn, the ultimate beneficiaries of information. It’s not about IT or tool sets that come and go like hit songs. But it should be about inserting advanced use of data into everyday decision-making processes by all kinds of people, not just the ones with statistics degrees. The goal of data players must be to make it simple—not bigger and more complex.

I boldly predict that missing these points will make “Big Data” a dirty word within the next three years. Emphasizing the size element alone will lead to unbalanced investments, which will then lead to disappointing results with not much to show for them in this cruel age of ROI. That is a sure way to kill the buzz. Not that I am that fond of the expression “Big Data”; though, I admit, one benefit has been that I no longer have to spend 10 minutes explaining what I do for a living. Nonetheless, all the Big Data folks may need an exit plan if we are indeed heading for the days when it will be yet another disappointing buzzword. So let’s do this one right, and start thinking about refining the data first and foremost.

Collection and storage are just so last year.