Watch the Attitude, Data Geeks

One-dimensional techies will be replaced by machines in the near future. So what if they’re the smartest ones in the room? If decision-makers can’t use data, does the information really exist?

Data do not exist just for data geeks and nerds. All of these data activities are inevitably funded by people who want to harness business value out of data. Whether it is about increasing revenue or reducing cost, in the end, the data game is about creating tangible value in the form of dollars, pounds, euros or yuan.

It really has nothing to do with the coolness of the toolsets or latest technologies, but it is all about the business — plain and simple. In other words, the data and analytics field is not some playground reserved for math or technology geeks, who sometimes think that belonging to exclusive clubs with secret codes and languages is the goal in itself. At the risk of sounding like an unapologetic capitalist, data don’t flow if money stops flowing. If you doubt me, watch where the budgets get cut first when the going gets rough.

Data and analytics folks may feel secure, as they know things in which non-technical people may not be well-versed in the age of Big Data. Maybe their bosses leave techies alone in a corner, as technical details and math jargon give them headaches. Their jobs may indeed be secure, for as long as the financial value coming out of the unit is net positive. Others may tolerate some techie talk, condescending attitudes, or mathematical dramas, for as long as data and analytics help them monetarily. Otherwise? Buh-bye, geeks!

I am writing this piece to provide a serious attitude adjustment to some data players. If data and analytics are not for geeks, but for the good of businesses (and all of the decision-makers who may not be technical), what does useful information look like?

Allow me to share some ideas for all the beneficiaries of data, not a select few who speak the machine language.

  • Data Must Be in Forms That Are Easy to Understand without mathematical or technical expertise. It should be as simple and easy to understand as a weather report. That means all of the data and statistical modeling to fill in the gaps must be done before the information reaches the users.
  • Data Must Be Small, not mounds of unfiltered and unstructured information. Useful data must look like answers to questions, not something that comes with a 500-page data dictionary. Data players should never brag about the size of the data or speed of processing, as users really don’t care about such details.
  • Data Must Be Accurate. Inaccurate information is worse than not having any at all. Users also must remember that not everything that comes out of computers is automatically accurate. At the same time, data players are responsible for fixing all of the previous mistakes made to datasets before the data even reached them. Not fair, but that’s the job.
  • Data Must Be Consistent. It can be argued that consistency is even more important than sheer accuracy. Being consistently off may be more desirable than having large fluctuations: a clock that always runs five minutes slow can still be relied upon, whereas even a dead clock is completely accurate twice a day. This is especially true for information that is inferred via statistical work.
  • Data Must Be Applicable Most of the Time, not just for limited cases. Too many data are locked in silos serving myopic purposes. Data become more powerful when they are consolidated properly, reaching broader audiences.
  • Data Must Be Accessible to users through devices of their choice. Even good information that fits the above criteria becomes useless if it does not reach decision-makers when needed. Data players’ jobs are not done until data are delivered to the right people, in the right format and in a timely manner.

Who are these data players who should be responsible for all of this, and where do they belong? They may have titles such as Chief Data Officer (who would be in charge of data governance), Data Strategist or Analytics Strategist, Data Scientist, Statistical Analyst, or Program Developer. They may belong to IT, marketing, or a separate data or analytics department. No matter. They must be translators of information for the benefit of users, speaking the languages of both business and technology fluently. They should never be just guard dogs of information. Ultimately, they should represent the interests of the business first, not wave around some fictitious IT or data rules.

So-called specialists, who habitually spit out reasons why certain information must be locked away somewhere and why it should not be available to users in a more user-friendly form, must snap out of their technical, analytical or mathematical comfort zones, pronto.

Techies who are that one-dimensional will be replaced by a machine in the near future.

The future belongs to people who can connect dots among different worlds and paradigms, not to some geeks with limited imaginations and skill sets that could become obsolete soon.

So, if self-preservation is an instinct that techies possess, they should figure out who is paying the bills, including their salaries and benefits, and make it absolutely easy for these end-users in all the ways listed here. If not for altruistic reasons, then for their own benefit in this results-oriented business world.

If information is not used by decision-makers, does the information really exist?

Chicken or the Egg? Data or Analytics?

I just saw an online discussion about the role of a chief data officer, whether it should be more about data or analytics. My initial response to that question is “neither.” A chief data officer must represent the business first. And I had the same answer when such a title didn’t even exist and CTOs or other types of executives covered that role in data-rich environments. As soon as an executive with a seemingly technical title starts representing the technology, that business is doomed. (Unless, of course, the business itself is about having fun with the technology. How nice!)

Nonetheless, if I really have to pick just one of the two choices, I would definitely pick analytics over data, as that is the key to providing answers to business questions. Data and databases must support that critical role of analytics, not the other way around. Unfortunately, many organizations have it completely backward: analysts are confined within the limitations of database structures and affiliated technologies, and business owners and decision-makers are dictated to by the analysts and analytical tool sets. It should be the business first, then the analytics. And all databases—especially marketing databases—should be optimized for analytical activities.

In my previous columns, I talked about the importance of marketing databases and statistical modeling in the age of Big Data: not all repositories of information are necessarily marketing databases, and statistical modeling is the best way to harness marketing answers out of mounds of accumulated data. That begs the next question: Is your marketing database model-ready?

When I talk about the benefits of statistical modeling in data-rich environments (refer to my previous column titled “Why Model?”), I often encounter folks who list reasons why they do not employ modeling as part of their normal marketing activities. If I may share a few examples here:

  • Target universe is too small: Depending on the industry, the prospect universe and customer base are sometimes very small in size, so one may decide to engage everyone in the target group. But do you know what to offer to each of your prospects? Customized offers should be based on some serious analytics.
  • Predictive data not available: This may have been true years back, but not in this day and age. Either there is a major failure in data collection, or collected data are too unstructured to yield any meaningful answers. Aren’t we living in the age of Big Data? Surely we should all dig deeper.
  • 1-to-1 marketing channels not in plan: As I repeatedly said in my previous columns, “every” channel is, or soon will be, a 1-to-1 channel. Every audience is secretly screaming, “Entertain us!” And customized customer engagement efforts should be based on modeling, segmentation and profiling.
  • Budget doesn’t allow modeling: If the budget is too tight, a marketer may opt for a software solution instead of hiring a team of statisticians. Remember that cookie-cutter models out of software packages are still better than someone’s intuitive selection rules (i.e., someone’s “gut” feeling).
  • The whole modeling process is just too painful: Hmm, I hear you. The whole process could be long and difficult. Now, why do you think it is so painful?

Like a good doctor, a consultant should be able to identify root causes based on pain points. So let’s hear some complaints:

  • It is not easy to find “best” customers for targeting
  • Modelers are fixing data all the time
  • Models end up relying on a few popular variables, anyway
  • Analysts are asking for more data all the time
  • It takes too long to develop and implement models
  • There are serious inconsistencies when models are applied to the database
  • Results are disappointing
  • Etc., etc…

I often get called in when model-based marketing efforts yield disappointing results. More often than not, the opening statement in such meetings is that “The model did not work.” Really? What is interesting is that, in more than nine out of 10 such cases, the models are the only elements that seem to have been done properly. Everything else—from pre-modeling steps, such as data hygiene, conversion, categorization and summarization; to post-modeling steps, such as score application and validation—often turns out to be the root cause of all the troubles, resulting in the pain points listed above.
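Since the verdict in those meetings is usually rendered on the model alone, one cheap diagnostic is to check whether the scores behave the same way on the full base as they did on the development sample. Below is a minimal sketch of such a check in Python with pandas; the function name, the decile framing and the column inputs are my illustration, not any particular vendor’s toolset.

```python
import pandas as pd

def decile_stability(sample_scores: pd.Series,
                     base_scores: pd.Series) -> pd.DataFrame:
    """Compare score distributions between the modeling sample and the
    scored database, using decile edges defined on the sample."""
    # Decile edges come from the development sample only; this assumes
    # the scores are continuous enough that the edges are distinct.
    edges = sample_scores.quantile([i / 10 for i in range(11)]).tolist()
    edges[0], edges[-1] = float("-inf"), float("inf")  # catch out-of-range scores

    report = pd.DataFrame({
        "sample_pct": pd.cut(sample_scores, edges).value_counts(normalize=True).sort_index(),
        "base_pct": pd.cut(base_scores, edges).value_counts(normalize=True).sort_index(),
    })
    # Each sample-defined decile should also hold roughly 10 percent of the
    # database; a large gap points at pre- or post-modeling steps, not the model.
    report["gap"] = (report["base_pct"] - report["sample_pct"]).abs()
    return report
```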

When I speak at marketing conferences about this “model-ready” environment, I always ask if there are statisticians and analysts in the audience. Then I ask what percentage of their time goes into non-statistical activities, such as data preparation and remedying data errors. The absolute majority of them say they spend 80 to 90 percent of their time fixing the data, devoting the rest to the actual model development work. You don’t need me to tell you that something is terribly wrong with this picture. And I am pretty sure that none of those analysts got their PhDs and master’s degrees in statistics to spend most of their waking hours fixing the data. Yeah, I know from experience that, in this data business, the last guy who happens to touch the dataset always ends up being responsible for all the errors made to the file thus far, but still. No wonder it is so often said that one of the key skills of a successful data scientist is programming.

When you provide datasets filled with unstructured, incomplete and/or missing data, diligent analysts will devote their time to remedying the situation and making the best out of what they have received. I myself often tell newcomers that analytics is really about making the best of what you’ve got. The trouble is that such data preparation work calls for a different set of skills that have nothing to do with statistics or analytics, and most analysts are not that great at programming, nor are they trained for it.

Even if they were able to create a set of sensible variables to play with, here comes the bigger trouble: what they have fixed is only a “sample” of the database, while the models must be applied to the whole thing later. Modern databases often contain hundreds of millions of records, and no analyst in his or her right mind uses the whole base to develop any models. Even if the sample is as large as a few million records (overkill, for sure), it would hardly be the entire picture. The real trouble is that no model is useful unless the resultant model scores are available on every record in the database. It is one thing to fix a sample of a few hundred thousand records; now try to apply that model algorithm to 200 million entries. You see all those interesting variables that analysts created and fixed in the sample universe? All of that must be redone in the real database with hundreds of millions of lines.

Sure, it is not impossible to include all the instructions for variable conversion, reformatting, editing and summarization in the model-scoring program. But such a practice is the No. 1 cause of errors, inconsistencies and serious delays. Yes, it is not impossible to steer a car with your knees while texting with your hands, but I wouldn’t call that best practice.
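The safer pattern, in my experience, is to define every derived variable exactly once and push both the development sample and the full base through the same code path. Here is a minimal sketch under assumed field names, coefficients and file layout, none of which come from a real project:

```python
import numpy as np
import pandas as pd

def prepare_variables(df: pd.DataFrame) -> pd.DataFrame:
    """Single source of truth for every derived variable."""
    out = pd.DataFrame(index=df.index)
    out["log_spend"] = np.log1p(df["total_spend"].clip(lower=0))
    out["recency_months"] = df["days_since_last_order"].fillna(9999) // 30
    out["is_multichannel"] = (df["channel_count"].fillna(0) > 1).astype(int)
    return out

def score(df: pd.DataFrame, coef: dict, intercept: float) -> pd.Series:
    x = prepare_variables(df)
    return intercept + sum(coef[name] * x[name] for name in coef)

# Development: the model is fit on prepare_variables(sample).
# Deployment: the full base is scored in chunks through the very same
# function, so the sample and the database can never drift apart.
coefficients = {"log_spend": 0.42, "recency_months": -0.07, "is_multichannel": 0.31}
for chunk in pd.read_csv("customer_base.csv", chunksize=1_000_000):
    chunk["model_score"] = score(chunk, coefficients, intercept=-1.5)
```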

That is why marketing databases must be model-ready, where sampling and scoring become routine with minimal data transformation. When I design a marketing database, I always put the analysts at the top of the user list. Sure, non-statistical types will still be able to run queries and reports out of it, but those activities should be secondary, as they are lower-level functions (i.e., simpler and easier) compared to being “model-ready.”

Here is a list of prerequisites for being model-ready (each will be explained in detail in my future columns), followed by a small sketch of what a few of them look like in practice:

  • All tables linked or merged properly and consistently
  • Data summarized to consistent levels such as individuals, households, email entries or products (depending on the ranking priority by the users)
  • All numeric fields standardized, where missing data and zero values are separated
  • All categorical data edited and categorized according to preset business rules
  • Missing data imputed by a standardized set of rules
  • All external data variables appended properly

Basically, the whole database should be as pristine as the sample datasets that analysts play with. That way, sampling should take only a few seconds, and applying the resultant model algorithms to the whole base would simply be the computer’s job, not some nerve-racking, nail-biting, all-night babysitting suspense for every update cycle.

In my co-op database days, we designed and implemented the core database with this model-ready philosophy, where all samples were presented to the analysts on silver platters, with absolutely no need for fixing the data any further. Analysts devoted their time to pondering target definitions and statistical methodologies. This way, each analyst was able to build about eight to 10 “custom” models—not cookie-cutter models—per “day,” and all models were applied to the entire database of more than 200 million individuals at the end of each day (I hear that they are even more efficient these days). Now, for the folks who are accustomed to a 30-day model implementation cycle (I’ve seen cycles as long as six months), this may sound like total science fiction. And I am not even saying that all companies need to build and implement that many models every day, as that would hardly be a core business for them, anyway.

In any case, this type of practice was in use well before the words “Big Data” were even uttered by anyone, and I would say that such discipline is required even more desperately now. Everyone is screaming for immediate answers to their questions, and the questions should be answered in the form of model scores, which are the most effective and concise summations of all available data. This so-called “in-database” modeling and scoring practice starts with a “model-ready” database structure. In the upcoming issues, I will share the detailed ways to get there.
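For what “in-database” scoring might look like at its simplest, here is a sketch that uses SQLite only because it ships with Python; the table, columns and coefficients are all assumptions, and a real deployment would use the data warehouse’s own SQL engine. The score is computed where the data lives, so the full base is updated in one pass instead of being exported, fixed and re-imported.

```python
import sqlite3

# Hypothetical model: coefficients keyed by column names that are assumed
# to exist, pre-standardized, on the customers table.
coef = {"log_spend": 0.42, "recency_months": -0.07}
intercept = -1.5

score_expr = " + ".join(f"({w}) * {name}" for name, w in coef.items())

# The "with" block commits the transaction on success.
with sqlite3.connect("marketing.db") as conn:
    conn.execute(f"UPDATE customers SET model_score = {intercept} + {score_expr}")
```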

So, here is the answer to the chicken-or-the-egg question. It is the business posing the questions first and foremost, then the analytics providing answers to those questions, with databases optimized to support such analytical activities, including predictive modeling. As for the chicken example, with the ultimate goal of all living creatures being the procreation of their species, I’d say eggs are just a means to that end. Therefore, for a business-minded chicken, yeah, definitely the chicken before the egg. Not that I’ve seen too many logical chickens.