Data Analytics Projects Only Benefit Marketers When Properly Applied

A recent report shared that only about 20% of all analytics project work turns out to be beneficial to businesses. Such waste. Nonetheless, is that solely the fault of data scientists? After all, even effective medicine is rendered useless if the patient refuses to take it.

Then again, why would users reject the results of analytics work? At the risk of gross simplification, allow me to break it down into two categories: Cases where project goals do not align with the business goals, and others where good intelligence gets wasted due to lack of capability, procedure, or will to implement follow-up actions. Basically, poor planning in the beginning, and poor execution at the backend.

Results of analytics projects often get ignored if the project goal doesn’t serve the general strategy or specific needs of the business. To put it in a different way, projects stemming from the analyst’s intellectual curiosity may or may not align with business interests. Some math geek may be fascinated by the elegance of mathematical precision or complexity of solutions, but such intrigue rarely translates directly into monetization of data assets.

In business, faster and simpler answers are far more actionable and valuable. If I ask business people whether they want an answer with an 80% confidence level in the next 2 days, or an answer with 95% certainty in 4 weeks, the great majority would choose the quicker but less-than-perfect answer. Why? Because the keyword in all this is “actionable,” not “certainty.”

Analysts who would like to maintain a distance from immediate business needs should instead pursue pure science in the world of academia (a noble cause, without a doubt). In business settings, however, we play with data only to make tangible differences, as in dollars, cents, minutes or seconds. Once such differences in philosophy are accepted and understood by all involved parties, then the real question is: What kind of answers are most needed to improve business results?

Setting Analytics Projects Up for Success

Defining the problem statement is the hardest part for many analysts. Even the ones who are well-trained often struggle with the goal setting process. Why? Because in school, the professor in charge provides the problems to solve, and students submit solutions to them.

In business, analysts must understand the intentions of decision makers (i.e., their clients), deciphering not-so-logical general statements and anecdotes. Yeah, sure, we need to attract more high-value customers, but how would we express such value via mathematical statements? What would the end result look like, and how will it be deployed to make any difference in the end?

If unchecked, many analytics projects move forward purely based on the analysts’ assumptions, or worse, procedural convenience factors. For example, if the goal of the project is to rank a customer list in the order of responsiveness to certain product offers, then to build models like that, one may employ all kinds of transactional, behavioral, response, and demographic data.

All these data types come with different strengths and weaknesses, and even different missing data ratios. In cases like this, I’ve encountered many (too many) analysts who would just omit the whole population with missing demographic data from the development universe. Sometimes such omissions add up to over 30% of the whole. What, are we never going to reach out to those souls just because some peripheral data points are missing for them?

Good luck convincing the stakeholders who want to use the entire list for various channel promotions. “Sorry, we can provide model scores for only 70% of your valuable list,” is not going to cut it.
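
One way to keep the whole list scoreable, sketched here as an assumption rather than a prescribed method, is to keep the records with missing demographic values, fill in a neutral value, and add a flag so that a model can still learn from the absence itself. The field name `income`, the helper `fill_with_flag`, and the sample records are all hypothetical.

```python
from statistics import median

def fill_with_flag(records, field):
    """Return copies of records with `field` imputed and a missing flag added.

    No row is dropped; missing values receive the median of observed values.
    """
    observed = [r[field] for r in records if r.get(field) is not None]
    fallback = median(observed) if observed else 0
    filled = []
    for r in records:
        is_missing = r.get(field) is None
        new_r = dict(r)  # copy, so the original record is left untouched
        new_r[field] = fallback if is_missing else r[field]
        new_r[f"{field}_missing"] = 1 if is_missing else 0
        filled.append(new_r)
    return filled

# Hypothetical customer list: two of five records lack demographic income data.
customers = [
    {"id": 1, "orders": 5, "income": 80_000},
    {"id": 2, "orders": 1, "income": None},
    {"id": 3, "orders": 3, "income": 60_000},
    {"id": 4, "orders": 2, "income": None},
    {"id": 5, "orders": 8, "income": 100_000},
]

scoreable = fill_with_flag(customers, "income")
```

With this kind of treatment, all five records stay in the development universe, instead of only the three that happen to carry complete demographic data.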

More than a few times, I received questions about what analysts should do when they have to reach deep into lower model groups (of response models) to meet the demand of marketers, knowing that the bottom half won’t perform well. My response would be to forget about the model — no matter how elegant it may be — and develop heuristic rules to eliminate obvious non-targets in the prospect universe. If the model gets to be used, it is almost certain that the modeler in charge will be blamed for mediocre or bad performance, anyway.

Then I firmly warn them to ask about typical campaign size “before” one starts building some fancy models. What is the point of building a response model when the emailer would blast emails as much as he wants? To prove that the analyst is well-versed in building complex response models? What difference would it ever make in the “real” world? With that energy, it would be far more prudent to build a series of personas and product affinity models to personalize messages and offers.
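
The campaign-size question can be checked with simple arithmetic before any modeling starts. The sketch below assumes a scored list split into ten equal model groups (1 being the best); the function name `campaign_depth` and the sample numbers are hypothetical.

```python
def campaign_depth(list_size, campaign_size, groups=10):
    """Return (share of list contacted, deepest model group reached)."""
    mailed = min(campaign_size, list_size)
    share = mailed / list_size
    # Integer ceiling division avoids floating-point surprises.
    deepest_group = max(1, (mailed * groups + list_size - 1) // list_size)
    return share, deepest_group

# Hypothetical list of one million names.
selective = campaign_depth(1_000_000, 150_000)   # stays within the top two groups
blast = campaign_depth(1_000_000, 1_000_000)     # reaches every group
```

When the campaign reaches every group, the ranking changes nothing about who gets contacted, which is the situation where personas and affinity models are likely the better use of effort.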

Supporting Analytics Results With Marketing

Now, let’s pause for a moment and think about the second major reason why the results of analytics are not utilized. Assume that the analytics team developed a series of personas and product affinity models to customize offers on a personal level. Does the marketing team have the ability to display different offers to different targets? Via email, websites, and/or print media? In other words, do they have capabilities and resources to show “a picture of two wine glasses filled with attractive looking red wine” to people who scored high scores in the “Wine Enthusiast” model?

I’ve encountered too many situations where marketers look concerned — rather than getting excited — when talking about personas for personalization. Not because they care about what analysts must go through to produce a series of models, but because they lack creative assets and technical capabilities to make it all happen.

They often complain about lack of budget to develop multiple versions of creatives, lack of proper digital asset management tools, lack of campaign management tools that allow complex versioning, lack of ability to serve dynamic content on websites, etc. There is no shortage of reasons why something “cannot” be done.

But, even in a situation like that, it is not the job of a data scientist to suggest increasing investments in various areas, especially when “other” departments have to cough up the money. No one gets to command unlimited resources, and every department has its own priorities. What analytics professionals must do is to figure out all kinds of limitations beyond the little world of analytics, and prioritize the work in terms of actionability.

Consider what can be done with minimal changes in the marketing ecosystem, and for preservation of analytics and marketing departments, what efforts will immediately bring tangible results? Basically, what will we be able to brag about in front of CEOs and CFOs?

When to Put Analytics Projects First

Prioritization of analytics projects should never be done solely based on data availability, ease of data crunching or modeling, or “geek” factors. It should be done in terms of potential value of the result, immediate actionability, and most importantly, alignment with overall business objectives.

The fact that only about 20% of analytics work yields business value means that 80% of the work was never even necessary. Sure, data geeks deserve to have some fun once in a while, but the fun factor doesn’t pay for the systems, toolsets, data maintenance, and salaries.

Without proper problem statements on the front-end and follow-up actions on the back-end, no amount of analytical activities would produce any value for businesses. That is why data and analytics professionals must act as translators between the business world and the technical world. Without that critical consulting layer, it becomes the-luck-of-the-draw when prioritizing projects.

To stay on target, always start with a proper analytics roadmap covering everything from the ideation stage to the application stage. To be valued and appreciated, data scientists must act as business consultants, as well.

Stop Expecting Data Scientists to Be Magical: Analytics Is a Team Sport

Many organizations put unreasonable expectations on data scientists. Their job descriptions and requirements are often at a super-human level. “They” say — and who are they? — that modern-day data scientists must be good at absolutely everything. Okay, then, what’s “everything,” in this case?

First, data scientists have to have a deep understanding in mathematics and statistics, covering regression models, machine learning, decision trees, clustering, forecasting, optimization, etc. Basically, if you don’t have a post-graduate degree in statistics, you will fail at “hello.” The really bad news is that even people with statistics degrees are not well-versed in every technique and subject matter. They all have their specialties, like medical doctors.

Then data scientists have to have advanced programming skills and deep knowledge in database technologies. They must be fluent in multiple computer languages in any setting, easily handling all types of structured and unstructured databases and files in any condition. This alone is a full-time job, requiring expert-level experience, as most databases are NOT in analytics-ready form. It is routinely quoted that most data scientists spend over 80% of their time fixing the data. I am certain that these folks didn’t get an advanced degree in statistics to do data plumbing and hygiene work all of the time. But that is how it is, as they won’t see what we call a “perfect” dataset outside schools.

Data scientists also have to have excellent communication and data visualization skills, being able to explain complex ideas in plain English. It is hard enough to derive useful insights out of mounds of data; now they have to construct interesting stories out of them, filled with exciting punchlines and actionable recommendations at the end. Because most mortals don’t understand technical texts and numbers very well — many don’t even try, and some openly say they don’t want to think — data scientists must develop eye-popping charts and graphs, as well, using the popular visualization tool du jour. (Whatever that tool is, they’d better learn it fast).

Finally, to construct the “right” data strategies and solutions for the business in question, the data scientist should have really deep domain and industry knowledge, at a level of a management and/or marketing consultant. On top of all of that, most job requirements also mention soft skills — as “they” don’t want some data geeks with nerdy attitudes. In other words, data scientists must come with kind and gentle bedside manners, while being passionate about the business and boring stuff like mathematics. Some even ask for child-like curiosity and ability to learn things extremely fast. At the same time, they must carry authority like a professor, being able to influence non-believers and evangelize the mind-numbing subject of analytics. This last part about business acumen, by the way, is the single-most important factor that divides excellent data scientists who add value every time they touch data, and data plumbers who just move data around all day long. It is all about being able to give the right type of homework to themselves.

Now, let me ask you: Do you know anyone like this, having all of these skills and qualities in “one” body? If you do, how many of them do you personally know? I am asking this question in the sincerest manner (though I am quite sarcastic, by nature), as I keep hearing that we need tens of thousands of such data scientists, right now.

There are musicians who can write music and lyrics, determine the musical direction as a producer, arrange the music, play all necessary instruments, sing the song, record, mix and master it, publish it, and promote the product, all by themselves. It is not impossible to find such talents. But if you insist that only such geniuses can enter the field of music, there won’t be much music to listen to. The data business is the same way.

So, how do we divide the task up? I have been using this three-way division of labor — as created by my predecessors — for a long time, as it has been working very well in any circumstance:

  • A Statistical Analyst will have deep knowledge in statistical modeling and machine learning. They would be at the core of what we casually call analytics, which goes way beyond some rule-based decision-making. But these smart people need help.
  • A Master Data Manipulator will have excellent coding skills. These folks will provide analytics-ready datasets on silver platters for the analysts. They will essentially take care of all of the “before” and “after” steps around statistical modeling and other advanced analytics. It is important to remember that most projects go wrong in data preparation and post-analytics application stages.
  • A Business Analyst will need to have a deep understanding of business challenges and the industry landscape, as well as functional knowledge in modeling and database technologies. These are the folks who will prescribe solutions to business challenges, create tangible projects out of vague requests, evaluate data sources and data quality, develop model specifications, apply the results to businesses, and present all of this in the form of stories, reports, and data visualization.

Now, achieving master-level expertise in one of these areas is really difficult. People who are great in two of these three areas are indeed rare, and they will already have “chief” or “head” titles somewhere, or have their own analytics practices. If you insist on procuring only data scientists who are great at everything? Good luck to you.

Too many organizations that are trying to jump onto this data bandwagon hire just one or two data scientists, dump all kinds of unorganized and unstructured data on them, and ask them to produce something of value, all on their own. Figuring out what type of data or analytics activity will bring monetary value to the organization isn’t a simple task. Many math geeks won’t be able to jump that first hurdle by themselves. Most business goals are not in the form of logical expressions, and the majority of data they will encounter in that analytics journey won’t be ready for analytics, either.

Then again, strategic consultants who develop a data and analytics roadmap may not be well-versed in actual modeling, machine learning implementation, or database constructs. But such strategists should operate on a different plane, by design. Evaluating them based on coding or math skills would be like judging an architect based on his handling of building materials. Should they be aware of values and limitations of data-related technologies and toolsets? Absolutely. But that is not the same as being hands-on, at a professional level, in every area.

Analytics has always been a team sport. It was like that when the datasets were smaller and the computers were much slower, and it is like that when databases are indeed huge and computing speed is lightning fast. What remains constant is that, in data play, someone must see through the business goals and data assets around them to find the best way to create business value. In executing such plans, they will inevitably encounter many technical challenges and, of course, they will need expert-level technicians to plow through data firsthand.

Like any creative work, such as music producing or movie-making, data and analytics work must start with a vision, tangible business goals, and project specifications. If these elements are misaligned, no amount of mathematical genius will save the day. Even the best rifles will be useless if the target is hung in a wrong place.

Technical aspects of the work matter only when all stakeholders share the idea of what the project is all about. Simple statements like “maximizing the customer value” need a translation by a person who knows both business and technology, as the value can be expressed in dollars, visits, transactions, dates, intervals, status, and any combination of these variables. These seemingly simple decisions must be methodically made with a clear purpose, as a few wrong assumptions by the analyst at-hand — who may have never met the end-user — can easily derail the project toward a wrong direction.
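
To make that translation concrete, here is a minimal sketch that rolls a transaction log up to the customer level and computes several candidate definitions of “value” side by side. The transaction records, the `summarize` helper, and the metric names are hypothetical, chosen only to mirror the dollars, transactions, dates, and intervals mentioned above.

```python
from datetime import date

# Hypothetical transaction log: (customer_id, purchase_date, amount_in_dollars)
transactions = [
    ("A", date(2023, 1, 10), 120.0),
    ("A", date(2023, 3, 11), 80.0),
    ("A", date(2023, 5, 10), 100.0),
    ("B", date(2023, 4, 1), 900.0),
]

def summarize(transactions, as_of):
    """Roll transactions up to one row of candidate value metrics per customer."""
    by_customer = {}
    for cust, day, amount in transactions:
        by_customer.setdefault(cust, []).append((day, amount))

    summary = {}
    for cust, rows in by_customer.items():
        rows.sort()  # chronological order
        days = [d for d, _ in rows]
        gaps = [(b - a).days for a, b in zip(days, days[1:])]
        summary[cust] = {
            "total_dollars": sum(a for _, a in rows),
            "num_transactions": len(rows),
            "days_since_last": (as_of - days[-1]).days,
            "avg_days_between": sum(gaps) / len(gaps) if gaps else None,
        }
    return summary

profile = summarize(transactions, as_of=date(2023, 6, 9))
```

In this toy data, customer B is the most valuable by dollars while customer A is the most valuable by frequency and recency, so which one counts as “high-value” depends entirely on the definition the business agrees on.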

Yes, there are people who can absolutely see through everything and singlehandedly take care of it all. But if your business plan requires such superheroes and nothing but such people, you must first examine your team development roadmap, org chart, and job descriptions. Continuing to push those poor and unfortunate recruiters, who must find unicorns within your budget, won’t get you anywhere; that is not how you’re supposed to play this data game in the first place.

When You Fail, Don’t Blame Data Scientists First — or Models

Last month, I talked about ways marketing automation projects go south (refer to “Why Many Marketing Automation Projects Go South”). This time, let’s be more specific about modeling, which is an essential element in converting mounds of data into actionable solutions to challenges.

Without modeling, all automation efforts would remain at the level of rudimentary rules. And that is one of the fastest routes to automate wrong processes, leading to disappointing results in the name of marketing automation.

Nonetheless, when statistically sound models are employed, users tend to blame the models first when the results are less than satisfactory. As a consultant, I often get called in when clients are suspicious of the model performance. More often than not, however, I find that the model in question was the only thing that was done correctly in a long series of processes, from data manipulation and target setting to model scoring and deployment. I guess it is just easier to blame some black box, but most errors happen before and after modeling.

A model is nothing but an algorithmic expression measuring likelihood of an object resembling — or not resembling — the target. As in, “I don’t know for sure, but that household is very likely to purchase high-end home electronics products,” only based on the information that we get to have. Or on a larger scale, “How many top-line TV sets over 65 inches will we sell during the Christmas shopping season this year?” Again, only based on past sales history, current marcom spending, some campaign results, and a few other factors — like seasonality and virality rate.

These are made-up examples, of course, but I tried to make them as specific and realistic as possible here. Because when people think that a model went wrong, often it is because a wrong question was asked in the first place. Those “dumb” algorithms, unfortunately, only provide answers to specific questions. If a wrong question is presented? The result would seem off, too.

That is why the first step in analytics should be “formulating a question,” not data-crunching. Jumping into a data lake (or any other form of data depository, for that matter) without a clear definition of goals and specific targets is often a shortcut to the demise of the initiative itself. Imagine a case where one starts building a house without a blueprint. Just as a house is not a random pile of building materials, a model is not an arbitrary combination of raw data.

I can even argue formulating the question is so difficult and critical, that it is the deciding factor dividing analysts into seasoned data scientists and junior number-crunchers. Defining proper problem statements is challenging, because:

  • business goals are often far from perfectly constructed logical statements, and
  • available data are most likely incomplete or inadequate for advanced analytics.

Basically, good data players must be able to translate all those wishful marketing goals into mathematical expressions, only using the data handed to them. Such skill is far beyond knowledge in regression models or machine learning.

That is why we must follow these specific steps for data-based solutioning:

Figure: The roadmap data scientists use for data-based solutioning. Credit: Stephen H. Yu
  1. Formulating Questions: Again, this is the most critical step of all. What are the immediate issues and pain points? For what type of marketing functions, and in what context? How will the solution be applied, who will use it, and through what channel? What are the specific domains where the solution is needed? I will share more details on how to ask these questions later in this series, but having a specific set of goals must be the first step. Without proper goal-setting, one can’t even define success criteria against which the results would be measured.
  2. Data Discovery: It is useless to dream up a solution with data that are not even available. So, what is available, and what kind of shape are they in? Check the inventory of transaction history; third-party data, such as demographic and geo-demographic data; campaign history and response data (often not in one place); user interaction data; survey data; marcom spending and budget; product information, etc. Now, dig through everything, but don’t waste time trying to salvage everything, either. Depending on the goal, some data may not even be necessary. Too many projects get stuck right here, not moving forward an inch. The goal isn’t having a perfect data depository — CDP, Data Lake, or whatever — but providing answers to questions posed in Step 1.
  3. Data Transformation: You will find that most data sources are NOT “analytics-ready,” no matter how clean and organized they may seem (they are often NOT well-organized, either). Disparate data sources must be merged and consolidated, inconsistent data must be standardized and categorized, different levels of information must be summarized onto the level of prediction (e.g., product, email, individual, or household levels), and intelligent predictors must be methodically created. Otherwise, the modelers would spend the majority of their time fixing and massaging the data. I often call this step creating an “Analytics Sandbox,” where all “necessary” data are in pristine condition, ready for any type of advanced analytics.
  4. Analytics/Model Development: This is where algorithms are created, considering all available data. This is the highlight of this analytics journey, and key to proper marketing automation. Ironically, this is the easiest part to automate, in comparison to previous steps and post-analytics steps. But only if the right questions — and right targets — are clearly defined, and data are ready for this critical step. This is why one shouldn’t just blame the models or modelers when the results aren’t good enough. There is no magic algorithm that can save ill-defined goals and unusable messy data.
  5. Knowledge Share: The models may be built, but the game isn’t over yet. It is one thing to develop algorithms with a few hundred thousand record samples, and it’s quite another to apply them to millions of live data records. There are many things that can go wrong here. Even slight differences in data values, categorization rules, or missing data ratios can render well-developed models ineffective. There are good reasons why many vendors charge high prices for model scoring. Once the scoring is done and proven correct, resultant model scores must be shared with all relevant systems, through which decisions are made and campaigns are deployed.
  6. Application of Insights: Just because model scores are available, it doesn’t mean that decision-makers and campaign managers will use them. They may not even know that such things are available to them; or, even if they do, they may not know how to use them. For instance, let’s say that there is a score for “likely to respond to emails with no discount offer” (to weed out habitual bargain-seekers) for millions of individuals. What do those scores mean? The lower the better, or the higher the better? If 10 is the best score, is seven good enough? What if we need to mail to the whole universe? Can we differentiate offers, depending on other model scores — such as, “likely to respond to free-shipping offers”? Do we even have enough creative materials to do something like that? Without proper applications, no amount of mathematical work will seem useful. This is why someone in charge of data and analytics must serve as an “evangelist of analytics,” continually educating and convincing the end-users.
  7. Impact Analysis: Now, one must ask the ultimate question, “Did it work?” And “If it did, what elements worked (and didn’t work)?” Like all scientific approaches, marketing analytics and applications are about small successes and improvements, with continual hypothesizing and learning from past trials and mistakes. I’m sure you remember the age-old term “Closed-loop” marketing. All data and analytics solutions must be seen as continuous efforts, not some one-off thing that you try once or twice and forget about. No solution will just double your revenue overnight; that is more like wishful thinking than a data-based solution.
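
For the “Did it work?” question in Step 7, one common approach (an assumption here, not something the roadmap itself prescribes) is to compare the response rate of a model-selected test cell against a control cell and check whether the gap is larger than chance would explain. The function name `response_lift` and the campaign counts below are hypothetical.

```python
import math

def response_lift(test_responders, test_size, control_responders, control_size):
    """Compare response rates of a test cell and a control cell."""
    p_test = test_responders / test_size
    p_ctrl = control_responders / control_size
    lift = p_test / p_ctrl if p_ctrl > 0 else float("inf")

    # Two-proportion z-test with a pooled standard error.
    pooled = (test_responders + control_responders) / (test_size + control_size)
    se = math.sqrt(pooled * (1 - pooled) * (1 / test_size + 1 / control_size))
    z = (p_test - p_ctrl) / se if se > 0 else 0.0
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {
        "test_rate": p_test,
        "control_rate": p_ctrl,
        "lift": lift,
        "z": z,
        "p_value": p_value,
    }

# Hypothetical campaign: 300 of 10,000 responded in test, 200 of 10,000 in control.
result = response_lift(300, 10_000, 200, 10_000)
```

A result like this only says whether the overall effort worked; separating which elements worked still requires holding out cells for each element being tested.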

As you can see, there are many “before” and “after” steps around modeling and algorithmic solutioning. This is why one should not just blame the data scientist when things don’t work out as expected, and why even casual users must be aware of basic ins and outs of analytics. Users must understand that they should not employ models or solutions outside of their original design specifications, either. There simply is no way to provide answers to illogical questions, now or in the future.
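
One of the “after” checks can be sketched in a few lines: before applying a model to live records, compare the missing-data ratio of each predictor in the development sample against the scoring universe. The helper names, the 5-point tolerance, and the sample records are hypothetical; a real check would likely also compare value distributions and categorization rules.

```python
def missing_ratio(records, field):
    """Share of records where `field` is absent or None."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)

def drift_report(dev_sample, scoring_universe, fields, tolerance=0.05):
    """Flag fields whose missing-data ratio moved by more than `tolerance`."""
    flagged = {}
    for field in fields:
        dev = missing_ratio(dev_sample, field)
        live = missing_ratio(scoring_universe, field)
        if abs(live - dev) > tolerance:
            flagged[field] = (dev, live)
    return flagged

# Hypothetical data: age is missing for 10% of the development sample
# but for 40% of the live scoring universe.
dev_sample = [{"age": 30 + i, "income": 50_000} for i in range(9)]
dev_sample.append({"age": None, "income": 50_000})
scoring_universe = [{"age": 40, "income": 50_000}] * 6 + [{"age": None, "income": 50_000}] * 4

flagged = drift_report(dev_sample, scoring_universe, ["age", "income"])
```

A flagged field is a warning that the model is about to be used outside the conditions it was built for, which is worth resolving before anyone starts blaming the algorithm.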

Machine Learning? I Don’t Think Those Words Mean What You Think They Mean

I find more and more people use the term “machine learning” when they really mean to say “modeling.” I guess that is like calling all types of data activities — with big and small data — “Big Data.” And that’s OK.

Languages are developed to communicate with other human beings more effectively. If most people use the term to include broader meanings than the myopic definition of the words in question, and if there is no trouble understanding each other that way, who cares? I’m not here to defend the purity of the meaning, but to monetize big or small data assets.

The term “Big Data” is not even a thing in most organizations with ample amounts of data anymore, but there are many exceptions, too. I visit other countries for data and analytics consulting, and those two words still work like “open sesame” to some boardrooms. Why would I blame words for having multiple meanings? The English dictionary is filled with such colloquial examples.

I recently learned that famous magic words “Hocus Pocus” came from the Latin phrase “hoc est corpus,” which means “This is the body (of Christ)” as spoken during Holy Communion in Roman Catholic Churches. So much for the olden-day priests only speaking in Latin to sound holier; ordinary people understood the process as magic — turning a piece of bread into the body of Christ — and started applying the phrase to all kinds of magic tricks.

However, if such transformations of words start causing confusion, we all need to be more specific. Especially when the words are about specific technical procedures (not magic). Going back to my opening statement, what does “machine learning” mean to you?

  • If spoken among data scientists, I guess that could mean a very specific way to describe modeling techniques that include Supervised Learning, Unsupervised Learning, Reinforcement Learning, Deep Learning, or any other types of Neural Net modeling, indicating specific methods to construct models that serve predetermined purposes.
  • If used by decision-makers, I think it could mean that the speaker wants minimal involvement of data scientists or modelers in the end, and automate the model development process as much as possible. As in “Let’s set up Machine Learning to classify all the inbound calls into manageable categories of inquiries,” for instance. In that case, the key point would be “automation.”
  • If used by marketing or sales; well, now, we are talking about a really broad set of meanings. It could mean that the buyers of the service will require minimal human intervention to achieve goals. That the buyer doesn’t even have to think too much (as the toolset would just work). Or, it could mean that it will run faster than existing ways of modeling (or pattern recognition). Or, they meant to say “modeling,” but they somehow thought that it sounded antiquated. Or, it could just mean that “I don’t even know why I said Machine Learning, but I said it because everyone else is saying it” (refer to “Why Buzzwords Suck”).

I recently interviewed a candidate fresh out of a PhD program for a data scientist position, whose resume is filled with “Machine Learning.” But when we dug a little deeper into actual projects he finished for school work or internship programs, I found out that most of his models were indeed good, old regression models. So I asked why he substituted words like that, and his answer was staggering; he said his graduate school guided him that way.

Why Marketers Need to Know What Words Mean

Now, I’m not even sure whom to blame in a situation like this, where even academia has fallen under the weight of buzzwords. After all, the schools are just trying to help their students get high-paying jobs before the summer is over. I guess then the blame is on the hiring managers who are trying to recruit candidates based on buzzwords, not necessarily knowing what they should look for in the candidates.

And that is a big problem. This is why even non-technical people must understand basic meanings of technical terms that they are using; especially when they are hiring employees or procuring outsourcing vendors to perform specific tasks. Otherwise, some poor souls would spend countless hours to finish things that don’t mean anything for the bottom-line. In a capitalistic economy, we play with data for only two reasons:

  1. to increase revenue, or
  2. to reduce cost.

If it’s all the same for the bottom line, why should a non-technician care about the “how the job is done” part?

Why It Sucks When Marketers Demand What They Don’t Understand

I’ve been saying that marketers or decision-makers should not be bad patients. Bad patients won’t listen to doctors; and further, they will actually command doctors to prescribe certain medications without testing or validation. I guess that is one way to kill themselves, but what about the poor, unfortunate doctor?

We see that in the data and analytics business all of the time. I met a client who just wanted to have our team build neural net models for him. Why? Why not insist on a random forest method? I think he thought that “neural net” sounded cool. But when I heard his “business” problems out, he definitely needed something different as a solution. He didn’t have the data infrastructure to support any automated solutions, he wanted to know what went on in the modeling process (neural net models are black boxes, by definition), he didn’t have enough data to implement such things at the beginning stage, and projected gains (by employing models) wouldn’t cover the cost of such implementation for the first couple of years.

What he needed was a short-term proof of concept, where data structure must be changed to be more “analytics-ready.” (It was far from it.) And the models should be built by human analysts, so that everyone would learn more about the data and methodology along the way.

Imagine a junior analyst fresh out of school, whose resume is filled with buzzwords, meeting with a client like that. He wouldn’t fight back, but would take the order verbatim and build neural net models, whether they helped in achieving the business goals or not. Then the procurer of the service would still be blaming the concept of machine learning itself. Because bad patients will never blame themselves.

Even advanced data scientists sometimes lose the battle with clients who insist on implementing Machine Learning when the solution is something else. And such clients are generally the ones who want to know every little detail, including how the models are constructed. I’ve seen data scientists who’d implemented machine learning algorithms (for practical reasons, such as automation and speed gain), and reverse-engineered the models, using traditional regression techniques, only to showcase what variables were driving the results.

One can say that such is the virtue of a senior-level data scientist. But then what if the analyst is very green? Actually, some decision-makers may like that, as a more junior-level person won’t fight back too hard. Only after a project goes south will those “order takers” be blamed (as in “those analysts didn’t know what they were doing”).

Conclusion

Data and analytics businesses will continually evolve, but the math and the human factors won’t change much. What will change, however, is that we will have fewer and fewer middlemen between the decision-makers (who are not necessarily well-versed in data and analytics) and human analysts or machines (who are not necessarily well-versed in sales or marketing). And it will all be in the name of automation, or more specifically, Machine Learning or AI.

In that future, the person who orders the machine around, ready or not, will be responsible for bad results and ineffective implementations. That means everyone needs to be more logical. Maybe not as much as a Vulcan, but somewhere between a hardcore coder and a touchy-feely marketer. And they must be more aware of the capabilities and limitations of technologies and techniques; and, more importantly, they should not blindly trust machine-based solutions.

The scary part is that those who say things like “Just automate the whole thing with AI, somehow” will be the first in line to be replaced by the machines. That future is not far away.

Data Geeks Must Learn to Speak to Clients

This piece is for aspiring data scientists, analysts or consultants (or any other cool title du jour in this data and analytics business). Then again, people who spend even a single dime on a data project must remember this, as well: “The main goal of any analytical endeavor is to make differences in business.”

To this, some may say “Duh, keep stating the obvious.” But I am stating the obvious, as too many data initiatives are either for the sake of playing with data at hand, or for the “cool factor” among fellow data geeks. One may sustain such a position for a couple of years if he is lucky, but sooner or later, someone who is paying for all of the data stuff will ask where the money is going. In short, no one will pay for all of those servers, analytical tools and analysts’ salaries so that a bunch of geeks have some fun with data. If you just want the fun part, then maybe you should just stay in academia “paying” tuition for such an experience.

Not too long ago, I encountered a promising resume in a deep pile. Seemingly, this candidate had very impressive credentials. A PhD in statistics from a reputable school, hands-on analytics experience in multiple industries (so he claimed), knowledge in multiple types of statistical techniques, and proficiency in various computing languages and toolsets. But the interview couldn’t have gone worse.

When the candidate was going on and on about minute details of his mathematical journey for a rather ordinary modeling project, I interrupted and asked a very simple question: “Why did you build that model?” Unbelievably, he couldn’t answer that question, and kept reverting to the methodology part. Unfortunately for him, I was not looking for a statistician, but an analytics consultant. There was just no way that I would put such a mechanical person in front of a client without risking losing the deal entirely.

When I interview to fill a client-facing position, I am not just looking for technical skills. What I am really looking for is an ability to break down business challenges into tangible analytics projects to meet tangible business goals.

In fact, in the near future, this will be all that is left for us humans to do: “To define the problem statement in the business context.” Machines will do all of the tedious data prep work and mathematical crunching after that. (Well, with some guidance from humans, but not requiring line-by-line instructions by many.) Now, if number-crunching is the only skill one is selling, well then, he is asking to be replaced by machines sooner than others.

From my experience, I see that the overlap between a business analyst and a statistical analyst is surprisingly small. Further, let me go on and say that most graduates with degrees in statistics are utterly ill-prepared for real-world challenges. Why?

Once I read an article somewhere (I do not recall the name of the publication or the author) that colleges are not really helping future data scientists in a practical manner, as:

  1. all of the datasets for school projects are completely clean and free of missing data, and
  2. professors set the goals and targets of modeling exercises.

I completely agree with this statement, as I have never seen a totally clean dataset since my school days (which were a long time ago in a galaxy far, far away), and defining the target of any model is the most difficult challenge in any modeling project. In fact, for most hands-on analysts, data preparation and target definition are the work. If the target is hung in the wrong place, no amount of cool algorithms will save the day.

Yet, kids graduate from school thinking that they are ready to take on such challenges in the real world on Day One. Sorry to break it to them this way, but no, mathematical skills do not directly translate into the ability to solve problems in the business world. Such training will definitely give them an upper hand in the job market, though, as no math-illiterate should be called an analyst.

Last summer, my team hired two promising interns, mainly to build a talent pool for the following year. Both were very bright kids, indeed, and we gave them two seemingly straightforward modeling projects. The first assignment was to build a model to approximate customer loyalty in a B2B setting. I don’t remember the second assignment, as they spent the entire summer searching for the definition of a “loyal customer” to go after. They couldn’t even begin the modeling part. So more senior members of the team had to do that fun part after they went back to school. (For more details about this project, refer to “The Secret Sauce for B2B Loyalty Marketing.”)

Of course, we as a team knew what we were doing all along, but I wanted to teach these youngsters how to approach a project from the very beginning, as no client will define the target for consultants and vendors. Technical specs? You’re supposed to write that spec from scratch.

In fact, determining if we even need a model to reach the business goal was a test in itself. Why build a model at all? Because it’s a cool thing on your resume? With what data? For what specific success metrics? If “selling more things by treating valuable customers properly” is the goal, then why not build a customer value model first? Why the loyalty model? Because clients just said so? Why not product propensity models, if there are specific products to push? Why not build multiple models and cover all bases while we’re at it? If so, will we build a one-size-fits-all model in one shot, or should we consider separating the universe for distinct segments in the footprint? If so, how would you determine such segments then? (Ah, that “segmentation of the universe” part was where the interns were stuck.)

Boy, did I wish schools spent more time doing these types of problem-solving exercises with their students. Yes, kids will be uncomfortable as these questions do NOT have clear yes or no answers to them. But in business, there rarely are clear answers to our questions. Converting such ambiguity into measurable and quantifiable answers (such as probability that a certain customer will respond to a certain offer, or sales projection of a particular product line for the next two quarters with limited data) is the required skill. Prescribing the right approach and methodology to solve long- and short-term challenges is the job, not just manipulating data and building algorithms.

In other words, mathematical elegance may be a differentiating factor between a mediocre and excellent analyst, but such is not the end goal. Then what should aspiring analysts keep in mind?

In the business world, the goals of data or analytical work are really clear-cut and simple. We work with the data to (1) increase revenue, (2) decrease cost (hence, maximizing profit), or (3) minimize risks. That’s it.

From that point, a good analyst should:

  • Define clear problem statements (even when ambiguity is all around)
  • Set tangible and attainable goals employing a phased approach (i.e., a series of small successes leading to achievement of long-term goals)
  • Examine quality of available data, and get them ready for advanced analytics (as most datasets are NOT model-ready)
  • Consider specific methodologies best fit to solve goals in each phase (as assumptions and conditions may change drastically for each progression, and one brute-force methodology may not work well in the end)
  • Set the order of operation (as sequence of events does matter in any complex project)
  • Determine success metrics, and think about how to “sell” the results to sponsors of the project (even before any data or math work begins)
  • Go about modeling or any other statistical work (only if the project calls for it)
  • Share knowledge with others and make sure resultant model scores and other findings are available to users through their favorite toolsets (even if the users are non-believers of analytics)
  • Continuously monitor the results and re-fit the models for improvement

As you can see here, even in this simplified list, modeling is just an “optional” step in the whole process. No one should build models because they know how to do it. You’re not in school anymore, where the goal is to get an A at the end of the semester. In the real world (although using this term makes me sound like a geezer), data players are supposed to make money with data, with or without advanced techniques. Methodologies? They are just colors on a palette, and you don’t have to use all of them.

For the folks who are in position to hire math geeks to maximize the value of data, simply ask them “why they would do anything.” If the candidate actually pauses and tries to think from the business perspective, then she is actually displaying some potential to be a business partner in the future. If the candidate keeps dropping technical jargon to this simple question, cut the interview short — unless you have natural curiosity in the mechanics of models and analytics, and your department’s success is just measured in complexity and elegance of solutions. But I highly doubt that such a goal would be above increasing profit for the organization in the end.

Machine Learning: More Common Than You Think

There’s a lot of buzz lately about machine learning. In many ways, it’s transforming the consumer experience and improving the products and operations of many companies. Plus, it’s not just for data analysts — machine learning has real benefits in the lives of the average consumer.

[Today, Sue is hosting Sanjay Sidhwani, SVP of Advanced Analytics for Synchrony Financial, as a guest blogger for The Consumer Connection.]

Ever wonder how Netflix serves up recommendations for the next movie or how your smartphone knows that you will be driving to work on Monday morning? Those are both examples of machine learning.

How is machine learning different from ordinary analytics? With traditional methods, an analyst defines the objective and looks for correlations between the objective and a defined set of data inputs. If new data comes in, the analyst needs to rerun the analysis and create new correlations and a new algorithm. This can take a while.

Machine learning is more efficient because it automatically takes new data inputs and adjusts, or “learns,” without manual intervention. So, the impact is immediate. How is it learning? The behavior drives the operation, not the programmers. Netflix recommendations are a good example. Once you watch a program or a movie, the next set of recommendations are created automatically without adjustments from an analyst.
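To make the "adjusts without manual intervention" idea concrete, here is a minimal sketch of a recommender whose suggestions change every time a new viewing event arrives. It is a toy co-viewing counter, not how Netflix or any real service works; the class name, method names and titles are all hypothetical.

```python
from collections import Counter, defaultdict


class CoWatchRecommender:
    """Toy recommender: suggests titles that were watched alongside titles you have seen."""

    def __init__(self):
        self.history = defaultdict(list)        # user -> titles watched, in order
        self.co_counts = defaultdict(Counter)   # title -> Counter of co-watched titles

    def record_view(self, user, title):
        # Every new view updates the co-watch counts right away,
        # so no analyst has to rerun anything by hand.
        for earlier in self.history[user]:
            if earlier != title:
                self.co_counts[earlier][title] += 1
                self.co_counts[title][earlier] += 1
        self.history[user].append(title)

    def recommend(self, user, k=3):
        # Score unseen titles by how often they were co-watched with seen ones.
        seen = set(self.history[user])
        scores = Counter()
        for title in seen:
            for other, count in self.co_counts[title].items():
                if other not in seen:
                    scores[other] += count
        return [title for title, _ in scores.most_common(k)]


rec = CoWatchRecommender()
rec.record_view("ann", "Alien")
rec.record_view("ann", "Aliens")
rec.record_view("bob", "Alien")
bob_first = rec.recommend("bob")    # based on what Ann watched alongside "Alien"
rec.record_view("bob", "Aliens")
bob_second = rec.recommend("bob")   # the list shifts as soon as Bob's behavior changes
```

The point of the sketch is only that the viewing behavior, not a programmer, drives the next set of recommendations.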

Let’s take another example. Say you are considering buying a used car. What’s a fair price? Many factors determine this, such as age of car, miles driven, model and make. With enough data, we can infer the relationship between these factors and the price. This relationship can be linear, where the attributes have an additive effect (e.g., miles driven). But often the relationship is not linear. A car’s age, for instance, has a geometric effect on price (15 percent lower each year). In machine learning, the nature of these relationships doesn’t have to be a total guess. The programs automatically adjust these inputs and give us a fair price.
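The difference between an additive and a geometric effect can be sketched in a few lines. The 15 percent annual drop comes from the example above; the sticker price and the per-mile deduction are made-up numbers for illustration, and in a real machine learning setting these values would be estimated from data rather than typed in.

```python
def geometric_age_factor(age_years, annual_drop=0.15):
    """Geometric effect: the car keeps (1 - annual_drop) of its value every year."""
    return (1.0 - annual_drop) ** age_years


def linear_mileage_adjustment(miles, dollars_per_mile=0.05):
    """Additive effect: each mile subtracts the same (hypothetical) dollar amount."""
    return -dollars_per_mile * miles


def estimate_price(new_price, age_years, miles):
    """Combine both effects; never return a negative price."""
    price = new_price * geometric_age_factor(age_years) + linear_mileage_adjustment(miles)
    return max(0.0, price)


# Hypothetical car: $20,000 when new, 3 years old, 30,000 miles driven.
three_year_old = estimate_price(20000.0, 3, 30000)
```

Note that the age effect compounds (0.85 times 0.85 times 0.85), while every additional mile costs the same amount no matter how many miles came before.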

Machine learning can also help companies market offers more efficiently. One way is pattern recognition. There are patterns in customer buying behavior, for instance. Machine learning algorithms can predict the next likely item to be bought, helping a brand decide which customer should be targeted with what offer, better addressing their needs and wants and eliminating wasteful and costly marketing.

The challenge for companies is how to implement their learnings. What to do with the prediction — offer a discount? Display on the website? Send an email? The key to making the data impactful is “closing the loop” and refreshing the learnings so the data leads to actual behavior.

There is a budding community of data scientists and analysts who are exploring machine learning techniques. I recently attended a hackathon on Artificial Intelligence in our Innovation Station, a technology hub in our Chicago office. Most of the teams’ ideas used machine learning techniques combined with new types of data, such as facial recognition of an applicant’s LinkedIn picture to authenticate digital credit card applications or building a neural network chatbot that provides personalized service and account analytics.

The possibilities for marketers are exciting and endless. As we learn more about the technology, the real-world applications are likely to grow and provide even more value to brands and consumers alike.

Note: The views expressed in this blog are those of the blogger and not necessarily of Synchrony Financial.

Can a Machine Think for You?

I expect most of you are going to go with “No.” You might balk at the entire idea. But I had a conversation last week that pointed out that, if they’re working, isn’t that exactly what you’re counting on your marketing automation tools to do?

“As soon as we started thinking for you, it really became our civilization.” — Agent Smith, “The Matrix,” 1999, Warner Bros.
People don’t make memes of this quote, but for me it is one of the most memorable lines of the movie.

Can a machine think for you?

I expect most of you are going to go with “No.” You might balk at the entire idea. But I had a conversation with Adobe’s Chris Wareham, senior director of product management for Adobe Analytics, last week at Adobe Summit where it became clear that, if they’re working, isn’t that exactly what you’re counting on your marketing tools to do?

“The state of the industry with data is to point a lot of really, really smart postgraduates with math at the problem and hope for good answers,” said Wareham. “And that’s not scaling.”

The bottleneck is that not everyone can be a data scientist, not everyone can do that kind of thinking, or has the training to do it themselves. Not everyone works effectively that way.

However, marketing departments today can’t afford to wait a week for the DBA on their IT teams to turn those reports out. That’s where Adobe’s virtual analyst comes in. According to Wareham, “the gap we’re filling in the industry is the need for people to be data-driven even in very simple interactions that they have.”

Wareham compares it to the revolution we’ve seen in website analytics. Once (probably before many of you remember) finding out how much traffic was coming to your website involved getting daily or weekly reports from a guy called “The Webmaster.” Pretty quickly, tools emerged to automate those reports, then deliver the numbers in real time. Google Analytics provides all that information, and a lot more we never dreamed of, in real time.

“They were very complex things that made a very complex job really simple,” says Wareham. “So we’re starting to apply those same types of capabilities to a customer analytics problem set. Broadening the data set, leveraging the machine learning to automate a lot of those analytics processes, so a less sophisticated person can get a lot more leverage out of the data.”

And that’s where the robots come in. (Well, “virtual assistant,” but that’s really just one servo-enabled titanium chassis from the same thing, right?)

“Our usage of machine learning, our usage of things like the automated analyst, is really about applying machine learning to fix a problem,” says Wareham. To actually replace a data scientist takes more than reporting stats or tracking goals. The virtual assistant needs to be able to recognize the trends, opportunities and personas that a data scientist would, and that means breaking the rules. … Or at least the business rules many databases use to automate marketing.

“Wherever we see rules, that smells like smoke to us,” says Wareham. “We want to get rid of the rules, and make everything that is currently rules-based algorithmically based, so it can learn, and it helps our customers get leverage out of the data.”

Robots breaking rules? Asimov would not approve, but it might be exactly the thing marketers need.

Whether this sounds joyous or terrifying probably depends on if you’re picturing Johnny Five or The Terminator.

Johnny Five, “Short Circuit,” 1986, TriStar Pictures.
“The Terminator,” 1984, Orion Pictures.

Either way, it’s an interesting time to be a marketer.

Marketing and IT; Cats and Dogs

Cats and dogs do not get along unless they grew up together since birth. That is because cats and dogs have rather fundamental communication problems with each other. A dog wags his tail in an upward position when he wants to play. To a cat, though, an upward tail is a sure sign of hostility, as in “What’s up, dawg?!” In fact, if you observe an angry or nervous cat, you will see that everything is up: tail, hair, toes, even her spine. So imagine the dog’s confusion in this situation, where he just sent a friendly signal that he wants to play with the cat, and what he gets back are loud hisses and scary evil eyes, but along with an upward tail that “looks” like a peace sign to him. Yeah, I admit that I am a bona fide dog person, so I looked at this from his perspective first. But I sympathize with the cat, too, as from her point of view, the dog started to mess with her, disrupting an afternoon slumber in her favorite sunny spot by wagging his stupid tail. Encounters like this cannot end well. Thank goodness that we Homo sapiens lost our tails during our evolutionary journey, as that would have been one more thing that clueless guys would have to decode regarding the mood of our female companions. Imagine a conversation like “How could you not see that I didn’t mean it? My tail was pointing at the ground when I said that!” Then a guy would say, “Oh jeez, because I was looking at your lips moving up and down when you were saying something?”

Of course I am generalizing for a comedic effect here, but I see communication breakdowns like this all the time in business environments, especially between the marketing and IT teams. You think men are from Mars and women are from Venus? I think IT folks are from Vulcan and marketing people are from Betazed (if you didn’t get this, find a Trekkie around you and ask).

Now that we are living in the age of Big Data where marketing messages must be custom-tailored based on data, we really need to find a way to narrow the gap between the marketing and the IT world. I wouldn’t dare to say which side is more like a dog or a cat, as I will surely offend someone. But I think even non-Trekkies would agree that it could be terribly frustrating to talk to a Vulcan who thinks that every sentence must be logically impeccable, or a Betazed who thinks that someone’s emotional state is the way it is just because she read it that way. How do they meet in the middle? They need a translator—generally a “human” captain of a starship—between the two worlds, and that translator had better speak both languages fluently and understand both cultures without any preconceived notions.

Similarly, we need translators between the IT world and the marketing world, too. Call such translators “data scientists” if you want (refer to “How to Be a Good Data Scientist”). Or, at times a data strategist or a consultant like myself plays that role. Call us “bats” caught in between the beasts and the birds in an Aesop’s tale, as we need to be marginal people who don’t really belong to one specific world 100 percent. At times, it is a lonely place as we are understood by none, and often we are blamed for representing “the other side.” It is hard enough to be an expert in data and analytics, and we now have to master the artistry of diplomacy. But that is the reality, and I have seen plenty of evidence as to why people whose main job it is to harness meanings out of data must act as translators, as well.

IT is a very special function in modern organizations, regardless of their business models. Systems must run smoothly without errors, and all employees and outside collaborators must constantly be in connection through all imaginable devices and operating systems. Data must be securely stored and backed up regularly, and permissions to access them must be granted based on complex rules, based on job levels and functions. Then there are constant requests to install and maintain new and strange software and technologies, which should be patched and updated diligently. And God forbid anything fails to work even for a few seconds on a weekend, or all hell will break loose. Simply, the end-users (many of them in positions of dealing with customers and clients directly) do not care about IT when things run smoothly, as they take them all for granted. But when they don’t, you know the consequences. Thankless job? You bet. It is like a utility company never getting praise when the lights are on, but everyone yelling and screaming if the service is disrupted, even for a natural cause.

On the other side of the world, there are marketers, salespeople and account executives who deal with customers, clients and their bosses, who would treat IT like their servants, not partners, when things do not “seem” to work properly or when “their” sales projections are not met. The craziest part is that most customers, clients and bosses state their goals and complaints in the most ambiguous terms, as in “This ad doesn’t look slick enough,” “This copy doesn’t talk to me,” “This app doesn’t stick” or “We need to find the right audience.” What the IT folks often do not grasp is that (1) it really stinks when you get yelled at by customers and clients for any reason, and (2) not all business goals are easily translatable to logical statements. And this is when all data elements and systems are functioning within normal parameters.

Without a proper translator, marketers often self-prescribe solutions that call for data work and analytics. Often, they think that all the problems will go away if they have unlimited access to every piece of data ever collected. So they ask for exactly that. IT will respond that such a request will put a terrible burden on the system, which has to support not just marketing but also other operations. Eventually they may meet in the middle and the marketer will have access to more data than ever possible in the past. Then the marketers realize that their business issues do not go away just because they have more data in their hands. In fact, their job seems to have gotten even more complicated. They think that it is because data elements are too difficult to understand and they start blaming the data dictionary or lack thereof. They start using words like Data Governance and Quality Control, which may sound almost offensive to diligent IT personnel. IT will respond that they showed every useful bit of data they are allowed to share without breaking the security protocol, and the data dictionaries are all up to date. Marketers say the data dictionaries are hard to understand, and they are filled with too many similar variables and seemingly conflicting information. IT now says they need yet another tool set to properly implement data governance protocols and deploy them. Heck, I have seen cases where some heads of IT went for complete re-platforming of their system, as if that would answer all the marketing questions. Now, does this sound familiar so far? Does it sound like your own experience, like when you are reading “Dilbert” comic strips? It is because you are not alone in all this.

Allow me to be a little more specific with an example. Marketers often talk about “High-Value Customers.” To people who deal with 1s and 0s, that means less than nothing. What does that even mean? Because “high-value customers” could be:

  • High-dollar spenders—But what if they do not purchase often?
  • Frequent shoppers—But what if they don’t spend much at all?
  • Recent customers—Oh, those coveted “hotline” names … but will they stay that way, even for another few months?
  • Tenured customers—But are they loyal to your business, now?
  • Customers with high loyalty points—Or are they just racking up points and they would do anything to accumulate points?
  • High activity—Such as point redemptions and other non-monetary activities, but what if all those activities do not generate profit?
  • Profitable customers—The nice ones who don’t need much hand-holding. And where do we get the “cost” side of the equation on a personal level?
  • Customers who purchase extra items—Such as cruisers who drink a lot on board or diners who order many special items, as suggested.
  • Etc., etc …
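A tiny sketch shows why the definition matters to the people who deal with 1s and 0s. The three customers and their numbers below are invented for illustration, as are the field names and thresholds; the point is only that each definition of “high value” crowns a different customer.

```python
# Hypothetical customer summaries: total dollars spent and number of orders.
customers = {
    "A": {"total_spend": 5000.0, "orders": 2},    # big spender, rarely buys
    "B": {"total_spend": 900.0, "orders": 30},    # frequent shopper, small tickets
    "C": {"total_spend": 1200.0, "orders": 6},    # moderate on both
}


def top_customer(metric):
    """Return the customer who ranks first on a single metric."""
    return max(customers, key=lambda name: customers[name][metric])


def high_value(min_spend, min_orders):
    """Return customers who clear BOTH thresholds (thresholds are assumptions)."""
    return sorted(
        name
        for name, c in customers.items()
        if c["total_spend"] >= min_spend and c["orders"] >= min_orders
    )


by_spend = top_customer("total_spend")    # "A"
by_frequency = top_customer("orders")     # "B"
by_both = high_value(1000.0, 5)           # ["C"]
```

Three reasonable definitions, three different answers. Someone has to pick one, or a combination, before any query or model can be written.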

Now it gets more complex, as these definitions must be represented in numbers and figures, and depending on the industry (retail, airlines, hotels, cruise ships, credit cards, investments, utilities, non-profit or business services), the variables that would be employed to define seemingly straightforward “high-value customers” would be vastly different. But let’s say that we pick an airline as an example. Let me ask you this: How frequent is frequent enough for anyone to be called a frequent flyer?

Let’s just assume that we are going through an exercise of defining a frequent flyer for an airline company, not for any other travel-related businesses or even travel agencies (that would deal with lots of non-flyers). Granted that we have access to all necessary data, we may consider using:

1. Number of Miles—But for how many years? If we go back too far, shouldn’t we have to examine further if the customer is still active with the airline in question? And what does “active” mean to you?

2. Dollars Spent—Again for how long? In what currency? Converted into U.S. dollars at what point in time?

3. Number of Full-Price Ticket Purchases—OK, do we get to see all the ticket codes that define full price? What about customers who purchased tickets through partners and agencies vs. direct buyers through the airline’s website? Do they share a common coding system?

4. Days Between Travel—What date shall we use? Booking date, payment date or travel date? What time zones should we use for consistency? If UTC/GMT is to be used, how will we know who is booking trips during business hours vs. evening hours in their own time zone?

After considerable hours of debate, let’s say that we reached a conclusion that all involved parties could live with. Then we find out that the databases from the IT department are all on “event” levels (such as clicks, views, bookings, payments, boarding, redemption, etc.), and we would have to realign and summarize the data (in terms of miles, dollars and trips) on an individual customer level to create a definition of “frequent flyers.”
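The realignment from event level to customer level can be sketched as a simple roll-up. The event records, field names and the choice to count only bookings as trips are all assumptions made for illustration; a real airline database would have far more event types and much messier keys.

```python
from collections import defaultdict

# Hypothetical event-level records, one row per click, view or booking.
events = [
    {"customer_id": "C1", "event": "click"},
    {"customer_id": "C1", "event": "booking", "miles": 2500, "dollars": 420.0},
    {"customer_id": "C2", "event": "view"},
    {"customer_id": "C1", "event": "booking", "miles": 1200, "dollars": 180.0},
    {"customer_id": "C2", "event": "booking", "miles": 800, "dollars": 150.0},
]


def summarize_by_customer(event_rows):
    """Roll event rows up to one summary row per customer (trips, miles, dollars)."""
    summary = defaultdict(lambda: {"trips": 0, "miles": 0, "dollars": 0.0})
    for row in event_rows:
        if row["event"] != "booking":
            continue  # clicks and views carry no miles or dollars in this sketch
        customer = summary[row["customer_id"]]
        customer["trips"] += 1
        customer["miles"] += row.get("miles", 0)
        customer["dollars"] += row.get("dollars", 0.0)
    return dict(summary)


customer_view = summarize_by_customer(events)
```

Only after a roll-up like this can anyone apply a rule such as “a frequent flyer is a customer with at least N trips,” because the rule is about a person, not about an event.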

In other words, we would need to see the data from the customer-centric point of view, just to begin the discussion about frequent flyers, not to mention how to communicate with each customer in the future. Now, is that a job for IT or marketing? Who will put the bell on the cat’s neck? (Hint: Not the dog.) Well, it depends. But this definitely is not a traditional IT function, nor is it a standalone analytical project. It is something in between, requiring a translator.

Customer-Centric Database, Revisited
I have been emphasizing the importance of a customer-centric view throughout this series, and I also shared some details regarding databases designed for marketing functions (refer to: “Cheat Sheet: Is Your Database Marketing Ready?”). But allow me to reiterate this point.

In the age of abundant and ubiquitous data, omnichannel marketing communication—optimized based on customers’ past transaction history, product preferences, and demographic and behavioral personas—should be an effortless routine. The reality is far from it for many organizations, as it is very common that much of the vital information is locked in silos without being properly consolidated or governed by a standard set of business rules. It is not that creating such a marketing-oriented database (or data-mart) is solely the IT department’s responsibility, but having a dedicated information source for efficient personalization should be an organizational priority in modern days.

Most databases nowadays are optimized for data collection, storage and rapid retrieval, and such design in general does not provide a customer-centric view—which is essential for any type of personalized communication via all conceivable channels and devices of the present and future. Using brand-, division-, product-, channel- or device-centric datasets is often the biggest obstacle in the journey to an optimal customer experience, as those describe events and transactions, not individuals. Further, bits and pieces of information must be transformed into answers to questions through advanced analytics, including statistical models.

In short, all analytical efforts must be geared toward meeting business objectives, and databases must be optimized for analytics (refer to “Chicken or the Egg? Data or Analytics?”). Unfortunately, the situation is completely reversed in many organizations, where analytical maneuvering is limited due to inadequate source data, and decision-making processes are dictated by limitations of available analytics. Visible symptoms of such cases are, to list a few, elongated project cycle time, decreasing response rates, ineffective customer communication, saturation of data sources due to overexposure, and—as I was emphasizing in this article—communication breakdown among divisions and team members. I can even go as far as to say that the lack of a properly designed analytical environment is the No. 1 cause of miscommunications between IT and marketing.

Without a doubt, key pieces of data must reside in the centralized data depository, generally governed by IT, for effective marketing. But that is only the beginning and still is just a part of the data collection process. Collected data must be consolidated around the solid definition of a “customer,” and all product-, transaction-, event- and channel-level information should be transformed into descriptors of customers, via data standardization, categorization, transformation and summarization. Then the data may be further enhanced via third-party data acquisition and statistical modeling, using all available data. In other words, raw data must be refined through these steps to be useful in marketing and other customer interactions, online or offline (refer to “‘Big Data’ Is Like Mining Gold for a Watch—Gold Can’t Tell Time”). It does not matter how well the original transaction- or event-level data are stored in the main database without visible errors, or what kind of state-of-the-art communication tool sets a company is equipped with. Trying to use raw data for a near real-time personalization engine is like putting unrefined oil into a high-performance sports car.

This whole data refinement process may sound like a daunting task, but it is not nearly as painful as analytical efforts to derive meanings out of unstructured, unconsolidated and uncategorized data that are scattered all over the organization. A customer-centric marketing database (call it a data-mart if “database” sounds too much like it should solely belong to IT) created with standard business rules and uniform variable sets would, in turn, provide an “analytics-ready” environment, where statistical modeling and other advanced analytics efforts would gain tremendous momentum. In the end, the decision-making process would become much more efficient as analytics would provide answers to questions, not just bits and pieces of fragmented data, to the ultimate beneficiaries of data. And answers to questions do not require an enormous data dictionary, either; fast-acting marketing machines do not have time to look up dictionaries, anyway.

Data Roadmap—Phased Approach
For the effort to build a consolidated marketing data platform that is analytics-ready (hence, marketing-ready), I always recommend a phased approach, as (1) the inevitable complexity of a data consolidation project will be contained and managed more efficiently in carefully defined phases, and (2) each phase will require different types of expertise, tool sets and technologies. Nevertheless, the overall project must be managed by an internal champion, along with a group of experts who possess long-term vision and tactical knowledge in both database and analytics technologies. That means this effort must reside above IT and marketing, and it should be seen as a strategic effort for an entire organization. If the company already hired a Chief Data Officer, I would say that this should be one of the top priorities for that position. If not, outsourcing would be a good option, as an impartial decision-maker, who would play the role of a referee, may have to come from the outside.

The following are the major steps:

  1. Formulate Questions: “All of the above” is not a good way to start a complex project. In order to come up with the most effective way to build a centralized data depository, we first need to understand what questions must be answered by it. Too many database projects call for cars that must fly, as well.
  2. Data Inventory: Every organization has more data than it expected, and not all goldmines are in plain sight. All the gatekeepers of existing databases should be interviewed, and any data that could be valuable for customer descriptions or behavioral predictions should be considered, starting with product, transaction, promotion and response data, stemming from all divisions and marketing channels.
  3. Data Hygiene and Standardization: All available data fields should be examined and cleaned up, where some data may be discarded or modified. Free-form fields would deserve special attention, as categorization and tagging are among the key steps to opening up new intelligence.
  4. Customer Definition: Any existing Customer ID systems (such as loyalty program ID, account number, etc.) will be examined. They may be further enhanced with available PII (personally identifiable information), as there could be inconsistencies among different systems, and customers often move their residency or use multiple email addresses, creating duplicate identities. A consistent and reliable Customer ID system becomes the backbone of a customer-centric database.
  5. Data Consolidation: Data from different silos and divisions will be merged together based on the master Customer ID. A customer-centric database begins to take shape here. The database update process should be thoroughly tested, as “incremental” updates are often found to be more difficult than the initial build. The job is simply not done until after a few successful iterations of updates.
  6. Data Transformation: Once a solid Customer ID system is in place, all transaction- and event-level data will be transformed to “descriptors” of individual customers, via summarization by categories and creation of analytical variables. For example, all product information will be aligned for each customer, and transaction data will be converted into personal-level monetary summaries and activities, in both static and time-series formats. Promotion and response history data will go through similar processes, yielding individual-level ROI metrics by channel and time period. This is the single most critical step in all of this, requiring deep knowledge in business, data and analytics, as the stage is being set for all future analytics and reporting efforts. Due to the variety and uniqueness of business goals in different industries, a one-size-fits-all approach will not work, either.
  7. Analytical Projects: Test projects will be selected and the entire process will be done on the new platform. Ad-hoc reporting and complex modeling projects will be conducted, and the results will be graded on timing, accuracy, consistency and user-friendliness. An iterative approach is required, as it is impossible to foresee all possible user requests and related complexities upfront. A database should be treated as a living, breathing organism, not something rigid and inflexible. Marketers will “break-in” the database as they use it more routinely.
  8. Applying the Knowledge: The outcomes of analytical projects will be applied to the entire customer base, and live campaigns will be run based on them. Often, major breakdowns happen at the large-scale deployment stages, especially when dealing with millions of customers and complex mathematical formulae at the same time. A model-ready database will definitely minimize the risk (hence, the term “in-database scoring”), but the process will still require some fine-tuning. To proliferate gained knowledge throughout the organization, some model scores (which pack deep intelligence in small sizes) may be transferred back to the main databases managed by IT. Imagine model scores driving operational decisions live, on the ground.
  9. Result Analysis: Good marketing intelligence engines must be equipped with feedback mechanisms, effectively closing the “loop” where each iteration of marketing efforts improves its effectiveness with accumulated knowledge on a customer level. It is very unfortunate that many marketers just move through the tracks set up by their predecessors, mainly because existing database environments are not even equipped to link necessary data elements on a customer level. Too many back-end analyses are just event-, offer- or channel-driven, not customer-centric. Can you easily tell which customer is over-, under- or adequately promoted, based on a personal-level promotion-and-response ratio? With a customer-centric view established, you can.
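
To give a feel for the personal-level promotion-and-response ratio mentioned in step 9, here is a minimal Python sketch. The sample histories, the function names and especially the 5% and 30% cutoffs are hypothetical assumptions for illustration; real thresholds would have to come from the business and from testing.

```python
from collections import Counter

# Hypothetical histories: one (customer_id, channel) row per promotion sent
# and per response received.
promotions = (
    [("C001", "email")] * 20
    + [("C002", "direct_mail")] * 4
    + [("C003", "email")] * 10
)
responses = [("C002", "direct_mail")] * 2 + [("C003", "email")]

def promotion_response_summary(promotions, responses):
    """Count promotions and responses per customer and compute the ratio."""
    sent = Counter(customer_id for customer_id, _ in promotions)
    responded = Counter(customer_id for customer_id, _ in responses)
    return {
        customer_id: {
            "sent": n_sent,
            "responded": responded[customer_id],
            "rate": responded[customer_id] / n_sent,
        }
        for customer_id, n_sent in sent.items()
    }

def promotion_status(rate, low=0.05, high=0.30):
    """Classify a customer by response rate; the cutoffs are illustrative only."""
    if rate < low:
        return "over-promoted"
    if rate > high:
        return "under-promoted"
    return "adequately promoted"

summary = promotion_response_summary(promotions, responses)
for customer_id, row in sorted(summary.items()):
    print(customer_id, row, promotion_status(row["rate"]))
```

None of this is possible unless promotion and response records can be linked to the same Customer ID, which is why the earlier steps come first.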

These are just high-level summaries of key steps, and each step should be managed as an independent project within a large-scale initiative with common goals. Some steps may run concurrently to reduce the overall timeline, and tactical knowledge in all required technologies and tool sets is the key for the successful implementation of centralized marketing intelligence.
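
As one small illustration of the Customer Definition step, the sketch below links records from different systems into a single master ID. It is a deliberate simplification: it assumes exact matches on a normalized email address or a shared loyalty ID are enough to call two records the same person, and the record layout and sample values are hypothetical. Real identity resolution usually needs more than this (name and address matching, for example).

```python
def normalize_email(email):
    """Trim whitespace and lowercase so trivially different emails match."""
    return email.strip().lower()

# Hypothetical records from three source systems.
records = [
    {"record_id": "L-100", "source": "loyalty", "loyalty_id": "FF123",
     "email": "Pat.Lee@example.com"},
    {"record_id": "W-7", "source": "web", "loyalty_id": None,
     "email": " pat.lee@example.com "},
    {"record_id": "CC-55", "source": "call_center", "loyalty_id": "FF123",
     "email": "plee@example.org"},
    {"record_id": "W-9", "source": "web", "loyalty_id": None,
     "email": "sam@example.com"},
]

def assign_master_ids(records):
    """Group records sharing an email or loyalty ID (simple union-find)."""
    parent = {r["record_id"]: r["record_id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        parent[find(b)] = find(a)

    first_seen = {}  # match key -> first record_id carrying that key
    for r in records:
        keys = [("email", normalize_email(r["email"]))]
        if r["loyalty_id"]:
            keys.append(("loyalty", r["loyalty_id"]))
        for key in keys:
            if key in first_seen:
                union(first_seen[key], r["record_id"])
            else:
                first_seen[key] = r["record_id"]
    return {record_id: find(record_id) for record_id in parent}

master_ids = assign_master_ids(records)
for record_id, master_id in master_ids.items():
    print(record_id, "->", master_id)
```

Here the web record and the call-center record both fold into the loyalty record, one through the email address and the other through the loyalty ID, while the unrelated web record keeps its own identity.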

Who Will Do the Work?
Then, who will be in charge of all this and who will actually do the work? As I mentioned earlier, a job of this magnitude requires a champion, and a CDO may be a good fit. But each of these steps will require different skill sets, so some outsourcing may be inevitable (more on how to pick an outsourcing partner in future articles).

What should not happen is the IT team or the analytics team solely dictating the whole process. Creating a central depository of marketing intelligence is something that sits between IT and marketing, and the decisions must be made with business goals in mind, not just limitations and challenges that IT faces. If the CDO or the champion of this type of initiative starts representing IT issues before overall business goals, then the project is doomed from the beginning. Again, it is not about touching the core database of the company, but realigning existing data assets to create new intelligence. Raw data (no matter how clean they are at the collection stage) are like unrefined raw materials to the users. What the decision-makers need are simple answers to their questions, not hundreds of data pieces.

From the user’s point of view, data should be:

  • Easy to understand and use (intuitive to non-mathematicians)
  • Bite-size (i.e., small answers, not mounds of raw data)
  • Useful and effective (consistently accurate)
  • Broad (answers should be available most of the time, not just “sometimes”)
  • Readily available (data should be easily accessible via users’ favorite devices/channels)

And getting to this point is the job of a translator who sits in between marketing and IT. Call them data scientists or data strategists, if you like. But they do not belong to just marketing or IT, even though they have to understand both sides really well. Do not be rigid, insisting that all pilots must belong to the Air Force; some pilots do belong to the Navy.

Lastly, let me add this at the risk of sounding like I am siding with technologists. Marketers, please don’t be bad patients. Don’t be that bad patient who shows up at a doctor’s office with a specific prescription, as in “Don’t ask me why, but just give me these pills, now.” I’ve even met an executive who wanted a neural-net model for his business without telling me why. I just said to myself, “Hmm, he must have been to one of those analytics conferences recently.” Then after listening to his “business” issues, I prescribed an entirely different solution package.

So, instead of blurting out requests for pieces of data variables or queries using cool-sounding, semi-technical terms, state the business issues and challenges that you are facing as clearly as possible. IT and analytics specialists will prescribe the right solution for you if they understand the ultimate goals better. Too often, requesters determine the solutions they want without any understanding of underlying technical issues. Don’t forget that the end-users of any technology are only exposed to symptoms, not the causes.

And if Mr. Spock doesn’t seem to understand your issues and keeps saying that your statements are illogical, then call in a translator, even if you have to hire him for just one day. I know this all too well, because after all, this one phrase summarizes my entire career: “A bridge person between the marketing world and the IT world.” Although it ain’t easy to live a life as a marginal person.

What Does a Data Marketer Look Like?

The currency of nearly all marketing today is data. Ten years ago, we might have said much the same of digital marketing, and all the email, display, social, search, and mobile that has come forward from it.

Twenty years ago, we could have said the same of database marketing and customer relationship management.

And wind back—measurability and accountability, the hallmarks of direct marketing—always have relied on data. We may have called it lists back in the day—but data are what lists have become. The inherent value of data is to know the shared attributes among the data elements and to use that knowledge.

Without a doubt, the “marketing of data” has evolved and transformed as much as marketing itself. Every day in our world, it’s not enough to have contact details on people, or any number of the hundreds of demographic, psychographic, contextual, social and behavioral overlays that may be available; we also need analytics power.

Recent research from The Winterberry Group underscores this point: data is now an $11 billion business in America, and that includes analytics services revenue. I recall an unofficial guesstimate of a $2 billion data market back in the early 1990s, when that meant a North American directory of 30,000-plus response and compiled lists available for rental and exchanges.

Next month, the Data Innovators Group will host its annual Data Innovator of the Year Award dinner in New York. This year’s honoree is Auren Hoffman, CEO of LiveRamp (now owned by Acxiom), who says his mission is “to connect data to every marketing application.” And so it shall be… Soon.

But who is going to make it all work? Let’s welcome the data marketer and the data scientists and strategists they employ.

Still, too many brands keep customer data in silos. And while responsibly using offline data with online data is fast coming down the pike, marketing organizations need people in place who can help clients navigate the brave new world of data management platforms, data quality strategies, programmatic media exchanges, big data and small data, and all the algorithms that drive this important “stuff,” often in real time. The list sale largely exists no more. Instead, data is a pathway to opportunity, a challenge overcome, by way of a data-to-insights-to-strategy recommendation, and a discipline for testing and data quality that leads brands (and their agencies and data marketer partners) to succeed.

It’s more difficult than ever to be a successful data marketer, but our field is producing the partners that businesses, brands and chief marketing officers need. Now if we could just go find a few.

Thank you to the Hudson Valley Direct Marketing Association for enabling my participation at its recent “Meet the Masters” event. Ryan Lake (Lake Group Media), Mark Rickard (Rickard Squared) and Rob Sanchez (Merit Direct) are three CEOs of data marketing organizations who have a few suggestions on where we can all go to look.

How to Be a Good Data Scientist

I guess no one wants to be a plain “Analyst” anymore; now “Data Scientist” is the title of the day. Then again, I never thought that there was anything wrong with titles like “Secretary,” “Stewardess” or “Janitor,” either. But somehow, someone decided “Administrative Assistant” should replace “Secretary” completely, and that someone was very successful in that endeavor. So much so that people actually get offended when they are called “Secretaries.” The same goes for “Flight Attendants.” If you want an extra bag of peanuts or the whole can of soda with ice on the side, do not dare to call any service personnel by the outdated title. The verdict is still out for the title “Janitor,” as it could be replaced by “Custodial Engineer,” “Sanitary Engineer,” “Maintenance Technician,” or anything that gives an impression that the job requirement includes a degree in engineering. No matter. When the inflation-adjusted income of salaried workers is decreasing, I guess the number of words in the job title should go up instead. Something’s got to give, right?

Please do not ask me to be politically correct here. As an openly Asian person in America, I am not even sure why I should be offended when someone addresses me as an “Oriental.” Someone explained it to me a long time ago. The word is reserved for “things,” not for people. OK, then. I will be offended when someone knowingly addresses me as an Oriental, now that the memo has been out for a while. So, do me this favor and do not call me an Oriental (at least in front of my face), and I promise that I will not call anyone an “Occidental” in return.

In any case, anyone who touches data for a living now wants to be called a Data Scientist. Well, the title is longer than one word, and that is a good start. Did anyone get a raise along with that title inflation? I highly doubt it. But I’ve noticed the qualifications got much longer and more complicated.

I have seen some job requirements for data scientists that call for “all” of the following qualifications:

  • A master’s degree in statistics or mathematics; able to build statistical models proficiently using R or SAS
  • Strong analytical and storytelling skills
  • Hands-on knowledge in technologies such as Hadoop, Java, Python, C++, NoSQL, etc., being able to manipulate the data any which way, independently
  • Deep knowledge in ETL (extract, transform and load) to handle data from all sources
  • Proven experience in data modeling and database design
  • Data visualization skills using whatever tools that are considered to be cool this month
  • Deep business/industry/domain knowledge
  • Superb written and verbal communication skills, being able to explain complex technical concepts in plain English
  • Etc. etc…

I actually cut this list short, as it is already becoming ridiculous. I just want to see the face of a recruiter who got the order to find super-duper candidates based on this list—at the same salary level as a Senior Statistician (another fine title). Heck, while we’re at it, why don’t we add that the candidate must look like Brad Pitt and be able to tap-dance, too? The long and the short of it is maybe some executive wanted to hire just “1” data scientist with all these skillsets, hoping to God that this mad scientist will be able to make sense out of mounds of unstructured and unorganized data all on her own, and provide business answers without even knowing what the question was in the first place.

Over the years, I have worked with many statisticians, analysts and programmers (notice that they are all one-word titles), dealing with large, small, clean, dirty and, at times, really dirty data (hence the title of this series, “Big Data, Small Data, Clean Data, Messy Data”). And navigating through all those data has always been a team effort.

Yes, there are some exceptional musicians who can write music and lyrics, sing really well, play all instruments, program sequencers, record, mix, produce and sell music—all on their own. But if you insist that only such geniuses can produce music, there won’t be much to listen to in this world. Even Stevie Wonder, who can write and sing, and play keyboards, drums and harmonicas, had close to 100 names on the album credits in his heyday. Yes, the digital revolution changed the music scene as much as the data industry in terms of team sizes, but both aren’t and shouldn’t be one-man shows.

So, if being a “Data Scientist” means being a super businessman/analyst/statistician who can program, build models, write, present and sell, we should all just give up searching for one in the near future within our budget. Literally, we may be able to find only a few qualified candidates in the job market on a national level. Too bad that every industry report says we need tens of thousands of them, right now.

Conversely, if it is just a bloated new title for good old data analysts with some knowledge in statistical applications and the ability to understand business needs—yeah, sure. Why not? I know plenty of those people, and we can groom more of them. And I don’t even mind giving them new long-winded titles that are suitable for the modern business world and peer groups.

I have been in the data business for a long time. And even before the datasets became really large, I have always maintained the following division of labor when dealing with complex data projects involving advanced analytics:

  • Business Analysts
  • Programmers/Developers
  • Statistical Analysts

The reason is very simple: It is extremely difficult to be a master-level expert in even one of these areas. Out of hundreds of statisticians whom I’ve worked with, I can count only a handful of people who even “tried” to venture into the business side. Of those, even fewer successfully transformed themselves into businesspeople, and they are now business owners of consulting practices or in positions with “Chief” in their titles (Chief Data Officer or Chief Analytics Officer being the title du jour).

On the other side of the spectrum, fewer than a tenth of decent statisticians are also good at coding to manipulate complex data. But even they are mostly not good enough to be completely independent from professional programmers or developers. The reality is, most statisticians are not very good at setting up workable samples out of really messy data. Simply put, handling data and developing analytical frameworks or models call for different mindsets on a professional level.

The Business Analysts, I think, are the closest to the modern-day Data Scientists, although the ones in the past were less technical, given the toolsets available back then. Nevertheless, granted that it is much easier to teach business aspects to statisticians or developers than to convert businesspeople or marketers into coders (no offense, but true), many of these “in-between” people (between the marketing world and technology world, for example) are rooted in the technology world (myself included) or at least have a deep understanding of it.

At times labeled as Research Analysts, they are the folks who would:

  • Understand the business requirements and issues at hand
  • Prescribe suitable solutions
  • Develop tangible analytical projects
  • Perform data audits
  • Procure data from various sources
  • Translate business requirements into technical specifications
  • Oversee the progress as project managers
  • Create reports and visual presentations
  • Interpret the results and create “stories”
  • And present the findings and recommended next steps to decision-makers

Sounds complex? You bet it is. And I didn’t even list all the job functions here. And to do this job effectively, these Business/Research Analysts (or Data Scientists) must understand the technical limitations of all related areas, including database, statistics, and general analytics, as well as industry verticals, uniqueness of business models and campaign/transaction channels. But they do not have to be full-blown statisticians or coders; they just have to know what they want and how to ask for it clearly. If they know how to code as well, great. All the more power to them. But that would be like a cherry on top, as the business mindset should be in front of everything.

So, now that the data are bigger and more complex than ever in human history, are we about to combine all aspects of data and analytics business and find people who are good at absolutely everything? Yes, various toolsets made some aspects of analysts’ lives easier and simpler, but not enough to get rid of the partitions between positions completely. Some third basemen may be able to pitch, too. But they wouldn’t go on the mound as starting pitchers—not on a professional level. And yes, analysts who advance up through the corporate and socioeconomic ladder are the ones who successfully crossed the boundaries. But we shouldn’t wait for the ones who are masters of everything. Like I said, even Stevie Wonder needs great sound engineers.

Then, what would be a good path to find Data Scientists in the existing pool of talent? I have been using the following four evaluation criteria to identify individuals with upward mobility in the technology world for a long time. Like I said, it is a lot simpler and easier to teach business aspects to people with technical backgrounds than the other way around.

So let’s start with the techies. These are the qualities we need to look for:

1. Skills: When it comes to the technical aspect of it, the skillset is the most important criterion. Generally a person has it, or doesn’t have it. If we are talking about a developer, how good is he? Can he develop a database without wasting time? A good coder is not just a little faster than mediocre ones; he can be 10 to 20 times faster. I am talking about the ones who don’t have to look through some manual or the Internet every five minutes, but the ones who just know all the shortcuts and options. The same goes for statistical analysts. How well is she versed in all the statistical techniques? Or is she a one-trick pony? How is her track record? Are her models performing in the market for a prolonged time? The thing about statistical work is that time is the ultimate test; we eventually get to find out how well the prediction holds up in the real world.

2. Attitude: This is a very important aspect, as many techies are locked up in their own little world. Many are socially awkward, like characters in Dilbert or “Big Bang Theory,” and most much prefer to deal with the machines (where things are clean-cut binary) than people (well, humans can be really annoying). Some do not work well with others and do not know how to compromise at all, as they do not know how to look at the world from a different perspective. And there are a lot of lazy ones. Yes, lazy programmers are the ones who are more motivated to automate processes (primarily to support their laissez faire lifestyle), but the ones who blow the deadlines all the time are just too much trouble for the team. In short, a genius with a really bad attitude won’t be able to move to the business or the management side, regardless of the IQ score.

3. Communication: Many technical folks are not good at written or verbal communications. I am not talking about just the ones who are foreign-born (like me), even though most technically oriented departments are full of them. The issue is many technical people (yes, even the ones who were born and raised in the U.S., speaking English) do not communicate with the rest of the world very well. Many can’t explain anything without using technical jargon, nor can they summarize messages to decision-makers. Businesspeople don’t need to hear the life story about how complex the project was or how messy the data sets were. Conversely, many techies do not understand marketers or businesspeople who speak plain English. Some fail to grasp the concept that human beings are not robots, and most mortals often fail to communicate every sentence as a logical expression. When a marketer says “Omit customers in New York and New Jersey from the next campaign,” the coder on the receiving end shouldn’t take that as proper Boolean logic. Yes, obviously a state cannot be New York “and” New Jersey at the same time. But most humans don’t (or can’t) distinguish such differences. Seriously, I’ve seen some developers who refuse to work with people whose command of logical expressions isn’t at the level of Mr. Spock. That’s the primary reason we need business analysts or project managers who work as translators between these two worlds. And obviously, the translators should be able to speak both languages fluently.

4. Business Understanding: Granted that the candidates in question are qualified in terms of criteria one through three, their eagerness to understand the ultimate business goals behind analytical projects would truly set them apart from the rest on the path to become a data scientist. As I mentioned previously, many technically oriented people do not really care much about the business side of the deal, or even have slight curiosity about it. What is the business model of the company for which they are working? How do they make money? What are the major business concerns? What are the long- and short-term business goals of their clients? Why do they lose sleep at night? Before complaining about incomplete data, why are the databases so messy? How are the data being collected? What does all this data mean for their bottom line? Can you bring up the “So what?” question after a great scientific finding? And ultimately, how will we make our clients look good in front of “their” bosses? When we deal with technical issues, we often find ourselves at a crossroads. Picking the right path (or a path with the least amount of downsides) is not just an IT decision, but more of a business decision. The person who has a more holistic view of the world, without a doubt, would make a better decision, even for a minor difference in a small feature, in terms of programming. Unfortunately, it is very difficult to find such IT people who have a balanced view.
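
Going back to the “New York and New Jersey” request under Communication, a few lines of Python show why the literal reading and the intended reading differ. The customer records and field names are made up for illustration.

```python
# Hypothetical customer list for the next campaign.
customers = [
    {"id": 1, "state": "NY"},
    {"id": 2, "state": "NJ"},
    {"id": 3, "state": "CT"},
    {"id": 4, "state": "PA"},
]

# Literal reading: omit customers whose state is NY AND NJ at the same time.
# No single record can satisfy that, so nobody gets omitted.
literal = [
    c for c in customers
    if not (c["state"] == "NY" and c["state"] == "NJ")
]

# Intended reading: omit customers who are in either state.
excluded_states = {"NY", "NJ"}
intended = [c for c in customers if c["state"] not in excluded_states]

print(len(literal), "customers kept under the literal reading")
print([c["id"] for c in intended], "kept under the intended reading")
```

A translator's job is to hear the plain-English request and hand the second version to the coder without making the marketer feel illogical.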

And that is the punchline. We want data scientists who have the right balance of business and technical acumen—not just jacks of all trades who can do all the IT and analytical work all by themselves. Just like business strategy isn’t solely set by a data strategist, data projects aren’t done by one super techie. What we need are business analysts or data scientists who truly “get” the business goals and who will be able to translate them into functional technical specifications, with an understanding of all the limitations of each technology piece that is to be employed—which is quite different from being able to do it all.

If the career path for a data scientist ultimately leads to Chief Data Officer or Chief Analytics Officer, it is important for the candidates to understand that such “chief” titles are all about the business, not the IT. As soon as a CDO, CAO or CTO starts representing technology before business, that organization is doomed. They should be executives who understand the technology and employ it to increase profit and efficiency for the whole company. Movie directors don’t necessarily write scripts, hold the cameras, develop special effects or act out scenes. But they understand all aspects of the movie-making process and put all the resources together to create films that they envision. As soon as a director falls too deep into just one aspect, such as special effects, the resultant movie quickly becomes an unwatchable bore. Data business is the same way.

So what is my advice for young, up-and-coming data scientists? Master the basics and be a specialist first. Pick a field that fits your aptitude, whether it be programming, software development, mathematics or statistics, and try to be really good at it. But remain curious about other related IT fields.

Then travel the world. Watch lots of movies. Read a variety of books. Not just technical books, but books about psychology, sociology, philosophy, science, economics and marketing, as well. This data business is inevitably related to activities that generate revenue for some organization. Try to understand the business ecosystem, not just technical systems. As marketing will always be a big part of the Big Data phenomenon, be an educated consumer first. Then look at advertisements and marketing campaigns from the promoter’s point of view, not just from an annoyed consumer’s view. Be an informed buyer through all available channels, online or offline. Then imagine how the world will be different in the future, and how a simple concept of a monetary transaction will transform along with other technical advances, which will certainly not stop at Apple Pay. All of those changes will turn into business opportunities for people who understand data. If you see some real opportunities, try to imagine how you would create a startup company around them. You will quickly realize that answering technical challenges is not even half of building a viable business model.

If you are already one of those data scientists, live up to that title and be solution-oriented, not technology-oriented. Don’t be a slave to technologies, or become what we sometimes call a “data plumber” (who just moves data from one place to another). Be a master who wields data and technology to provide useful answers. And most importantly, don’t be evil (as Google says), and never do things just because you can. Always think about the social consequences, as actions based on data and technology affect real people, often negatively (more on this subject in a future article). If you want to ride this Big Data wave for the foreseeable future, try not to annoy people who may not understand all the ins and outs of the data business. Don’t be the guy who spoils it for everyone else in the industry.

A while back, I started to see the unemployment rate as the rate of people who are being left behind by progress (if we consider technical innovations as progress). Every evolutionary stage since the Industrial Revolution created gaps between supply and demand of new skillsets required for the new world. And this wave is not going to be an exception. It is unfortunate that, in this age of a high unemployment rate, we have such a hard time finding good candidates for high-tech positions. On one side, there are too many people who were educated under the old paradigm. And on the other side, there are too few people who can wield new technologies and apply them to satisfy business needs. If this new title “Data Scientist” means the latter, then yes. We need more of them, for sure. But we all need to be more realistic about how to groom them, as it would take a village to do so. And if we can’t even agree on what the job description for a data scientist should be, we will need lots of luck developing armies of them.