Data Analytics Projects Only Benefit Marketers When Properly Applied

A recent report shared that only about 20% of all analytics project work turns out to be beneficial to businesses. Such waste. But is that solely the fault of data scientists? After all, even effective medicine is rendered useless if the patient refuses to take it.

Then again, why would users reject the results of analytics work? At the risk of gross simplification, allow me to break it down into two categories: cases where project goals do not align with business goals, and cases where good intelligence gets wasted due to a lack of capability, procedure, or will to implement follow-up actions. Basically, poor planning at the beginning and poor execution at the back end.

Results of analytics projects often get ignored if the project goal doesn’t serve the general strategy or specific needs of the business. To put it in a different way, projects stemming from the analyst’s intellectual curiosity may or may not align with business interests. Some math geek may be fascinated by the elegance of mathematical precision or complexity of solutions, but such intrigue rarely translates directly into monetization of data assets.

In business, faster and simpler answers are far more actionable and valuable. If I ask businesspeople whether they want an answer with an 80% confidence level in the next 2 days or an answer with 95% certainty in 4 weeks, the great majority choose the quicker but less-than-perfect answer. Why? Because the keyword in all this is “actionable,” not “certainty.”

Analysts who would like to maintain a distance from immediate business needs should instead pursue pure science in the world of academia (a noble cause, without a doubt). In business settings, however, we play with data only to make tangible differences, as in dollars, cents, minutes or seconds. Once such differences in philosophy are accepted and understood by all involved parties, then the real question is: What kind of answers are most needed to improve business results?

Setting Analytics Projects Up for Success

Defining the problem statement is the hardest part for many analysts. Even the ones who are well-trained often struggle with the goal setting process. Why? Because in school, the professor in charge provides the problems to solve, and students submit solutions to them.

In business, analysts must understand the intentions of decision makers (i.e., their clients), deciphering not-so-logical general statements and anecdotes. Yeah, sure, we need to attract more high-value customers, but how would we express such value via mathematical statements? What would the end result look like, and how will it be deployed to make any difference in the end?

If unchecked, many analytics projects move forward purely based on the analysts’ assumptions, or worse, procedural convenience factors. For example, if the goal of the project is to rank a customer list in the order of responsiveness to certain product offers, then to build models like that, one may employ all kinds of transactional, behavioral, response, and demographic data.

All these data types come with different strengths and weaknesses, and even different missing data ratios. In cases like this, I’ve encountered many (too many) analysts who would simply omit everyone with missing demographic data from the development universe. Sometimes such omissions add up to over 30% of the whole. What, are we never going to reach out to those souls just because some peripheral data points are missing for them?

Good luck convincing the stakeholders who want to use the entire list for various channel promotions. “Sorry, we can provide model scores for only 70% of your valuable list,” is not going to cut it.
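
One common way to avoid that conversation is to keep records with missing demographics in the development universe, carrying an explicit missing-value flag next to a filled-in value so the model can still score everyone. The snippet below is a minimal sketch in plain Python, not a prescribed method: the field names (`age`, `orders_12m`), the sample records, and the choice of a median fill are all assumptions made for illustration.

```python
from statistics import median

# Hypothetical customer records; None marks a missing demographic value.
customers = [
    {"id": 1, "orders_12m": 5, "age": 42},
    {"id": 2, "orders_12m": 1, "age": None},
    {"id": 3, "orders_12m": 3, "age": 35},
    {"id": 4, "orders_12m": 0, "age": None},
    {"id": 5, "orders_12m": 8, "age": 58},
]


def prepare_for_modeling(records, field):
    """Keep every record; add a missing flag and fill gaps with the median."""
    known = [r[field] for r in records if r[field] is not None]
    fill_value = median(known)
    prepared = []
    for r in records:
        row = dict(r)  # copy so the source records stay untouched
        row[f"{field}_missing"] = 1 if r[field] is None else 0
        if r[field] is None:
            row[field] = fill_value
        prepared.append(row)
    return prepared


scoring_universe = prepare_for_modeling(customers, "age")
print(len(scoring_universe), "of", len(customers), "records remain scoreable")
```

The flag lets a model learn whether "missing" itself carries signal, and the whole list stays available for channel promotions.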

More than a few times, I received questions about what analysts should do when they have to reach deep into lower model groups (of response models) to meet the demand of marketers, knowing that the bottom half won’t perform well. My response would be to forget about the model — no matter how elegant it may be — and develop heuristic rules to eliminate obvious non-targets in the prospect universe. If the model gets to be used, it is almost certain that the modeler in charge will be blamed for mediocre or bad performance, anyway.

Then I firmly warn them to ask about typical campaign size “before” one starts building some fancy models. What is the point of building a response model when the emailer would blast emails as much as he wants? To prove that the analyst is well-versed in building complex response models? What difference would it ever make in the “real” world? With that energy, it would be far more prudent to build a series of personas and product affinity models to personalize messages and offers.

Supporting Analytics Results With Marketing

Now, let’s pause for a moment and think about the second major reason why the results of analytics are not utilized. Assume that the analytics team developed a series of personas and product affinity models to customize offers on a personal level. Does the marketing team have the ability to display different offers to different targets? Via email, websites, and/or print media? In other words, do they have capabilities and resources to show “a picture of two wine glasses filled with attractive looking red wine” to people who scored high scores in the “Wine Enthusiast” model?

I’ve encountered too many situations where marketers look concerned — rather than getting excited — when talking about personas for personalization. Not because they care about what analysts must go through to produce a series of models, but because they lack creative assets and technical capabilities to make it all happen.

They often complain about a lack of budget to develop multiple versions of creatives, a lack of proper digital asset management tools, a lack of campaign management tools that allow complex versioning, an inability to serve dynamic content on websites, etc. There is no shortage of reasons why something “cannot” be done.

But, even in a situation like that, it is not the job of a data scientist to suggest increasing investments in various areas, especially when “other” departments have to cough up the money. No one gets to command unlimited resources, and every department has its own priorities. What analytics professionals must do is to figure out all kinds of limitations beyond the little world of analytics, and prioritize the work in terms of actionability.

Consider what can be done with minimal changes in the marketing ecosystem. For the preservation of both the analytics and marketing departments, which efforts will immediately bring tangible results? Basically, what will we be able to brag about in front of CEOs and CFOs?

When to Put Analytics Projects First

Prioritization of analytics projects should never be done solely based on data availability, ease of data crunching or modeling, or “geek” factors. It should be done in terms of potential value of the result, immediate actionability, and most importantly, alignment with overall business objectives.

The fact that only about 20% of analytics work yields business value means that 80% of the work was never even necessary. Sure, data geeks deserve to have some fun once in a while, but the fun factor doesn’t pay for the systems, toolsets, data maintenance, and salaries.

Without proper problem statements on the front end and follow-up actions on the back end, no amount of analytical activity will produce any value for businesses. That is why data and analytics professionals must act as translators between the business world and the technical world. Without that critical consulting layer, prioritizing projects becomes the luck of the draw.

To stay on target, always start with a proper analytics roadmap covering every stage from ideation to application. To be valued and appreciated, data scientists must act as business consultants, as well.


Data Mining: Where to Dig First?

[Image: “Big_Data_Prob,” Creative Commons license. Credit: KamiPhuc via Flickr]

In the age of abundant data, obtaining insights out of mounds of data often becomes overwhelming even for seasoned analysts. In the data-mining business, more than half of the struggle is about determining “where to dig first.”

The main job of a modern data scientist is to answer business questions for decision-makers. To do that, they have to be translators between the business world and the technology world. This in-between position often creates a great amount of confusion for aspiring data scientists, as the gaps between business challenges and the elements that make up the answers are very wide, even with all of the toolsets that are supposedly “easy to use.” That’s because insights do not come out of the toolsets automatically.

Business questions are often very high-level or even obscure, such as:

  • Let’s try this new feature with the “best” customers
  • How do we improve customer “experience”?
  • We did lots of marketing campaigns; what worked?

When someone mentions “best” customers, statistically trained analysts jump into the mode of “Yeah! Let’s build some models!” If you are holding a hammer, everything may look like nails. But we are not supposed to build models just because we can. Why should we build a model and, if we do, whom are we going after? What does that word “best” mean to you?

Breaking that word down in mathematically representable terms is indeed the first step for the analyst (along with the decision-makers). That’s because “best” can mean lots of different things.

If the users of the information are in the retail business, in a classical sense, it could mean:

  • Frequently Visiting Customers: Expressed in terms of “Number of transactions past 12 months,” “Life-to-date number of transactions,” “Average days between transactions,” “Number of Web visits,” etc.
  • Big Spenders: Expressed in terms of “Average amount per transaction,” “Average amount per customer for past four years,” “Lifetime total amount,” etc.
  • Recent Customers: Expressed in terms of “Days or weeks since last transaction.”
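
As a rough illustration of how these verbal definitions become variables, the sketch below derives a few of them from a raw transaction log. The log layout, the customer IDs, the variable names and the `AS_OF` reference date are all hypothetical, chosen only for this example.

```python
from datetime import date

# Hypothetical transaction log: (customer_id, transaction_date, amount)
transactions = [
    ("A", date(2023, 1, 10), 120.0),
    ("A", date(2023, 6, 5), 80.0),
    ("A", date(2023, 11, 20), 100.0),
    ("B", date(2022, 3, 15), 900.0),
    ("C", date(2023, 12, 1), 40.0),
    ("C", date(2023, 12, 15), 60.0),
]
AS_OF = date(2023, 12, 31)  # assumed reference date for the analysis


def summarize(transactions, as_of):
    """Build per-customer frequency, spending and recency variables."""
    summary = {}
    for cust, when, amount in transactions:
        s = summary.setdefault(
            cust, {"txn_ltd": 0, "txn_12m": 0, "amount_ltd": 0.0, "last": when}
        )
        s["txn_ltd"] += 1                   # life-to-date number of transactions
        s["amount_ltd"] += amount           # lifetime total amount
        if (as_of - when).days <= 365:
            s["txn_12m"] += 1               # number of transactions, past 12 months
        s["last"] = max(s["last"], when)
    for s in summary.values():
        s["avg_amount_per_txn"] = s["amount_ltd"] / s["txn_ltd"]
        s["days_since_last_txn"] = (as_of - s.pop("last")).days
    return summary


variables = summarize(transactions, AS_OF)
for cust, v in sorted(variables.items()):
    print(cust, v)
```

In this toy data, customer B is the biggest lifetime spender but has not bought anything in the past 12 months, while customer C is recent but small. Which one counts as “best” depends entirely on which variable the decision-maker cares about.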

I am sure most young analysts would want requesters to express these terms the way I did, using actual variable names, but translating these terms into expressions that machines can understand is indeed their job. Also, even when these terms are agreed upon, exactly how high is high enough to be called the “best”? Top 10 percent? Top 100,000 customers? In terms of what, exactly? A cutoff based on some arbitrary dollar amount, like $10,000 per year? Just dollars, or frequency on top of it, too?
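
To make the point concrete, here is a small sketch comparing two of those cutoffs on the same made-up list. The customer IDs and spending figures are invented for illustration; only the “top 10 percent” and “$10,000 per year” rules come from the questions above.

```python
import math

# Hypothetical annual spending per customer, in dollars.
annual_spend = {
    "c01": 15200, "c02": 310, "c03": 9800, "c04": 12050, "c05": 45,
    "c06": 2200, "c07": 760, "c08": 10400, "c09": 5100, "c10": 130,
}


def top_percent(spend, pct):
    """Return the top pct percent of customers by spending (at least one)."""
    n = max(1, math.ceil(len(spend) * pct / 100))
    ranked = sorted(spend, key=spend.get, reverse=True)
    return set(ranked[:n])


def above_threshold(spend, dollars):
    """Return customers whose spending meets a fixed dollar cutoff."""
    return {c for c, amount in spend.items() if amount >= dollars}


best_by_rank = top_percent(annual_spend, 10)
best_by_dollars = above_threshold(annual_spend, 10_000)
print(sorted(best_by_rank))
print(sorted(best_by_dollars))
```

With these numbers the percentile rule selects one customer and the dollar rule selects three, so the two definitions of “best” do not even agree on the size of the segment.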

The word “best” may mean multiple things to different people at the same time. A marketer who runs a loyalty program, for instance, may care only about the frequency factor, with a hint of customer value as a secondary measure.

But if we dig further, she may express the value of a customer in terms of “Number of points per customer,” instead of just dollar spending. Digging even deeper, we may even have to consider ratios between accumulated points and points redeemed over a certain period to define what “best” means. Now we are talking about a three-dimensional matrix (spending level, points earned, and points redeemed) just to figure out what the best segment is. And we haven’t even begun to talk about the ideal size of such a target segment.

Understanding long- and short-term business goals, and having “blends” of these figures is the most important step in data mining. Again, knowing where to dig is the first step.

Let’s take another example. If we introduce the “continuity” element into all of this (as in the telecommunication, subscription or travel businesses), the word “best” takes yet another turn. Now we have to think about the longevity of the relationship, in addition to transaction and loyalty elements. For example:

  • Tenure: Expressed in terms of “Years since member signup,” “Months since first transaction,” or “Number of active months since signup”
  • Engagements: “Number of contacts for customer service, trouble-shooting, complaints, or package changes/upgrades”
  • Other Activities: Such as cancelation, delinquent payment, move or reactivation

For the airline business, “best” may mean different things for each flight. Data elements to consider could be:

  • Mileage program status
  • Lifetime mileage/YTD mileage
  • Ticket class/code
  • Ticket price paid for the flight/Discount amount
  • Frequency of the flight (Number of flights in the past 12 months, average days between flights/bookings)
  • Peripheral purchases and paid upgrades

Why do I list all of these tedious details? Because analysts must be ready for any type of business challenge or situation that decision-makers may throw at them.

Another example would be that even in the same credit card company, depending on the division — such as acquisition team and CRM team — the word “best” may mean completely different things. Yes, they all care for “good” customers, but the acquisition team may put more weight on responsiveness, while the CRM team may care for profitability above all else.

Speaking of customer care, “customer experience” can be broken down into multiple variables, again to pose different options to decision-makers. What is the customer experience made of, and what do we need to understand about the whole customer journey? In an age when we collect every click, every word and every view, defining such parameters is very important for getting to the answers fast.

In the sea of data, basically we need to extract the following elements of “experience”:

  • The Subject Matter or Product in Question: Why is the customer contacting us? Start with issue classifications and related product and product category designations. If they are in free form, better get them tagged and categorized. Difficulty level of the issue resolution can be assigned, as well.
  • Number of Actions and Reactions: Expressed in terms of number of contacts/inbound calls per customer, number of outbound calls, chats or service visits per customer.
  • Resolution: In no obscure terms, what was the outcome? Resolved or not resolved? Satisfactory or unsatisfactory? If they are embedded in some call log, better employ text analytics, pronto.
  • How Long Did All of This Take? Expressed in terms of “Minutes between initial contact and resolution,” “Average minutes between actions,” “Average duration of engagements,” etc. Basically, the shorter the better for all of this.
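
The elements above can be sketched as a handful of per-customer metrics computed from a service log. The log layout, the event names (`inbound_call`, `chat`, `outbound_call`, `resolved`) and the sample timestamps are assumptions made for this example; a real call log would likely need tagging or text analytics first, as noted above.

```python
from datetime import datetime

# Hypothetical service log: (customer_id, timestamp, event).
# "resolved" marks the outcome; every other event counts as a contact.
service_log = [
    ("A", datetime(2024, 3, 1, 9, 0), "inbound_call"),
    ("A", datetime(2024, 3, 1, 9, 45), "chat"),
    ("A", datetime(2024, 3, 1, 11, 0), "resolved"),
    ("B", datetime(2024, 3, 2, 14, 0), "inbound_call"),
    ("B", datetime(2024, 3, 3, 10, 0), "outbound_call"),
]


def experience_metrics(log):
    """Summarize contacts, resolution status and minutes to resolution."""
    metrics = {}
    for cust, when, event in sorted(log, key=lambda row: row[1]):
        m = metrics.setdefault(
            cust,
            {"contacts": 0, "first": when, "resolved": False,
             "minutes_to_resolution": None},
        )
        if event == "resolved":
            m["resolved"] = True
            elapsed = when - m["first"]  # time since the initial contact
            m["minutes_to_resolution"] = elapsed.total_seconds() / 60
        else:
            m["contacts"] += 1
    return metrics


metrics = experience_metrics(service_log)
for cust, m in sorted(metrics.items()):
    print(cust, m["contacts"], m["resolved"], m["minutes_to_resolution"])
```

Customer A's issue is resolved two hours after the first contact, while customer B is still unresolved after two contacts, which is the kind of difference a report on “experience” should surface.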

Good customer experience, this way, can be measured more objectively. Reporting required for evaluation of different scenarios can be improved immensely when the building blocks (i.e., variables and metrics) are solid.

Now let’s move onto yet another common question of “what worked — or didn’t work — in various marketing efforts.” Consultants often encounter this type of question, and the biggest hurdle often isn’t the analytics process itself, but messy, disparate, and unstructured data. To understand what worked, well, we must define what that means. First off, what was the desired outcome?

  • Opens and Clicks: Traditional digital analytics metrics
  • Conversion: Now we need to dig into transaction data and attribute them to proper campaigns and channels
  • Renewal: If it is for B-to-B or continuity programs
  • Elevation of Brand Image: Tricky and subjective, so we would need to break down this obscure word, as well.

As for what marketers did to invoke responses from customers or prospects, let’s start breaking down that “what” of the “What worked?” question from that angle. Specifically:

  • Channel: A must-have in the omnichannel world.
  • Source: Where did the contact name come from?
  • Selection Criteria: How did you choose the name to contact? By what variable? If advanced analytics were employed, with what segment, what model and what model groups?
  • Campaign Type/Name/Purpose: Such as annual product push, back-to-school sale, Christmas offer, spring clearance, etc.
  • Product: What was the main product featured in the campaign?
  • Offer: What was the hook? Dollar or percentage off? Free shipping? Buy-one-get-one-free? No-payment-until? Discount for a limited period?
  • Creative Elements: Such as content version, types of pictures, font type/size, tag lines, other graphic elements.
  • Drop Day/Time: Daypart of the campaign drop, day of the week, seasonal, etc.
  • Wave: If the campaign involved multiple waves.
  • A/B Testing Elements: A/B testing may have been done in a more controlled environment, but it may be prudent to carry any A/B testing elements on a customer level throughout.
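
Once each contact carries these attributes, the “what worked” question turns into a simple comparison of outcomes by attribute. The sketch below assumes a hypothetical promotion history already matched to conversions; the field names, sample records and the `conversion_rate_by` helper are illustrative only, and a real attribution process would be far messier.

```python
from collections import defaultdict

# Hypothetical promotion history, one row per contact, already matched to
# conversions. Field names are assumptions for this example.
history = [
    {"customer": "A", "channel": "email", "offer": "free shipping", "wave": 1, "converted": True},
    {"customer": "B", "channel": "email", "offer": "10% off", "wave": 1, "converted": False},
    {"customer": "C", "channel": "direct mail", "offer": "free shipping", "wave": 1, "converted": True},
    {"customer": "D", "channel": "email", "offer": "free shipping", "wave": 2, "converted": False},
    {"customer": "E", "channel": "direct mail", "offer": "10% off", "wave": 1, "converted": False},
    {"customer": "F", "channel": "email", "offer": "10% off", "wave": 2, "converted": True},
]


def conversion_rate_by(history, attribute):
    """Conversion rate for each value of one campaign attribute."""
    sent = defaultdict(int)
    converted = defaultdict(int)
    for contact in history:
        key = contact[attribute]
        sent[key] += 1
        converted[key] += int(contact["converted"])
    return {key: converted[key] / sent[key] for key in sent}


by_channel = conversion_rate_by(history, "channel")
by_offer = conversion_rate_by(history, "offer")
print(by_channel)
print(by_offer)
```

In this toy data the channels perform identically while the offers do not, a difference that would stay hidden if the offer had never been recorded on the contact level.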

These are, of course, just some of the suggestions. Different businesses may call for vastly different sets of parameters. I tell analysts not to insist on any particular element, but to try to obtain as much clean and dirty data as possible. Nonetheless, I am pointing out that breaking the elements down this way, upfront, is a necessary first step toward answering the “what worked” question.

I have been saying “Big data must get smaller” (refer to “Big Data Must Get Smaller”) for some time now. To do that, we must define the question first. Then we can narrow down the types of data elements that are necessary to (1) define the question in a way that a machine can understand, and (2) derive answers in more comprehensive and consistent ways.

True insights, often, are not a simple summary of findings out of fancy graphical charts. In fact, knowing where to dig next is indeed a valuable insight in itself, like in mining valuable minerals and gems. Understanding where to start the data mining process ultimately determines the quality of all subsequent analytics and insights.

So, when faced with an obscene amount of data and ambiguous questions, start breaking things down to smaller and more tangible elements. Even marketers without analytical training will understand data better that way.

Machine Learning: More Common Than You Think

[Today, Sue is hosting Sanjay Sidhwani, SVP of Advanced Analytics for Synchrony Financial, as a guest blogger for The Consumer Connection.]

There’s a lot of buzz lately about machine learning. In many ways, it’s transforming the consumer experience and improving the products and operations of many companies. Plus, it’s not just for data analysts — machine learning has real benefits in the lives of the average consumer.

Ever wonder how Netflix serves up recommendations for the next movie or how your smartphone knows that you will be driving to work on Monday morning? Those are both examples of machine learning.

How is machine learning different from ordinary analytics? With traditional methods, an analyst defines the objective and looks for correlations between the objective and a defined set of data inputs. If new data comes in, the analyst needs to rerun the analysis and create new correlations and a new algorithm. This can take a while.

Machine learning is more efficient because it automatically takes new data inputs and adjusts, or “learns,” without manual intervention. So, the impact is immediate. How is it learning? The behavior drives the operation, not the programmers. Netflix recommendations are a good example. Once you watch a program or a movie, the next set of recommendations is created automatically, without adjustments from an analyst.

Let’s take another example. Say you are considering buying a used car. What’s a fair price? Many factors determine this, such as age of car, miles driven, model and make. With enough data, we can infer the relationship between these factors and the price. This relationship can be linear, where the attributes have an additive effect (e.g., miles driven). But often the relationship is not linear. A car’s age, for instance, has a geometric effect on price (15 percent lower each year). In machine learning, the nature of these relationships doesn’t have to be a total guess. The programs automatically adjust these inputs and give us a fair price.
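
To see how a geometric effect can be inferred from data rather than guessed, consider the sketch below. It is a deliberately simple illustration, not how any particular machine learning product works: the prices are synthetic (a hypothetical $30,000 car losing 15 percent per year, the rate quoted above), and the fit is an ordinary least-squares line through the logarithm of the price, which turns a geometric relationship into a linear one.

```python
import math

# Synthetic data for illustration: a car that lists at $30,000 new and
# loses 15 percent of its value each year.
ages = [0, 1, 2, 3, 4, 5]
prices = [30_000 * 0.85 ** age for age in ages]


def fit_annual_depreciation(ages, prices):
    """Least-squares line through (age, log(price)).

    Returns (estimated price when new, estimated annual depreciation rate).
    """
    logs = [math.log(p) for p in prices]
    n = len(ages)
    mean_x = sum(ages) / n
    mean_y = sum(logs) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, logs))
    variance = sum((x - mean_x) ** 2 for x in ages)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), 1 - math.exp(slope)


new_price, rate = fit_annual_depreciation(ages, prices)
print(f"estimated new price: {new_price:,.0f}, annual depreciation: {rate:.1%}")
```

Real listings would be noisy and would include other factors such as mileage, make and model, so the recovered rate would only be approximate.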

Machine learning can also help companies market offers more efficiently. One way is pattern recognition. There are patterns in customer buying behavior, for instance. Machine learning algorithms can predict the next likely item to be bought, helping a brand decide which customer should be targeted with what offer, better addressing their needs and wants and eliminating wasteful and costly marketing.

The challenge for companies is how to implement their learnings. What to do with the prediction — offer a discount? Display on the website? Send an email? The key to making the data impactful is “closing the loop” and refreshing the learnings so the data leads to actual behavior.

There is a budding community of data scientists and analysts who are exploring machine learning techniques. I recently attended a hackathon on Artificial Intelligence in our Innovation Station, a technology hub in our Chicago office. Most of the teams’ ideas used machine learning techniques combined with new types of data, such as facial recognition of an applicant’s LinkedIn picture to authenticate digital credit card applications or building a neural network chatbot that provides personalized service and account analytics.

The possibilities for marketers are exciting and endless. As we learn more about the technology, the real-world applications are likely to grow and provide even more value to brands and consumers alike.

Note: The views expressed in this blog are those of the blogger and not necessarily of Synchrony Financial.

How to Be a Good Data Scientist

I guess no one wants to be a plain “Analyst” anymore; now “Data Scientist” is the title of the day. Then again, I never thought that there was anything wrong with titles like “Secretary,” “Stewardess” or “Janitor,” either. But somehow, someone decided “Administrative Assistant” should replace “Secretary” completely, and that someone was very successful in that endeavor. So much so that people actually get offended when they are called “Secretaries.” The same goes for “Flight Attendants.” If you want an extra bag of peanuts or the whole can of soda with ice on the side, do not dare to call any service personnel by the outdated title. The jury is still out on the title “Janitor,” as it could be replaced by “Custodial Engineer,” “Sanitary Engineer,” “Maintenance Technician,” or anything that gives the impression that the job requirements include a degree in engineering. No matter. When the inflation-adjusted income of salaried workers is decreasing, I guess the number of words in the job title should go up instead. Something’s got to give, right?

Please do not ask me to be politically correct here. As an openly Asian person in America, I am not even sure why I should be offended when someone addresses me as an “Oriental.” Someone explained it to me a long time ago. The word is reserved for “things,” not for people. OK, then. I will be offended when someone knowingly addresses me as an Oriental, now that the memo has been out for a while. So, do me this favor and do not call me an Oriental (at least in front of my face), and I promise that I will not call anyone an “Occidental” in return.

In any case, anyone who touches data for a living now wants to be called a Data Scientist. Well, the title is longer than one word, and that is a good start. Did anyone get a raise along with that title inflation? I highly doubt it. But I’ve noticed the qualifications got much longer and more complicated.

I have seen some job requirements for data scientists that call for “all” of the following qualifications:

  • A master’s degree in statistics or mathematics; able to build statistical models proficiently using R or SAS
  • Strong analytical and storytelling skills
  • Hands-on knowledge in technologies such as Hadoop, Java, Python, C++, NoSQL, etc., being able to manipulate the data any which way, independently
  • Deep knowledge in ETL (extract, transform and load) to handle data from all sources
  • Proven experience in data modeling and database design
  • Data visualization skills using whatever tools are considered to be cool this month
  • Deep business/industry/domain knowledge
  • Superb written and verbal communication skills, being able to explain complex technical concepts in plain English
  • Etc. etc…

I actually cut this list short, as it is already becoming ridiculous. I just want to see the face of a recruiter who got the order to find super-duper candidates based on this list—at the same salary level as a Senior Statistician (another fine title). Heck, while we’re at it, why don’t we add that the candidate must look like Brad Pitt and be able to tap-dance, too? The long and the short of it is maybe some executive wanted to hire just “1” data scientist with all these skillsets, hoping to God that this mad scientist will be able to make sense out of mounds of unstructured and unorganized data all on her own, and provide business answers without even knowing what the question was in the first place.

Over the years, I have worked with many statisticians, analysts and programmers (notice that they are all one-word titles), dealing with large, small, clean, dirty and, at times, really dirty data (hence the title of this series, “Big Data, Small Data, Clean Data, Messy Data”). And navigating through all those data has always been a team effort.

Yes, there are some exceptional musicians who can write music and lyrics, sing really well, play all instruments, program sequencers, record, mix, produce and sell music—all on their own. But if you insist that only such geniuses can produce music, there won’t be much to listen to in this world. Even Stevie Wonder, who can write and sing, and play keyboards, drums and harmonicas, had close to 100 names on the album credits in his heyday. Yes, the digital revolution changed the music scene as much as the data industry in terms of team sizes, but both aren’t and shouldn’t be one-man shows.

So, if being a “Data Scientist” means being a super businessman/analyst/statistician who can program, build models, write, present and sell, we should all just give up searching for one within our budgets in the near future. Realistically, we may be able to find only a few qualified candidates in the job market on a national level. Too bad that every industry report says we need tens of thousands of them, right now.

Conversely, if it is just a bloated new title for good old data analysts with some knowledge in statistical applications and the ability to understand business needs—yeah, sure. Why not? I know plenty of those people, and we can groom more of them. And I don’t even mind giving them new long-winded titles that are suitable for the modern business world and peer groups.

I have been in the data business for a long time. And even before the datasets became really large, I have always maintained the following division of labor when dealing with complex data projects involving advanced analytics:

  • Business Analysts
  • Programmers/Developers
  • Statistical Analysts

The reason is very simple: It is extremely difficult to be a master-level expert in even one of these areas. Out of the hundreds of statisticians I’ve worked with, I can count only a handful who even “tried” to venture into the business side. Of those, even fewer successfully transformed themselves into businesspeople, and they are now owners of consulting practices or in positions with “Chief” in their titles (Chief Data Officer or Chief Analytics Officer being the title du jour).

On the other side of the spectrum, fewer than a tenth of decent statisticians are also good at coding to manipulate complex data. But even they are mostly not good enough to be completely independent of professional programmers or developers. The reality is, most statisticians are not very good at setting up workable samples out of really messy data. Simply put, handling data and developing analytical frameworks or models call for different mindsets on a professional level.

The Business Analysts, I think, are the closest to the modern-day Data Scientists, although those of the past were less technical, given the toolsets available back then. Nevertheless, since it is much easier to teach business aspects to statisticians or developers than to convert businesspeople or marketers into coders (no offense, but true), many of these “in-between” people (between the marketing world and the technology world, for example) are rooted in the technology world (myself included) or at least have a deep understanding of it.

At times labeled as Research Analysts, they are the folks who would:

  • Understand the business requirements and issues at hand
  • Prescribe suitable solutions
  • Develop tangible analytical projects
  • Perform data audits
  • Procure data from various sources
  • Translate business requirements into technical specifications
  • Oversee the progress as project managers
  • Create reports and visual presentations
  • Interpret the results and create “stories”
  • And present the findings and recommended next steps to decision-makers

Sounds complex? You bet it is. And I didn’t even list all the job functions here. And to do this job effectively, these Business/Research Analysts (or Data Scientists) must understand the technical limitations of all related areas, including database, statistics, and general analytics, as well as industry verticals, uniqueness of business models and campaign/transaction channels. But they do not have to be full-blown statisticians or coders; they just have to know what they want and how to ask for it clearly. If they know how to code as well, great. All the more power to them. But that would be like a cherry on top, as the business mindset should be in front of everything.

So, now that the data are bigger and more complex than ever in human history, are we about to combine all aspects of data and analytics business and find people who are good at absolutely everything? Yes, various toolsets made some aspects of analysts’ lives easier and simpler, but not enough to get rid of the partitions between positions completely. Some third basemen may be able to pitch, too. But they wouldn’t go on the mound as starting pitchers—not on a professional level. And yes, analysts who advance up through the corporate and socioeconomic ladder are the ones who successfully crossed the boundaries. But we shouldn’t wait for the ones who are masters of everything. Like I said, even Stevie Wonder needs great sound engineers.

Then, what would be a good path to find Data Scientists in the existing pool of talent? I have been using the following four evaluation criteria to identify individuals with upward mobility in the technology world for a long time. Like I said, it is a lot simpler and easier to teach business aspects to people with technical backgrounds than the other way around.

So let’s start with the techies. These are the qualities we need to look for:

1. Skills: When it comes to the technical aspect of it, the skillset is the most important criterion. Generally a person has it, or doesn’t have it. If we are talking about a developer, how good is he? Can he develop a database without wasting time? A good coder is not just a little faster than a mediocre one; he can be 10 to 20 times faster. I am talking about the ones who don’t have to look through some manual or the Internet every five minutes, the ones who just know all the shortcuts and options. The same goes for statistical analysts. How well versed is she in all the statistical techniques? Or is she a one-trick pony? How is her track record? Have her models kept performing in the market for a prolonged time? The thing about statistical work is that time is the ultimate test; we eventually get to find out how well the prediction holds up in the real world.

2. Attitude: This is a very important aspect, as many techies are locked up in their own little world. Many are socially awkward, like characters in Dilbert or “Big Bang Theory,” and most much prefer to deal with machines (where things are clean-cut binary) than with people (well, humans can be really annoying). Some do not work well with others and do not know how to compromise at all, as they do not know how to look at the world from a different perspective. And there are a lot of lazy ones. Yes, lazy programmers are the ones who are more motivated to automate processes (primarily to support their laissez-faire lifestyle), but the ones who blow deadlines all the time are just too much trouble for the team. In short, a genius with a really bad attitude won’t be able to move to the business or the management side, regardless of the IQ score.

3. Communication: Many technical folks are not good at written or verbal communications. I am not talking about just the ones who are foreign-born (like me), even though most technically oriented departments are full of them. The issue is that many technical people (yes, even the ones who were born and raised in the U.S., speaking English) do not communicate with the rest of the world very well. Many can’t explain anything without using technical jargon, nor can they summarize messages for decision-makers. Businesspeople don’t need to hear the life story about how complex the project was or how messy the data sets were. Conversely, many techies do not understand marketers or businesspeople who speak plain English. Some fail to grasp the concept that human beings are not robots, and that most mortals do not phrase every sentence as a logical expression. When a marketer says “Omit customers in New York and New Jersey from the next campaign,” the coder on the receiving end shouldn’t take that as proper Boolean logic. Yes, obviously a state cannot be New York “and” New Jersey at the same time. But most humans don’t (or can’t) distinguish such differences. Seriously, I’ve seen some developers who refuse to work with people whose command of logical expressions isn’t at the level of Mr. Spock. That’s the primary reason we need business analysts or project managers who work as translators between these two worlds. And obviously, the translators should be able to speak both languages fluently.

4. Business Understanding: Assuming the candidates in question are qualified in terms of criteria one through three, their eagerness to understand the ultimate business goals behind analytical projects would truly set them apart from the rest on the path to becoming a data scientist. As I mentioned previously, many technically oriented people do not really care much about the business side of the deal, or even have the slightest curiosity about it. What is the business model of the company for which they are working? How do they make money? What are the major business concerns? What are the long- and short-term business goals of their clients? Why do they lose sleep at night? Before complaining about incomplete data, why are the databases so messy? How are the data being collected? What does all this data mean for their bottom line? Can you bring up the “So what?” question after a great scientific finding? And ultimately, how will we make our clients look good in front of “their” bosses? When we deal with technical issues, we often find ourselves at a crossroads. Picking the right path (or the path with the fewest downsides) is not just an IT decision, but more of a business decision. The person who has a more holistic view of the world, without a doubt, would make a better decision, even for a minor difference in a small feature, in terms of programming. Unfortunately, it is very difficult to find IT people who have such a balanced view.
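For the techies in the audience, the “New York and New Jersey” request from the Communication point above can be sketched in a few lines of Python. This is only an illustration: the Customer record, the sample names and both function names are made up for this example, not taken from any real campaign system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    # Hypothetical record layout, for illustration only.
    name: str
    state: str


def omit_literal_and(customers):
    """Literal Boolean reading: drop a customer whose state is NY AND NJ.

    No single state value can equal both, so nobody is ever dropped.
    """
    return [c for c in customers
            if not (c.state == "NY" and c.state == "NJ")]


def omit_intended(customers, excluded_states=("NY", "NJ")):
    """What the marketer meant: drop anyone in NY or in NJ."""
    return [c for c in customers if c.state not in excluded_states]


customers = [
    Customer("Ann", "NY"),
    Customer("Bob", "NJ"),
    Customer("Cho", "CT"),
]

literal = omit_literal_and(customers)
intended = omit_intended(customers)

print([c.name for c in literal])   # everyone is still on the list
print([c.name for c in intended])  # only the Connecticut customer remains
```

Neither function is hard to write. Knowing which one the marketer meant is the part that takes a translator.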

And that is the punchline. We want data scientists who have the right balance of business and technical acumen—not just jacks of all trades who can do all the IT and analytical work all by themselves. Just like business strategy isn’t solely set by a data strategist, data projects aren’t done by one super techie. What we need are business analysts or data scientists who truly “get” the business goals and who will be able to translate them into functional technical specifications, with an understanding of all the limitations of each technology piece that is to be employed—which is quite different from being able to do it all.

If the career path for a data scientist ultimately leads to Chief Data Officer or Chief Analytics Officer, it is important for the candidates to understand that such “chief” titles are all about the business, not the IT. As soon as a CDO, CAO or CTO starts representing technology before business, that organization is doomed. They should be executives who understand the technology and employ it to increase profit and efficiency for the whole company. Movie directors don’t necessarily write scripts, hold the cameras, develop special effects or act out scenes. But they understand all aspects of the movie-making process and put all the resources together to create the films that they envision. As soon as a director falls too deep into just one aspect, such as special effects, the resultant movie quickly becomes an unwatchable bore. Data business is the same way.

So what is my advice for young and upcoming data scientists? Master the basics and be a specialist first. Pick a field that fits your aptitude, whether it be programming, software development, mathematics or statistics, and try to be really good at it. But remain curious about other related IT fields.

Then travel the world. Watch lots of movies. Read a variety of books. Not just technical books, but books about psychology, sociology, philosophy, science, economics and marketing, as well. This data business is inevitably related to activities that generate revenue for some organization. Try to understand the business ecosystem, not just technical systems. As marketing will always be a big part of the Big Data phenomenon, be an educated consumer first. Then look at advertisements and marketing campaigns from the promoter’s point of view, not just from an annoyed consumer’s view. Be an informed buyer through all available channels, online or offline. Then imagine how the world will be different in the future, and how the simple concept of a monetary transaction will transform along with other technical advances, which will certainly not stop at Apple Pay. All of those changes will turn into business opportunities for people who understand data. If you see some real opportunities, try to imagine how you would create a startup company around them. You will quickly realize that answering technical challenges is not even half of building a viable business model.

If you are already one of those data scientists, live up to that title and be solution-oriented, not technology-oriented. Don’t be a slave to technologies, or become what we sometimes call a “data plumber” (who just moves data from one place to another). Be a master who wields data and technology to provide useful answers. And most importantly, don’t be evil (like Google says), and never do things just because you can. Always think about the social consequences, as actions based on data and technology affect real people, often negatively (more on this subject in a future article). If you want to ride this Big Data wave for the foreseeable future, try not to annoy people who may not understand all the ins and outs of the data business. Don’t be the guy who spoils it for everyone else in the industry.

A while back, I started to see the unemployment rate as the rate of people who are being left behind by progress (if we consider technical innovations as progress). Every evolutionary stage since the Industrial Revolution created gaps between supply and demand of the new skillsets required for the new world. And this wave is not going to be an exception. It is unfortunate that, in this age of high unemployment, we have such a hard time finding good candidates for high-tech positions. On one side, there are too many people who were educated under the old paradigm. And on the other side, there are too few people who can wield new technologies and apply them to satisfy business needs. If this new title “Data Scientist” means the latter, then yes. We need more of them, for sure. But we all need to be more realistic about how to groom them, as it would take a village to do so. And if we can’t even agree on what the job description for a data scientist should be, we will need lots of luck developing armies of them.