Financial Institutions Can Put Artificial Intelligence to Much Better Use

I’ll start with a potentially controversial statement. Banks are misallocating their investment in artificial intelligence and predictive analytics by putting it into consumer-facing chatbots, rather than using it internally to empower their staff to understand and better serve the customer.

Most customers don’t like speaking with bots, and they usually call their bank when they have an issue that requires handling beyond what artificial intelligence can currently offer. In fact, AI’s reputation has been damaged virtually beyond recovery by the endless phone-menu loop most customers encounter when they call the bank, unable to get where they want to go.

Moreover, you don’t see pictures of chatbots pinned up in banks with “Employee of the Month” emblazoned across the bottom. Nor has any new business been won on the strength of a chatbot’s performance. And customers don’t stay with banks because they developed a great working relationship with a chatbot. The truth of the matter is that chat hasn’t reached the level where it’s consistently reliable for addressing the concerns that actually prompt a customer to call a financial institution.

All that said, artificial intelligence is a highly powerful tool; it is simply being misallocated. So the question becomes: Is there a way banks can use it to enhance human engagement with clients? The answer is yes. Although banks and other financial institutions are in a completely different line of business than, say, a luxury retailer or a car dealership, what they have in common is the critical need to engage customers at various points in a given transaction.

Reaching out to, connecting with and maintaining relationships with customers, and doing it well, gives banks a much better chance of securing a higher lifetime value from their clients. And it’s much harder for bankers or advisers to know the hundreds of products available to them than it is for, say, a car salesman at a dealership, or an associate in the dress department at Saks. AI’s best use is providing them, the customer-facing bank advisers, with the right information for the right client, so they can spend more time on the customer relationship.

There are ways in which the power of predictive analytics can be brought to bear immediately, creating a more substantial and recognizable benefit for both financial services providers and their customers. A knowledge-driven approach to cross-selling and upselling is one such strategy.

There’s a vast range of training, tools and processes that can positively influence engagement efforts. But predictive analytics can push these initiatives into a much higher gear, providing a uniquely powerful impact when it comes to solidifying those all-important bonds with customers. Through better analysis and use of data that’s already available to most financial institutions in petabytes, it’s possible to learn more about customers, and consequently offer them more relevant service, support and product options. The right, internal approach to applying predictive analytics, therefore, results in benefits for both customers and the financial services providers they work with — a true win-win situation.

Historically, banks — especially large ones — tend to lean toward conservative, careful approaches to new strategies and technology rather than quick adoption. Given the mound of compliance mandates that govern their every engagement, this is understandable. But it can be a significant drawback, and it is where predictive analytics can sharpen their game. Many institutions have resisted adopting this specific tool, or have used it in a very limited way, and they’re missing out on the benefits. Understanding the inherent pitfalls in predictive analytics is key to deploying it successfully.

How Financial Institutions Can Effectively Deploy Predictive Analytics

It’s a given that cross-selling and upselling help create more lifetime value from customers. But finding strong connections between products and clients is still a complicated process, particularly when you have to juggle moving parts such as customer credit scores, income, credit utilization and the like. A successful cross-sell comes down to figuring out which products you can sell to whom, and predicting what those outcomes will be. When done correctly and ethically, cross-selling can ultimately strengthen the customer relationship into lifetime value — read, profitability — for the bank, because it matches a product with a need the bank has identified.

It’s 20/20 hindsight, but we all know about the debacle of Wells Fargo’s unethical cross-selling and upselling, and how much trouble the bank got into as a result. Predictive analytics can really make a difference in an upselling campaign, and unlike the Wells Fargo situation, this approach is sustainable. Combing through vast amounts of consumer data can help banks understand how relationships between the bank and its customers have historically evolved over time. On the consumer side, the spotlight is on how their data is being used. Only through robust analysis of customer behavior — ideally where multiple products are being offered — can banks regain their customers’ trust that their data is being used to benefit them.

Predictive analytics platforms can conduct this type of analysis in real time, leaning on demographic information as well as purchasing and financial data that institutions already have from past customer activity. Such an analysis would be prohibitively time-consuming if trained experts had to do the crunching by hand. The predictive analytics tool can then offer sharply defined, personalized, relevant recommendations for staff members to share, while the staff continue to provide the critical human element in the cross-selling and upselling processes.
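
To make the idea concrete, here is a minimal, hypothetical sketch of the kind of cross-sell propensity scoring described above. It is not any particular vendor’s platform; the file and column names, and the choice of a gradient-boosting classifier, are illustrative assumptions.

```python
# Hypothetical sketch: score existing customers for a single cross-sell product.
# Assumes a flat extract where each row is a customer with demographic/behavioral
# fields and a 0/1 flag for whether they adopted the product being cross-sold.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

customers = pd.read_csv("customer_history.csv")  # illustrative file name

features = ["age", "income", "tenure_months", "credit_utilization",
            "num_products_held", "avg_monthly_card_spend"]
X, y = customers[features], customers["adopted_home_equity_line"]

# Hold out a slice of customers to sanity-check the model before anyone acts on it.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
model = GradientBoostingClassifier().fit(X_train, y_train)
print("Holdout AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

# Score the full base and hand the adviser a short, ranked list rather than raw data.
customers["cross_sell_score"] = model.predict_proba(customers[features])[:, 1]
shortlist = customers.sort_values("cross_sell_score", ascending=False).head(20)
```

The point of the final step is exactly what the article argues for: the customer-facing adviser sees a short, ranked, human-readable list, not the underlying petabytes.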

Where does this data come from? The sheer volume of payments data that banks gather — whether credit card, utilities, rent or other payments — can inform what financial product the customer might be looking for and can afford, creating a sharper, more relevant offering. And that’s where artificial intelligence and predictive analytics can play a role that helps bankers sharpen their game and engage more successfully with their customers, without throwing them on the mercy of the bots. Incidentally, it also supports the notion that artificial intelligence is less about displacing humans and more about helping them perform higher-value work.

Securing profitable customers — back to the lifetime value concept — is job No. 1 for banks, whether small or large. Successfully cross-selling — truly matching a product with an identified need — goes a long way to strengthen that customer relationship. The current financial services landscape is ripe for improvement through the use of predictive analytics. Many institutions are already using advanced analytics, tied to marketing and basic interactions — but few have developed strong processes that focus on understanding customer habits and preferences. From there, they can use predictive tools to become more relevant, valuable — and humanly available — to their clients. The institutions that manage to do so will have an advantage in building stronger, longer-lasting relationships and will enjoy the increased value that comes from them.

With thanks to Carol Sabransky, SVP of Business Development, AArete, who made substantial and insightful contributions to this article.

4 Benefits of Applying Marketing Analytics

Marketing analytics is no small subject in today’s world of business. In fact, according to Transparency Market Research, the marketing analytics industry is set to grow by roughly 14% by 2022. Why such growth? Marketing analytics has a tremendous impact on a marketing organization’s activities, but also on a brand’s overall understanding of their entire company’s success.

There are four unique benefits marketing analytics provides, and combined together, these benefits give a holistic view of an organization’s past, present and future.

But First: What Is Marketing Analytics and Why Is It Important?

Marketing analytics is a product of the technology and the influx of data we now work with as marketers. Early on, marketing analytics was a relatively simple concept. It encompassed the process of evaluating marketing efforts across multiple data sources, processes or technologies to understand the effectiveness of marketing activities from a big-picture view — often through the use of metrics. Fundamentally, it’s all about quantifying the results of marketing efforts that take place both online and offline.

Today, marketing analytics has become an entire industry that’s changing the way we work and the type of work we do as marketers. 

It’s important to measure the financial impact not just of marketing but also of product and sales efforts — which marketing analytics can also provide. As a result, knowing and understanding the different types of analysis within marketing analytics, and the benefits each provides, can help identify which metrics to focus on for which objectives — because objectives can be an endless list: understanding or increasing ROI, monitoring trends over time, determining campaign effectiveness, forecasting future results, and so on.

The 4 Benefits of Applying Marketing Analytics

1. Learn What Happened

Marketing analytics can first lend insight into what happened in the past and why. This is instrumental in helping marketing teams avoid repeating the same mistakes. Through descriptive analysis and the use of customer relationship management and marketing automation platforms, analytics not only bring to light what happened in the past but also provide answers to questions on specific topics. For example, you can ask why a specific metric performed the way it did, or what impacted the sales of a specific product.

2. Gauge What’s Happening Now

Marketing analytics can also help you understand what’s currently taking place with your marketing efforts. This helps determine whether you need to pivot or quickly make changes in order to avoid mistakes or make improvements. Dashboards that display current engagement in an email track or the status of new leads are examples of marketing analytics that assess the real-time status of marketing efforts. Usually, these dashboards are created by applying business intelligence practices on top of a marketing automation platform.
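
As a rough illustration of the aggregation that typically sits behind such a dashboard tile, here is a short sketch; the lead export, its column names and the seven-day window are all assumptions, not any specific platform’s API.

```python
# Hypothetical sketch: the aggregation behind a "current lead status" dashboard tile.
# Assumes a lead export with columns lead_id, created_at, status and email_track.
import pandas as pd

leads = pd.read_csv("lead_export.csv", parse_dates=["created_at"])

# Focus on leads created in the past seven days (the "what's happening now" window).
recent = leads[leads["created_at"] >= pd.Timestamp.now() - pd.Timedelta(days=7)]

# Count leads by email track and status; a BI layer would render this as the tile.
status_counts = (recent.groupby(["email_track", "status"])
                       .size()
                       .unstack(fill_value=0))
print(status_counts)
```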

3. Predict What Might Happen

Some could say the predictive aspect of marketing analytics is its most important part. Through predictive modeling techniques such as regression analysis, clustering, propensity models and collaborative filtering, we can start to anticipate consumer behavior. Web analytics tracking that incorporates probabilities, for example, can be used to foresee when a person is likely to leave a site. Marketers can then use this information to execute specific marketing tactics at those moments to retain customers.

Or perhaps it’s marketing analytics that assesses lead management processes to prioritize leads based on how similar they are to current customers. This helps identify who already has a higher propensity to buy. Either way, the goal of marketing analytics is to move away from a rear-view strategy and focus on what lies ahead. Luckily, the influx of data, machine learning and improved statistical algorithms means our ability to accurately predict the likelihood of future outcomes will rise exponentially.
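
A minimal sketch of that kind of lead prioritization, scoring open leads by how closely they resemble existing customers, might look like the following. The extracts, column names and the nearest-neighbor similarity measure are illustrative assumptions, not a prescribed method.

```python
# Hypothetical sketch: rank open leads by how closely they resemble current customers.
# Column names and the nearest-neighbor similarity measure are illustrative choices.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import NearestNeighbors

customers = pd.read_csv("current_customers.csv")
leads = pd.read_csv("open_leads.csv")

features = ["company_size", "annual_revenue", "pages_viewed", "emails_opened"]

# Put all features on a comparable scale before measuring distances.
scaler = StandardScaler().fit(customers[features])
knn = NearestNeighbors(n_neighbors=5).fit(scaler.transform(customers[features]))

# Smaller average distance to existing customers = more of a "lookalike".
distances, _ = knn.kneighbors(scaler.transform(leads[features]))
leads["lookalike_score"] = 1.0 / (1.0 + distances.mean(axis=1))

priority_queue = leads.sort_values("lookalike_score", ascending=False)
```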

4. Optimize Efforts

This last benefit comes only when you combine your analytics with your market research objectives — but if you do, you could see the greatest impact. In fact, if you’re not ensuring your marketing analytics and market research work together, you could be missing out on a lot of opportunities. Essentially, it’s about translating marketing analytics findings into market research objectives. A common mistake marketers make when conducting marketing analytics is forgetting to gather real customer feedback, which is important for bridging the gap between analytics insights, marketing strategy and activation.

In addition to the first three benefits or approaches, brands should use market research as a tool to push their marketing analytics beyond lead generation and sales metrics toward actually understanding customers in the context of their marketing opportunities.

7 B2B Marketing Predictions for 2018

New-year predictions are a dangerous business. I will take the risk, and just hope that at the end of 2018 no one looks back to call me on it! B2B sales and marketing are evolving quickly — as buying behavior changes, and new technologies take hold. So, there’s a lot to talk about. But I shall limit my predictions to just seven, and hope they provide food for thought to my fellow followers of the B2B marketing scene.

1. More Growth in Marketing Technology — and More Consolidation

Ever since Scott Brinker began tracking the marketing tech space in 2011, when he identified 150 point solutions on the market, the category’s growth has been unstoppable. In 2017, he counted 5,381 solutions, up 40% from the year prior. This is nuts, and ripe for consolidation, as buyers sit paralyzed by the deluge and vendors scramble to stand out. I predict major M&A activity next year. One corollary point: marketing executives will need to be more tech-savvy than ever to manage their ever-growing stacks.

2. Predictive Analytics Becomes an Essential Tool in B2B

Data and modeling are nothing new in B2B, but the tools and strategies that have entered the toolset in the last few years are setting us up for a new kind of data-driven future. Particularly in prospecting, new resources like purchase signals (“intent data”) and lookalike modeling will continue to expand marketers’ access to new audiences and provide scale to their ABM programs. Look to Lattice, Mintigo, 6Sense, Leadspace and MRP Prelytix.

3. AI Gets Real

The marketing buzzword of the year, artificial intelligence will in 2018 prove its value in speeding up data processing and applying machine learning to digital advertising, predictive analytics, responsive websites, chatbots and all manner of customer management. When Salesforce.com introduced an AI plug-in called Einstein, my point was already being proved.

4. Self-service Analytics

As marketing tech gets more complex, and CMOs come close to controlling tech budgets as large as CIOs’, the next need is for simplicity, and for new ways for marketers to take advantage of technology without becoming total geeks. Enter self-service, which essentially means more sophisticated business intelligence tools that combine ease of use with speed and power. IBM’s Watson may be the most famous of the bunch, but cheaper, more accessible competitors will be coming along, I reckon.

5. GDPR Will Give B2B Marketers a Break

This is certainly wishful thinking, but my gut says the EU regulators will clarify whether some exceptions — or workarounds — may be available to B2B marketers as the May 25, 2018, deadline approaches. B2Bmarketing.net in the UK has prepared a useful guide for B2B marketers on how to begin their compliance efforts. Meantime, Forrester predicts that 40 percent of marketers will take their chances and not even try to comply.

6. Customer Experience Will Become a Key Discipline in B2B

It’s been a long time coming, but B2B marketers are finally waking up to the fact that purchase decisions are based far less on price and far more on direct and indirect experience with the product, the brand and the company, even in B2B, where things are supposed to be so rational. SiriusDecisions has been following this topic for some years. As interest grows, marketing departments will increasingly focus on how to deliver consistent, informative and enjoyable experiences — online and off — to customers and prospects.

7. Understanding Millennial Buying Behavior Will Be Key to Success

I’ve offered tips about marketing to Millennials before. But new data suggests that this cohort is more influential than ever. They are now responsible for researching and influencing 65 percent of purchase decisions, and in 13 percent of cases they are the decision-makers themselves. Moreover, it turns out that the first place they look for solutions is not Google search or your website, but social media. As these people age, their influence will grow. We need to be on their wavelength.

So, those are my predictions for B2B marketing in 2018. Anyone have others to offer?

A version of this article appeared in Biznology, the digital marketing blog.

Election Polls and the Price of Being Wrong 

The thing about predictive analytics is that the quality of a prediction is eventually exposed — clearly cut as right or wrong. There are casually incorrect outcomes, like a weather report failing to accurately declare at what time the rain will start, and then there are total shockers, like the outcome of the 2016 presidential election.

In my opinion, the biggest losers in this election cycle are pollsters, analysts, statisticians and, most of all, so-called pundits.

I am saying this from a concerned analyst’s point of view. We are talking about colossal and utter failure of prediction on every level here. Except for one or two publications, practically every source missed the mark by more than a mile — not just a couple points off here and there. Even the ones who achieved “guru” status by predicting the 2012 election outcome perfectly called for the wrong winner this time, boldly posting a confidence level of more than 70 percent just a few days before the election.

What Went Wrong? 

The losing party, pollsters and analysts must be in the middle of some deep soul-searching now. In all fairness, let’s keep in mind that no prediction can overcome serious sampling errors and data collection problems. Especially when we deal with sparsely populated areas, where the winner was decisively determined in the end, we must be really careful with the raw numbers of respondents, as errors easily get magnified by incomplete data.

Some of us saw that type of over- or under-projection when the Census Bureau cut the sampling size for budgetary reasons during the last survey cycle. For example, in a sparsely populated area, a few migrants from Asia may affect simple projections like “percent Asian” rather drastically. In large cities, conversely, the size of such errors is generally within more manageable ranges, thanks to large sample sizes.

Then there are the human inconsistency elements that many pundits are talking about. Basically, everyone got so sick of all of these survey calls about the election that many started to ignore them completely. I think pollsters must learn that, at times, less is more. I don’t even live in a swing state, and I started to hang up on unknown callers long before Election Day. Can you imagine what the folks in swing states must have gone through?

Many are also claiming that respondents were not honest about how they were going to vote. But if that were the case, there are other techniques that surveyors and analysts could have used to project the answer based on “indirect” questions. Instead of simply asking “Whom are you voting for?”, how about asking what their major concerns were? Combined with modeling techniques, a few innocuous probing questions regarding specific issues — such as environment, gun control, immigration, foreign policy, entitlement programs, etc. — could have led us to much more accurate predictions, reducing the shock factor.
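
As a rough sketch of that indirect-question approach, one could fit a simple model on respondents who did answer the direct question and apply it to those who declined. The survey file and column names below are hypothetical, and logistic regression is just one reasonable modeling choice.

```python
# Hypothetical sketch: infer vote intention from issue-level concern ratings.
# Respondents who answered the direct question train the model; it is then applied
# to those who refused or said they were undecided.
import pandas as pd
from sklearn.linear_model import LogisticRegression

survey = pd.read_csv("survey_responses.csv")
issues = ["immigration_concern", "gun_control_concern", "environment_concern",
          "foreign_policy_concern", "entitlements_concern"]

answered = survey.dropna(subset=["stated_vote"])      # gave a direct answer
declined = survey[survey["stated_vote"].isna()]       # refused or undecided

model = LogisticRegression(max_iter=1000).fit(answered[issues], answered["stated_vote"])
inferred = model.predict(declined[issues])

# Projected shares that include the people who never answered the direct question.
all_votes = pd.Series(list(answered["stated_vote"]) + list(inferred))
print(all_votes.value_counts(normalize=True))
```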

In the middle of all this, I’ve read that artificial intelligence without any human intervention predicted the election outcome correctly, by using abundant data coming out of social media. That means machines are already outperforming human analysts. It helps that machines have no opinions or feelings about the outcome one way or another.

Dystopian Future?

Maybe machine learning will start replacing human analysts and other decision-making professions sooner than expected. That means a disenfranchised population will grow even further, dipping into highly educated demographics. The future, regardless of politics, doesn’t look all that bright for the human collective, if that trend continues.

In the predictive business, there is a price to pay for being wrong. Maybe that is why in some countries, there are complete bans on posting poll numbers and result projections days — sometimes weeks — before the election. Sometimes observation and prediction change behaviors of human subjects, as anthropologists have been documenting for years.

Patients Aren’t Ready for Treatment?

In my job of being “a guy who finds money-making opportunities using data,” I get to meet all kinds of businesspeople in various industries. Thanks to the business trend around analytics (and to that infamous “Big Data” fad), I don’t have to spend a long time explaining what I do anymore; I just say I am in the field of analytics, or, to sound a bit fancier, data science. Then most marketers seem to understand where the conversation will go from there. Things are never that simple in real life, though, as there are many types of analytics — business intelligence, descriptive analytics, predictive analytics, optimization, forecasting, etc. — even at a high level. But figuring out what type of solution should be prescribed is THE job for a consultant, anyway (refer to “Prescriptive Analytics at All Stages”).

The key to an effective prescription is to listen to the client first. Why do they lose sleep at night? What are their key success metrics? What are the immediate pain points? What are their long-term goals? And how would we get there within the limits of the available resources while putting out the fires at the same time? Building a sound data and analytics roadmap is critical, as no one wants an “Oh dang, we should have done that a year ago!” moment after a complex data project is well on its way. Reconstruction in any line of business is costly, and unfortunately, it happens all the time, as many marketers and decision-makers jump into the data pool out of desperation under organizational pressure (or under false promises by toolset providers, as in “all your dreams will come true with this piece of technology”). It is a sad sight when users realize that they don’t know how to swim only “after” they’ve jumped in.

Why does that happen all of the time? At the risk of sounding like a pompous doctor, I must say that it is quite often the patient’s fault, too; there are lots of bad patients. When it comes to the data and analytics business, not all marketers are experts in it, though some are. Most have a mid-level understanding, and they actually know when to call in for help. And there are complete novices, too. Regardless of their understanding level, bad patients are the ones who show up with self-prescribed solutions and won’t hear about any other options or precautions. Once, I even met a client who demanded a neural-net model right after we exchanged pleasantries. My response? “Whoa, hold your horses for a minute here, why do you think you need one?” (Though I didn’t quite say it like that.) Maybe you just came back from some expensive analytics conference, but can we talk about your business case first? After that conversation, I could understand why doctors wouldn’t appreciate patients who trust WebMD over the living, breathing doctor in front of them.

Then there are opposite types of cases, too. Some marketers are so insecure about the state of their data assets (or their level of understanding) that they wouldn’t even want to hear about any solutions that sound even remotely complex or difficult, although they may be in desperate need of them. A typical response is something like “Our datasets are so messy that we can’t possibly entertain anything statistical.” You know what that sounds like? It sounds like a patient refusing any surgical treatment in an ER because “he” is not ready for it. No, doctors should be ready to perform the surgery, not the patient.

Messy datasets are surely no excuse for not taking the right path. If we had to wait for a perfect set of data all of the time, there wouldn’t be any need for statisticians or data scientists. In fact, we need such specialists precisely because most data sets are messy and incomplete, and they need to be enhanced by statistical techniques.

Analytics is about making the best of what we have. Cleaning dirty and messy data is part of the job, and should never be an excuse for not doing the right thing. If anyone assumes that simple reports don’t require data cleansing steps because the results look simple, nothing could be further from the truth. Most reporting errors stem from dirty data, and most datasets — big or small, new or old — are not ready to be just plugged into analytical engines.

Besides, different types of analytics are needed because there are so many variations of business challenges, and no analytics is supposed to happen in some preset order. In other words, we get into predictive modeling because the business calls for it, not because a marketer finished some basic Reporting 101 class and now wants to move on to an Analytics 202 course. I often argue that deriving insights out of a series of simple reports can be a lot more difficult than building models or doing complex data management. Conversely, regardless of their sophistication level, marketers are not supposed to get into advanced analytics out of mere intellectual curiosity. Every data and analytics activity must be justified by a business purpose, carefully following the strategic data roadmap, not the difficulty level of the task.

Customer Value: Narrowcasting vs. Broadcasting

Virtually every brand we’ve met with in the last few months is hungry for new customers: The war for the customer is on. For more on growing your customer base, consider reading “Bigger is Better: How to Scale Up Customer Acquisition Smarter,” an article we published recently.

Many organizations are hooked on customer acquisition. That is, in order to hit the organization’s sales plans, new customers will be required in large numbers. For most brands, the “acquisition addiction” is about as easy to kick as any other. Try going without coffee suddenly and see how your head feels; reducing a business’s dependence on customer acquisition as a means of achieving revenue and profit targets is not very different.

Organizations that need ever larger numbers of new customers to achieve growth goals eventually will find the cost of acquiring incremental net new customers can become prohibitive.

Broadcast vs. Narrowcast
The traditional model for advertising and customer acquisition has essentially been a broadcast approach: reaching a large audience that generally matches the description of the customer a brand believes to be a fit. Contrast this with what is sometimes described as a “narrowcasting” strategy. Narrowcasting uses customer intelligence to understand a great number of discrete dimensions a consumer possesses, and it can leverage statistical methods to validate how accurately and predictively those dimensions target customers.

The chart below, depicting the value of customers acquired through traditional broadcast capabilities upfront and over time, helps illustrate why “broadcast” strategies for customer acquisition alone aren’t enough.

[Chart: value of customers acquired through traditional broadcast capabilities, upfront and over time]

Broadcast Acquisition Strategies Lack Focus on Customer Value
Large numbers of customers have been acquired in a trailing 13-month window. The challenge is that this cohort of customers was acquired without adequate consideration of the right target.

Consider that the target customer value for average or better customers is around $500. In the example above, the marketer has acquired a large number of customers who lag in their economic contribution to the business. While the customer acquisition metrics may look good for a large campaign that produced several hundred thousand customers over its duration, the average value of those customers is quite low indeed.

Low Customer Value Manifests Itself, Even if Acquisition Volume Is High
When sales targets are rising, it becomes harder to justify the high cost of customer acquisition if the customers previously acquired are underperforming. This leads to a very common bind marketers are placed in. The only way to “make the number” is to acquire more and more.

The most competitive, highest-quality businesses steadily acquire customers and maintain a robust customer base whose economic contribution is materially higher. Consequently, profits are higher, and the business is fundamentally better.

Oftentimes, “broadcast” advertising approaches define the target with a single criterion, like age, income or geography. This can be effective, especially when the media is bought at a good value. However, “effective” is almost always defined as “number of customers acquired.” This, of course, is a reasonable way to judge the performance of the marketing – at least by traditional standards.

There is another way to measure the success of the campaign that is only just beginning to be understood by many traditional “broadcast” marketers: customer value. The chart above shows that this cohort of acquired customers had relatively low economic value.

Root Causes of Low Customer Value
What are the causes of low value? It would be fair to start with the ongoing marketing and relationship with the customer. Bad service could keep customers from returning. Poor quality could lead to excessive returns. Over-promotion could drive down value. Getting the message and frequency wrong could lead to underperformance of the cohort. These are all viable reasons for lower value that need to be rationally and methodically ruled out prior to looking elsewhere.

Therefore, if operational issues do not appear to be the cause – whether through organizational KPI tracking, or simply by monitoring Twitter – then a marketing professional needs to start looking at three things.

  1. The Target (and Media)
  2. The Offer (and Message)
  3. The Creative

Given that the target is historically responsible for up to 70 percent of the success of advertising, this is the first place a professional, data-driven marketer would look.

Target Definition Defines the Customer You Acquire, and It Drives Customer Value
A fact that is often overlooked is that target definition not only focuses efforts and advertising spend on the consumers who are most likely to convert and become customers; it also defines what kind of customers they have the potential to become.

In conversations with CMOs, we often discuss “the target customer” or the “ideal customer” they wish to introduce their brand to. The descriptions of course vary by the brand and the product, and those target definitions are often more qualitative in nature. In fact, only about 30 percent of the CMOs we engage with regularly are focused on using hard data to define their customer base. While qualitative descriptions are helpful and create a vocabulary for discussing and defining who the customer is, they are often sculpted to align with media descriptors that make targeting “big and simple.”

“While simplifying is good business, when simplicity masks underlying business model challenges, a deeper look will ultimately be required, if not forced on the organization.”

While we would not refute a place for those descriptors of a valued consumer, they do fall short of true target definition. Ideally, the process of defining the customer a brand wishes to pursue must begin with a thorough inventory of the customers it already has, and a substantial enhancement of those customer records that provides rich metrics on affluence, age, ethnicity, urbanicity, purchasing behaviors, credit history, geographics and demographics, net worth, income, online purchasing, offline purchasing and potentially a great deal more.

Perspectives Matter in Analytics

When we observe a certain phenomenon, we should never do so from just one angle. We’ve all heard the fable about blind men and an elephant, where each touched just one part of the animal and exclaimed, “Hey, this creature must be like a snake!” and “No, it feels like a thick column!” or “I’m sure it is like a big wall!” We certainly don’t want to fall into that trap.

In the world of marketing, however, so many jump to conclusions with limited information from one perspective. Further, some even fool themselves into thinking that they made scientific conclusions because they employed data mining techniques. Unfortunately, just quoting numbers does not automatically make anyone more analytical, as numbers live within contexts. With all these easy-to-use visualization tools, it’s equally easy to misrepresent the figures, as well.

When we try to predict the future – even the near future – things get even more complicated. It is hard enough to master the mathematical part of predictive analytics, but it gets harder when the data sources are seriously limited; or worse, skewed. When the data sources are contaminated with external factors other than consumer behavior, we may end up predicting the outcome based on the marketer’s action, not on consumer behaviors.

That is why procuring and employing multiple sources of data are so important in predictive analytics. Even when the mission is to just observe what is happening in the world, having multiple perspectives is essential. Simply, who would mind the bird’s-eye view when reporting a high-speed car chase on TV news? It certainly enhances the picture. On the other hand, you would not feel the urgency on the ground without the camera installed on a police car.

I frequently drive from New Jersey to New York City during rush hour. (I have my reasons.) I have been tracking the number of minutes in driving time between every major turn. Not that it helps much in reducing overall commuting time, as there isn’t much I can do when sitting helplessly on a bridge. But I can predict the arrival time with reasonable accuracy. Now armed with smartphone apps that collect such data from everyone with the same applications (crowd sourcing at its best), we can predict ETA to any destination with a margin of error narrower than a minute. That is great when I’m sitting in the car already. But do such analytics help me make decisions about whether I should have been in the car in the first place that morning? While it is great to have a navigator that tells me every turn that I should make, do all that data tell me if going to the city on the first day of school in September is the right decision? Hardly. I need a different perspective for that type of decision.

Every type of data and analytics has its place, and none are almighty. Marketers literally track every breath you take and every move you make when it comes to online activities. So-called “analytical solution providers” are making fortunes collecting data and analyzing them. Clickstream data are the major reason data got so big; and, thanks to them, we started using the term “Big Data.” It is very difficult to navigate through this complex world, so marketers spend a great amount of time and resources to figure out where they stand. Weekly reports that come out of such data are easily hundreds of pages (figuratively), and before marketers get to understand all those figures, a new set of reports lands in their laps (again, metaphorically). It is like having to look at the dashboard of a car without a break when driving it at full speed. Such a cycle continues, and the analysts get into a perpetual motion of pumping out reports.

I am not discounting the value of such reporting at all. When a rocket ship is being launched, literally hundreds of people look at their screens simultaneously just to see how the process is going. However, if the rocket ship is in trouble, there isn’t much one can do by looking at the numbers other than to say, “Uh-oh, based on these figures, we have a serious engine problem right now.” And such reporting certainly does not tell anyone whether one should have launched the vehicle at that particular moment in time with that pre-set destination. Such analytics are completely different from analyzing every turn while moving at full speed.

Marketers get lost because they search the given sets of numbers for answers, while the metrics and reports were designed for some other purpose. At times, we need to change the perspective completely. For instance, looking at every click will not provide accurate sales projections on a personal or product level. Once in a while such a prediction may be correct, but it can easily be thrown off by a slight jolt in the system. It gets worse when there is no direct correlation between clicks and conversions, as such things are heavily dependent upon business models and site design (i.e., actions of marketers, not buyers).

As I emphasized numerous times in this series, analytical questions must be formed based on business questions, not the other way around. But too often, marketers seek to find answers to their questions within the limited data and reports they get to see. It is not impossible to gauge the speed of your vehicle based on the shape of the fur of your dog who is sticking his head out the window, but I wouldn’t recommend using that method when the goal is to estimate time of arrival with a margin of error of less than a minute.

Not all analytics are the same, and different analytical objectives call for different types of data, big and small. To understand your surroundings, yes, you need some serious business intelligence with carefully designed dashboards, real-time or otherwise. To predict a future outcome, or to fill in the blanks (as there are lots of unknown factors, even in the age of Big Data), we must change the perspective and harness different sets of data. To determine the overall destination, we need yet another type of analytics at a macro level.

In the world of predictive analytics, predicting price elasticity, market trends or specific consumer behaviors all call for different types of data, techniques and specialists. Just within the realm of predicting consumer behavior, there lie different levels of difficulty. At the risk of sounding too simplistic, I would say predicting “who” is relatively easier than predicting “what product.” Predicting “when” is harder than those two combined: you may be able to predict with some confidence “who” would be in the market for a luxury vacation, but predicting “when” that person would actually purchase cruise ship tickets requires a different type of data, which is really hard to obtain with any consistency. The hardest one is predicting “why” people behave one way or the other. Let’s just say marketers should take anyone who claims they can do that with a grain of salt. We may need to get into a deep discussion regarding “causality” and “correlation” at that point.

Even that relatively simple “who” part of prediction calls for some debate, with all kinds of data being pumped out every second. Some marketers employ data and toolsets based on availability and price alone, but let us step back for a second and look at it from a different perspective.

Hypothetically speaking, let’s assume that we as marketers get to choose one superpower to predict who is more likely to buy your product at a mall, so that you can address your prospects properly (i.e., by delivering personalized messages). Your choices are:

  • You get to install a camera on everyone’s shoulder at the entrance of the mall
  • You get to have everyone’s past transaction history on an SKU level (who, when, for how much and for what product)

The choice behind Door No. 1 offers what we generally call clickstream data, which falls into the realm of Big Data. It will record literally every move that everyone makes with a time stamp. The second choice is good old transaction data on a product level, and you may call it small data; though in this day and age, there is nothing so small about it. It is just relatively smaller in size in comparison to No. 1. Now, if your goal is to design the mall to optimize traffic patterns for sales, you surely need to pick No. 1. If your goal were to predict who is more likely to buy your product, I would definitely go with No. 2. Yes, some lady may be looking at shoes very frequently, but will she really make a purchase in that category? What does her personal transaction history say?

In reality, we may have to work just with No. 1, but if I had a choice in this hypothetical situation, I would opt for transaction data any time. In my co-op data business days, I looked through about 50 model documents per day for more than six years, and I have seen the predictive power of transaction data firsthand. If you can achieve accurate answers with smaller sets of data, why would you take a detour?

Of course, in real life, I would like to have both, because more varieties of data – not just these two choices, but also demographic, geo-demographic, sentiment and attitudinal data – will help you zoom in on the target with greater accuracy, consistency and efficiency. In this example, if the potential customer is new to the mall, or has been dormant for a long time, you may have to work with just the cameras-on-shoulders data. But such a judgment should be made during the course of the analytics, and should not be predetermined by marketers or IT folks before the analysis begins.

Not all datasets are created equal, and we need all kinds of data. Each set of data comes with tons of holes in it, and we need to fill such gaps with data from other sources, from different angles. Too often, marketers get too deep into the rabbit hole simply because they have been digging it for a long time. But once in a while, we all need to stick our heads out of the hole and have a different perspective.

Digging a hole in the wrong direction will not make anyone richer, and you will never see the end of it while you’re in it.

How to Outsource Analytics

In this series, I have been emphasizing the importance of statistical modeling in almost every article. While there are plenty of benefits of using statistical models in a more traditional sense (refer to “Why Model?”), in the days when “too much” data is the main challenge, I would dare to say that the most important function of statistical models is that they summarize complex data into simple-to-use “scores.”

The next important feature would be that models fill in the gaps, transforming “unknowns” to “potentials.” You see, even in the age of ubiquitous data, no one will ever know everything about everybody. For instance, out of 100,000 people you have permission to contact, only a fraction will be “known” wine enthusiasts. With modeling, we can assign scores for “likelihood of being a wine enthusiast” to everyone in the base. Sure, models are not 100 percent accurate, but I’ll take “70 percent chance of afternoon shower” over not knowing the weather forecast for the day of the company picnic.
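
A minimal sketch of that gap-filling step might look like the following, assuming a hypothetical contact base in which only a fraction of records carry a known wine-enthusiast flag; the column names and the logistic-regression choice are illustrative.

```python
# Hypothetical sketch: only a fraction of the contact base is a "known" wine
# enthusiast, but everyone ends up with a likelihood score between 0 and 1.
import pandas as pd
from sklearn.linear_model import LogisticRegression

base = pd.read_csv("contact_base.csv")   # e.g., the 100,000 contactable people
features = ["age", "income", "dining_spend", "grocery_spend", "travel_spend"]

# Train only on records where the flag is actually known (1 = yes, 0 = no).
known = base[base["known_wine_enthusiast"].notna()]
model = LogisticRegression(max_iter=1000).fit(known[features],
                                              known["known_wine_enthusiast"])

# Every record in the base now carries a "likelihood of being a wine enthusiast".
base["wine_enthusiast_score"] = model.predict_proba(base[features])[:, 1]
```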

I’ve already explained other benefits of modeling in detail earlier in this series, but if I may cut it really short, models will help marketers:

1. In deciding whom to engage, as they cannot afford to spam the world and annoy everyone who can read, and

2. In determining what to offer once they decide to engage someone, as consumers are savvier than ever and they will ignore and discard any irrelevant message, no matter how good it may look.

OK, then. I hope you are sold on this idea by now. The next question is: Who is going to do all that mathematical work? In a country where jocks rule over geeks, it is clear to me that many folks are more afraid of mathematics than of public speaking, which, in its own right, ranks higher than death on the fear scale for many people. If I may paraphrase “Seinfeld,” many folks are figuratively more afraid of giving a eulogy than of being in the coffin at a funeral. And thanks to a sub-par math education in the U.S. (and I am not joking about this, having graduated high school on foreign soil), yes, the fear of math tops them all. Scary, eh?

But that’s OK. This is a big world, and there are plenty of people who are really good at mathematics and statistics. That is why I purposefully never got into the mechanics of modeling techniques and related programming issues in this series. Instead, I have been emphasizing how to formulate questions, how to express business goals in a more logical fashion and where to invest to create analytics-ready environments. Then the next question is, “How will you find the right math geeks who can make all your dreams come true?”

If you plan to create an internal analytics team, there are a few things to consider before committing to the idea. Too many organizations just hire one or two statisticians, dump all the raw data onto them, and hope to God that they will somehow figure out ways to make money with the data. Good luck with that, as:

1. I’ve seen so many failed attempts like that (actually, I’d be shocked if it actually worked), and

2. I am sure God doesn’t micromanage statistical units.

(Similarly, I am almost certain that she doesn’t care much for football or baseball scores of certain teams, either. You don’t think God cares more for the Red Sox than the Yankees, do ya?)

The first challenge is locating good candidates. If you post an online ad for “Statistical Analysts,” you will receive a few hundred resumes per day. But the hiring process is not that simple, as you need to ask the right questions to figure out who is the real deal and who is a poser (and there are many posers out there). Even among qualified candidates with ample statistical knowledge, there are differences between the “Doers” and the “Vendor Managers.” Depending on your organizational goal, you must differentiate between the two.

Then the next challenge is keeping the team intact. In general, mathematicians and statisticians are not solely motivated by money; they also want constant challenges. Like any smart and creative folks, they will simply pack up and leave, if “they” determine that the job is boring. Just a couple of modeling projects a year with some rudimentary sets of data? Meh. Boring! Promises of upward mobility only work for a fraction of them, as the majority would rather deal with numbers and figures, showing no interest in managing other human beings. So, coming up with interesting and challenging projects, which will also benefit the whole organization, becomes a job in itself. If there are not enough challenges, smart ones will quit on you first. Then they need constant mentoring, as even the smartest statisticians will not know everything about challenges associated with marketing, target audiences and the business world, in general. (If you stumble into a statistician who is even remotely curious about how her salary is paid for, start with her.)

Further, you would need to invest in setting up an analytical environment as well. That includes software, hardware and other supporting staff. Toolsets are becoming much cheaper, but they are not exactly free yet. In fact, some famous statistical software, such as SAS, can be quite expensive year after year, although there are plenty of alternatives now. And the team needs an “analytics-ready” data environment, as I have emphasized countless times in this series (refer to “Chicken or the Egg? Data or Analytics?” and “Marketing and IT; Cats and Dogs”). Such data preparation work is not for statisticians, and most of them are not even good at cleaning up dirty data, anyway. That means you will need different types of developers/programmers on the analytics team. I have pointed out before that analytical projects call for a cohesive team, not some super-duper analyst who can do it all (refer to “How to Be a Good Data Scientist”).

By now you would say “Jeez Louise, enough already,” as all this is just too much to manage to build just a few models. Suddenly, outsourcing may sound like a great idea. Then you would realize there are many things to consider when outsourcing analytical work.

First, where would you go? Everyone in the data industry and their cousins claim that they can take care of analytics. But in reality, it is a scary place where many who have “analytics” in their taglines do not even touch “predictive analytics.”

Analytics is a word that is abused as much as “Big Data,” so we really need to differentiate them. “Analytics” may mean:

  • Business Intelligence (BI) Reporting: This is mostly about the present, such as the display of key success metrics and dashboard reporting. While it is very important to know about the current state of business, much of so-called “analytics” unfortunately stops right here. Yes, it is good to have a dashboard in your car now, but do you know where you should be going?
  • Descriptive Analytics: This is about how the targets “look.” Common techniques such as profiling, segmentation and clustering fall under this category. These techniques are mainly for describing the target audience to enhance and optimize messages to them. But using these segments as a selection mechanism is not recommended, though many dare to do exactly that (more on this subject in future articles).
  • Predictive Modeling: This is about answering questions about the future. Who would be more likely to behave in certain ways? What communication channels will be most effective for whom? How much is the potential spending level of a prospect? Who is more likely to be a loyal and profitable customer? What are their preferences? Response models, various types of cloning models, value models, revenue models, attrition models, etc., all fall under this category, and they require hardcore statistical skills. Plus, as I emphasized earlier, these model scores compact large amounts of complex data into nice bite-size packages.
  • Optimization: This is mostly about budget allocation and attribution. Marketing agencies (or media buyers) generally deal with channel optimization and spending analysis, at times using econometrics models. This type of statistical work calls for different types of expertise, but many still insist on calling it simply “analytics.”

Let’s say that for the purpose of customer-level targeting and personalization, we decided to outsource the “predictive” modeling projects. What are our options?

We may consider:

  • Individual Consultants: In-house consultants are dedicated to your business for the duration of the contract, guaranteeing full access like an employee. But they are there for you only temporarily, with one foot out the door all the time. And when they do leave, all the knowledge walks away with them. Depending on the rate, the costs can add up.
  • Standalone Analytical Service Providers: Analytical work is all they do, so you get focused professionals with broad technical and institutional knowledge. Many of them are entrepreneurs, but that may work against you, as they can be understaffed and stretched thin. They also tend to charge for every little step, with not many freebies. They are generally open to using any type of data, but the majority do not have secure sources of third-party data, which can be essential for certain types of analytics involving prospecting.
  • Database Service Providers: Almost all data compilers and brokers have statistical units, as they need to fill in the gap within their data assets with statistical techniques. (You didn’t think that they knew everyone’s income or age, did you?) For that reason, they have deep knowledge in all types of data, as well as in many industry verticals. They provide a one-stop shop environment with deep resource pools and a variety of data processing capabilities. However, they may not be as agile as smaller analytical shops, and analytics units may be tucked away somewhere within large and complex organizations. They also tend to emphasize the use of their own data, as after all, their main cash cows are their data assets.
  • Direct Marketing Agencies: Agencies are very strategic, as they touch all aspects of marketing and control creative processes through segmentation. Many large agencies boast full-scale analytical units, capable of all types of analytics that I explained earlier. But some agencies have very small teams, stretched really thin—just barely handling the reporting aspect, not any advanced analytics. Some just admit that predictive analytics is not part of their core competencies, and they may outsource such projects (not that it is a bad thing).

As you can see, there is no clear-cut answer to the question of whom you should work with. Basically, you will need to check out all types of analysts and service providers to determine the partner best suited to your long- and short-term business purposes, not just your analytical goals. Too often, marketers just go with the lowest bidder. But pricing is just one of many elements to be considered. Here, allow me to introduce “10 Essential Items to Consider When Outsourcing Analytics.”

1. Consulting Capabilities: I put this at the top of the list, as being a translator between the marketing and the technology worlds is the most important differentiator (refer to “How to Be a Good Data Scientist”). They must understand the business goals and marketing needs, prescribe suitable solutions, convert such goals into mathematical expressions and define targets, making the best of available data. If they lack the strategic vision to set up the data roadmap, statistical knowledge alone will not be enough to achieve the goals. And such business goals vary greatly depending on the industry, channel usage and related success metrics. Good consultants always ask questions first, while sub-par ones try to force-fit marketers’ goals into their own toolsets and methodologies.

Translating marketing goals into specific courses of action is a skill in itself. A good analytical partner should be capable of building a data roadmap (not just statistical steps) with a deep understanding of the business impact of resultant models. They should be able to break down larger goals into smaller steps, creating proper phased approaches. The plan may call for multiple models, all kinds of pre- and post-selection rules, or even external data acquisition, while remaining sensitive to overall costs.

The target definition is the core of all these considerations, which requires years of experience and industry knowledge. Simply, the wrong or inadequate targeting decision leads to disastrous results, no matter how sound the mathematical work is (refer to “Art of Targeting”).

Another important quality of a good analytical partner is the ability to create usefulness out of seemingly chaotic and unstructured data environments. Modeling is not about waiting for the perfect set of data, but about making the best of available data. In many modeling bake-offs, the winners are often decided by the creative usage of provided data, not just statistical techniques.

Finally, the consultative approach is important, as models do not exist in a vacuum; they have to fit into the marketing engine. Beware of the ones who want to change the world around their precious algorithms, as they are geeks, not strategists. And the ones who understand the entire marketing cycle will give advice on what the next phase should be, as marketing efforts must be perpetual, not transient.

So, how will you find consultants? Ask the following questions:

  • Are they “listening” to you?
  • Can they repeat “your” goals in their own words?
  • Do their roadmaps cover both short- and long-term goals?
  • Are they confident enough to correct you?
  • Do they understand “non-statistical” elements in marketing?
  • Have they “been there, done that” for real, or just in theories?

2. Data Processing Capabilities: I know that some people look down upon the word “processing.” But data manipulation is the most important step “before” any type of advanced analytics even begins. Simply, “garbage in, garbage out.” And unfortunately, most datasets are completely unsuitable for analytics and modeling. In general, easily more than 80 percent of model development time goes into “fixing” the data, as most are unstructured and unrefined. I have been repeatedly emphasizing the importance of a “model-ready” (or “analytics-ready”) environment for that reason.

However, the reality dictates that the majority of databases are indeed NOT model-ready, and most of them are not even close to it. Well, someone has to clean up the mess. And in this data business, the last one who touches the dataset becomes responsible for all the errors and mistakes made to it thus far. I know it is not fair, but that is why we need to look at the potential partner’s ability to handle large and really messy data, not just the statistical savviness displayed in glossy presentations.

Yes, that dirty work includes data conversion, edit/hygiene, categorization/tagging, data summarization and variable creation, encompassing all kinds of numeric, character and freeform data (refer to “Beyond RFM Data” and “Freeform Data Aren’t Exactly Free”). It is not the most glorious part of this business, but data consistency is the key to successful implementation of any advanced analytics. So, if a model-ready environment is not available, someone had better know how to make the best of whatever is given. I have seen too many meltdowns in “before” and “after” modeling steps due to inconsistencies in databases.
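
To make the “model-ready” idea concrete, here is a minimal sketch (in Python with pandas) of the kind of summarization and variable creation described above; the table layout and column names (customer_id, order_date, amount) are hypothetical placeholders, not from any particular system.

```python
# Minimal sketch: collapsing raw transaction rows into one customer-level,
# "model-ready" record. Column names and values are hypothetical.
import pandas as pd

transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2, 3],
    "order_date": pd.to_datetime(
        ["2024-01-05", "2024-03-02", "2023-11-20", "2024-02-14", "2024-04-01", "2022-07-30"]),
    "amount": [120.0, 35.5, 60.0, 0.0, 210.0, 15.0],
})

as_of = pd.Timestamp("2024-05-01")  # snapshot date for recency calculations

model_ready = (
    transactions
    .groupby("customer_id")
    .agg(
        num_orders=("order_date", "count"),
        total_spend=("amount", "sum"),
        avg_order_value=("amount", "mean"),
        last_order_date=("order_date", "max"),
    )
    .assign(days_since_last_order=lambda df: (as_of - df["last_order_date"]).dt.days)
    .drop(columns="last_order_date")
)
print(model_ready)
```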

So, grill the candidates with the following questions:

  • Do they support file conversions, edit/hygiene, categorization and summarization?
  • How big a dataset is too big, and how many files/tables are too many for them?
  • How much freeform data is too much for them?
  • Ask for sample model variables that they have created in the past.

3. Track Records in the Industry: It can be argued that industry knowledge is even more crucial to success than statistical know-how, as nuances are often “Lost in Translation” without relevant industry experience. In fact, some may not even be able to carry on a proper conversation with a client without it, leading to all kinds of wrong assumptions. I have seen a case where “real” rocket scientists messed up models for credit card campaigns.

The No. 1 reason why industry experience is important is that everyone’s success metrics are unique. Just to name a few, financial services (banking, credit card, insurance, investment, etc.), travel and hospitality, entertainment, packaged goods, online and offline retail, catalogs, publication, telecommunications/utilities, non-profit and political organizations all call for different types of analytics and models, as their business models and the way they interact with target audiences are vastly different. For example, building a model (or a database, for that matter) for businesses where they hand over merchandise “before” they collect money is fundamentally different from the ones where the exchange happens simultaneously. Even a simple concept of payment date or transaction date cannot be treated the same way. For retailers, recent dates could be better for business, but for subscription businesses, older dates may carry more weight. And these are just some examples with “dates,” before touching any dollar figures or other fun stuff.

Then the job gets even more complicated, if we further divide all of these industries by B-to-B vs. B-to-C, where available data do not even look similar. On top of that, divisional ROI metrics may be completely different, and even terminology and culture may play a role in all of this. When you are a consultant, you really don’t want to stop the flow of a meeting to clarify some unfamiliar acronyms, as you are supposed to know them all.

So, always demand specific industry references and examine client rosters, if allowed. (Many clients specifically ask vendors not to use their names as references.) Basically, watch out for the ones who push one-size-fits-all, cookie-cutter solutions. You deserve way more than that.

4. Types of Models Supported: Speaking of cookie-cutter stuff, we need to be concerned with types of models that the outsourcing partner would support. Sure, nobody employs every technique, and no one can be good at everything. But we need to watch out for the “One-trick Ponies.”

This could be a tricky issue, as we are going into a more technical domain. Plus, marketers should not self-prescribe specific techniques; instead, they should clearly state their business goals (refer to “Marketing and IT; Cats and Dogs”). Some of the modeling goals are:

  • Rank and select prospect names
  • Lead scoring
  • Cross-sell/upsell
  • Segment the universe for messaging strategy
  • Pinpoint the attrition point
  • Assign lifetime values for prospects and customers
  • Optimize media/channel spending
  • Create new product packages
  • Detect fraud
  • Etc.

Unless you have successfully dealt with the outsourcing partner in the past (or you have a degree in statistics), do not blurt out words like Neural-net, CHAID, Cluster Analysis, Multiple Regression, Discriminant Function Analysis, etc. That would be like demanding specific medication before your new doctor even asks about your symptoms. The key is meeting your business goals, not fulfilling buzzwords. Let them present their methodology “after” the goal discussion. Nevertheless, see if the potential partner is pushing one or two specific techniques or solutions all the time.

5. Speed of Execution: In modern marketing, speed to action is king. Speed wins, and speed gains respect. However, when it comes to modeling or other advanced analytics, you may be shocked by the wide range of time estimates provided by each outsourcing vendor. To be fair, they are covering themselves, mainly because they have no idea what kind of messy data they will receive. As I mentioned earlier, pre-model data preparation and manipulation are critical components, and they are the most time-consuming part of all, especially when available data are in bad shape. Post-model scoring, audit and usage support may elongate the timeline. The key is to differentiate such pre- and post-modeling processes in the time estimate.

Even for pure modeling elements, time estimates vary greatly, depending on the complexity of assignments. Surely, a simple cloning model with basic demographic data would be much easier to execute than the ones that involve ample amounts of transaction- and event-level data, coming from all types of channels. If time-series elements are added, it will definitely be more complex. Typical clustering work is known to take longer than regression models with clear target definitions. If multiple models are required for the project, it will obviously take more time to finish the whole job.

Now, the interesting thing about building a model is that analysts don’t really finish it, but they just run out of time—much like the way marketers work on PowerPoint presentations. The commonality is that we can basically tweak models or decks forever, but we have to stop at some point.

However, with all kinds of automated tools and macros, model development time has decreased dramatically in past decades. We really came a long way since the first application of statistical techniques to marketing, and no one should be quoting a 1980s timeline in this century. But some still do. I know vendors are trained to follow the guideline “always under-promise and over-deliver,” but still.

An interesting aspect of this dilemma is that we can negotiate the timeline by asking for simpler and less sophisticated versions with diminished accuracy. If, hypothetically, it takes a week to be 98 percent accurate, but it only takes a day to be 90 percent accurate, what would you pick? That should be the business decision.

So, what is a general guideline? Again, it really depends on many factors, but allow me to share a version of it:

  • Pre-modeling Processing
    – Data Conversions: from half a day to weeks
    – Data Append/Enhancement: between overnight and two days
    – Data Edit and Summarization: data-dependent
  • Modeling: Ranges from half a day to weeks
    – Depends on the type, number and complexity of models
  • Scoring: from half a day to one week
    – Mainly depends on the number of records and the state of the database to be scored

I know these are wide ranges, but watch out for the ones that routinely quote 30 days or more for simple clone models. They may not know what they are doing, or worse, they may be some mathematical perfectionists who don’t understand the marketing needs.

6. Pricing Structure: Some marketers would put this on top of the checklist, or worse, use the pricing factor as the only criterion. Obviously, I disagree. (Full disclosure: I have been on the service side of the fence during my entire career.) Yes, every project must make economic sense in the end, but the budget should not and cannot be the sole deciding factor in choosing an outsourcing partner. There are many specialists under famous brand names who command top dollar, and then there are many data vendors who throw in “free” models, disrupting the ecosystem. Either way, one should not jump to conclusions too fast, as there is no free lunch, after all. In any case, I strongly recommend that no one should start the meeting with pricing questions (hence, this article). When you get to the pricing part, ask what the price includes, as the analytical journey could be a series of long and winding roads. Some of the biggest factors that need to be considered are:

  • Multiple Model Discounts—Less for second or third models within a project?
  • Pre-developed (off-the-shelf) Models—These can be “much” cheaper than custom models, while not custom-fitted.
  • Acquisition vs. CRM—Employing client-specific variables certainly increases the cost.
  • Regression Models vs. Other Types—At times, types of techniques may affect the price.
  • Clustering and Segmentations—They are generally priced much higher than target-specific models.

Again, it really depends on the complexity factor more than anything else, and the pre- and post-modeling process must be estimated and priced separately. Non-modeling charges often add up fast, and you should ask for unit prices and minimum charges for each step.

Scoring charges can add up over time, too, so negotiate discounts for routine scoring of the same models. Some may offer all-inclusive package pricing for everything. The important thing is that you must be consistent with the checklist when shopping around with multiple candidates.

7. Documentation: When you pay for a custom model (not pre-developed, off-the-shelf ones), you get to own the algorithm. Because algorithms are not tangible items, the knowledge must be captured in model documents. Beware of the ones who offer “black-box” solutions with comments like, “Oh, it will work, so trust us.”

Good model documents must include the following, at the minimum:

  • Target and Comparison Universe Definitions: What was the target variable (or “dependent” variable) and how was it defined? How was the comparison universe defined? Was there any “pre-selection” for either of the universes? These are the most important factors in any model—even more than the mechanics of the model itself.
  • List of Variables: What are the “independent” variables? How were they transformed or binned? From where did they originate? Often, these model variables describe the nature of the model, and they should make intuitive sense.
  • Model Algorithms: What is the actual algorithm? What are the assigned weights for each independent variable?
  • Gains Chart: We need to examine potential effectiveness of the model. What are the “gains” for each model group, from top to bottom (e.g., 320 percent gain at the top model group in comparison to the whole universe)? How fast do such gains decrease as we move down the scale? How do the gains factors compare against the validation sample? A graphic representation would be nice, too.

For custom models, it is customary to have a formal model presentation, full documentation and scoring script in designated programming languages. In addition, if client files are provided, ask for a waterfall report that details input and output counts of each step. After the model scoring, it is also customary for the vendor to provide a scored universe count by model group. You will be shocked to find out that many so-called analytical vendors do not provide thorough documentation. Therefore, it is recommended to ask for sample documents upfront.
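
To tie this back to the gains chart item above, here is a minimal sketch of how such gains can be computed from a scored validation sample; the column names (score, responder) and the simulated data are hypothetical.

```python
# Minimal sketch: gains by model group (deciles) from a scored validation
# sample. "score" and "responder" are hypothetical column names.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
scored = pd.DataFrame({"score": rng.random(10_000)})
# Simulate a response flag loosely correlated with the score.
scored["responder"] = (rng.random(10_000) < scored["score"] * 0.06).astype(int)

scored["model_group"] = pd.qcut(scored["score"], 10, labels=False)  # 0 = lowest scores
overall_rate = scored["responder"].mean()

gains = (
    scored.groupby("model_group")["responder"].mean()
    .sort_index(ascending=False)                     # top model group first
    .to_frame("response_rate")
    .assign(gain_vs_universe=lambda df: 100 * df["response_rate"] / overall_rate)
)
print(gains.round(2))  # gain index: 100 = universe average
```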

8. Scoring Validation: Models are built and presented properly, but the job is not done until the models are applied to the universe from which the names are ranked and selected for campaigns. I have seen too many major meltdowns at this stage. Simply, it is one thing to develop models with a few hundred thousand record samples, but it is quite another to apply the algorithm to millions of records. I am not saying that the scoring job always falls onto the developers, as you may have an internal team or a separate vendor for such ongoing processes. But do not let the model developer completely leave the building until everything checks out.

The model should have been validated against the validation sample by then, but live scoring may reveal all kinds of inconsistencies. You may also want to back-test the algorithms with past campaign results. In short, many things go wrong “after” the modeling steps. When I hear customers complaining about models, I often find that the modeling is the only part that was done properly, and the “before” and “after” steps were all messed up. Further, even machines misunderstand each other, as any differences in platform or scripting language may cause discrepancies. Or, maybe there was no technical error, but missing values may have caused inconsistencies (refer to “Missing Data Can Be Meaningful”). Nonetheless, the model developers would have the best insight as to what could have gone wrong, so make sure that they are available for questions after models are presented and delivered.
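
As a minimal sketch of the kind of sanity check implied here, one can compare the model-group distribution of the live scored universe against what was seen at development time; the counts and the 20 percent threshold below are hypothetical.

```python
# Minimal sketch: compare model-group distributions between the development
# sample and a live scored file; flag groups that shifted materially.
# Counts and the 20 percent threshold are hypothetical.
dev_distribution = {0: 0.10, 1: 0.10, 2: 0.10, 3: 0.10, 4: 0.10,
                    5: 0.10, 6: 0.10, 7: 0.10, 8: 0.10, 9: 0.10}
live_counts = {0: 92_000, 1: 98_000, 2: 101_000, 3: 99_000, 4: 105_000,
               5: 97_000, 6: 103_000, 7: 100_000, 8: 160_000, 9: 45_000}

total = sum(live_counts.values())
for group, expected_share in sorted(dev_distribution.items()):
    live_share = live_counts[group] / total
    drift = abs(live_share - expected_share) / expected_share
    if drift > 0.20:  # more than a 20 percent relative shift: investigate
        print(f"Group {group}: expected {expected_share:.1%}, "
              f"live {live_share:.1%} -> investigate before selecting names")
```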

9. Back-end Analysis: Good analytics is all about applying learnings from past campaigns—good or bad—to new iterations of efforts. We often call it “closed-loop” marketing, while many marketers often neglect to follow up. Any respectable analytics shop must be aware of it, though they may classify such work separately from modeling or other analytical projects. At the minimum, you need to check out if they even offer such services. In fact, so-called “match-back analysis” is not as simple as just matching campaign files against responders in this omnichannel environment. When many channels are employed at the same time, allocation of credit (i.e., “what worked?”) may call for all kinds of business rules or even dedicated models.
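
Here is a minimal sketch of a naive match-back step, joining a campaign file to subsequent transactions within an attribution window, under hypothetical table layouts; as noted above, real omnichannel credit allocation would need far more business rules, or even dedicated models.

```python
# Minimal sketch: naive match-back of transactions to a campaign file within
# a 30-day attribution window. Tables and column names are hypothetical.
import pandas as pd

campaign = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "mail_date": pd.to_datetime(["2024-03-01", "2024-03-01", "2024-03-01"]),
})
transactions = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "order_date": pd.to_datetime(["2024-03-10", "2024-05-20", "2024-03-25", "2024-03-05"]),
    "amount": [80.0, 45.0, 120.0, 60.0],
})

matched = campaign.merge(transactions, on="customer_id", how="left")
in_window = (matched["order_date"] >= matched["mail_date"]) & \
            (matched["order_date"] <= matched["mail_date"] + pd.Timedelta(days=30))
attributed = matched[in_window]

print("Attributed revenue:", attributed["amount"].sum())
print("Conversion rate:", attributed["customer_id"].nunique() / len(campaign))
```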

While you are at it, ask for a cheaper version of “canned” reports, as well, as custom back-end analysis can be even more costly than the modeling job itself over time. Pre-developed reports may not include all the ROI metrics that you’re looking for (e.g., open, clickthrough and conversion rates, plus revenue and orders per piece mailed, per order, per display, per email, per conversion, etc.). So ask for sample reports upfront.

If you start breaking down all these figures by data source, campaign, time series, model group, offer, creative, targeting criteria, channel, ad server, publisher, keywords, etc., it can be unwieldy really fast. So contain yourself, as no one can understand 100-page reports, anyway. See if the analysts can guide you with such planning, as well. Lastly, if you are so into ROI analysis, get ready to share the “cost” side of the equation with the selected partner. Some jobs are on the marketers.

10. Ongoing Support: Models have a finite shelf life, as all kinds of changes happen in the real world. Seasonality may be a factor, or the business model or strategy may have changed. Fluctuations in data availability and quality further complicate the matter. Basically assumptions like “all things being equal” only happen in textbooks, so marketers must plan for periodic review of models and business rules.

A sure sign of trouble is decreasing effectiveness of models. When in doubt, consult the developers; they may recommend a re-fit or a complete re-development of models. Quarterly reviews would be ideal, but if the cost becomes an issue, start with six-month or yearly reviews, and never go more than a year without any review. Some vendors may offer discounts for redevelopment, so ask for the price quote upfront.
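
As a minimal sketch of such a periodic review, one can track a model’s top-group lift across campaigns and flag it for re-fit once it decays well below the level measured at development; all numbers and the threshold are hypothetical.

```python
# Minimal sketch: flag a model for review when its top-group lift decays
# well below the level measured at development. All numbers are hypothetical.
lift_at_development = 3.2          # top group responded at 3.2x the average
quarterly_lift = {"2024Q1": 3.0, "2024Q2": 2.7, "2024Q3": 2.1, "2024Q4": 1.8}

review_threshold = 0.7 * lift_at_development   # e.g., 30 percent decay allowed

for quarter, lift in quarterly_lift.items():
    status = "OK" if lift >= review_threshold else "schedule re-fit / re-development"
    print(f"{quarter}: lift {lift:.1f}x -> {status}")
```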

I know this is a long list of things to check, but picking the right partner is very important, as it often becomes a long-term relationship. And you may find it strange that I didn’t even list “technical capabilities” at all. That is because:

1. Many marketers are not equipped to dig deep into the technical realm anyway, and

2. The difference between the most mathematically sound models and the ones from the opposite end of the spectrum is not nearly as critical as other factors I listed in this article.

In other words, even the worst model in the bake-off would be much better than no model, if these other business criteria are well-considered. So, happy shopping with this list, and I hope you find the right partner. Employing analytics is not an option when living in the sea of data.

Don’t Do It Just Because You Can

Don’t do it just because you can. No kidding. By the way, I could have gone with Ben Parker’s “With great power comes great responsibility” line, but I didn’t, as it has become an over-quoted cliché. Plus, I’m not much of a fan of “Spiderman.” Actually, I’m kidding this time. (Not the “Spiderman” part, as I’m more of a fan of “Thor.”) But the real reason is any geek with moderate coding skills or any overzealous marketer with access to some data can do real damage to real human beings without any superpowers to speak of. Granted, we wouldn’t go so far as to call it permanent damage, but I must say that some marketing messages and practices are really annoying and invasive. Enough to classify them as “junk mail” or “spam.” Yeah, I said that, knowing full well that those words are forbidden in the industry in which I built my career.

All jokes aside, I received a call from my mother a few years ago asking me if this “urgent” letter that says her car warranty will expire if she does not act “right now” (along with a few exclamation marks) is something to which she must respond immediately. Many of us by now are impervious to such fake urgencies or outrageous claims (like “You’ve just won $10,000,000!!!”). But I then realized that there still are plenty of folks who would spend their hard-earned dollars based on such misleading messages. What really made me mad, other than the fact that my own mother was involved in that case, was that someone must have actually targeted her based on her age, ethnicity, housing value and, of course, the make and model of her automobile. I’ve been doing this job for too long to be unaware of the data variables and techniques that must have played a part in my mother receiving a series of such letters. Basically, some jerk must have created a segment that could be named “old and gullible.” Without a doubt, this is a classic example of what should not be done just because one can.

One might dismiss it as an isolated case of a questionable practice done by questionable individuals with questionable moral integrity, but can we honestly say that? Even I, who know the ins and outs of direct marketing practices quite well, have fallen into traps more than a few times, where a supposedly one-time order mysteriously turned into a continuity program without my consent, followed by an extremely cumbersome cancellation process. Further, when I receive calls or emails from shady merchants with dubious offers, I can very well assume my information changed hands in very suspicious ways, if not through outright illegal routes.

Even without the criminal elements, as data become more ubiquitous and targeting techniques become more precise, an accumulation of seemingly inoffensive actions by innocuous data geeks can cause a big ripple in the offline (i.e., “real”) world. I am sure many of my fellow marketers remember the news about a reputable retail chain a few years ago: It accurately predicted pregnancy in households based on product purchase patterns and sent customized marketing messages featuring pregnancy-related products accordingly. Subsequently it became a big controversy, as such a targeted message was the way one particular head of household found out his teenage daughter was indeed pregnant. An unintended consequence? You bet.

I actually saw the presentation of the instigating statisticians at a predictive analytics conference before the whole incident hit the wire. At the time, the presenters were unaware of the consequences of their actions, so they proudly shared the methodologies they employed with the audience. But when I heard about what they were actually trying to predict, I immediately turned my head to look at the lead statistician on my then-analytical team sitting next to me, and saw that she had the same concerned look that I must have had on my face, as well. And our concern was definitely not about the techniques, as we knew how to do the same when provided with similar sets of data. It was about the human consequences that such a prediction could bring, not just to the eventual targets, but also to the predictors and their fellow analysts in the industry who would all be lumped together as evil scientists by the outsiders. In predictive analytics, there is a price for being wrong; and at times, there is a price to pay for being right, too. Like I said, we shouldn’t do things just because we can.

Analysts do not have superpowers individually, but when technology and ample amounts of data are conjoined, the results can be quite influential and powerful, much like the way bombs can be built with common materials available at any hardware store. Ironically, I have been evangelizing that the data and technology should be wielded together to make big and dumb data smaller and smarter all this time. But providing answers to decision-makers in ready-to-be used formats, hence “humanizing” the data, may have its downside, too. Simply, “easy to use” can easily be “easy to abuse.” After all, humans are fallible creatures with ample amounts of greed and ambition. Even without any obvious bad intentions, it is sometimes very difficult to contemplate all angles, especially about those sensitive and squeamish humans.

I talked about the social consequences of the data business last month (refer to “How to Be a Good Data Scientist”), and that is why I emphasized that anyone who is about to get into this data field must possess a deep understanding of both technology and human nature. That little sensor in your stomach that tells you “Oh, I have a bad feeling about this” may not come to everyone naturally, but we all need to be equipped with those safeguards, like angels on our shoulders.

Hindsight is always 20/20, but apparently, those smart analysts who did that pregnancy prediction only thought about the techniques and the bottom line, but did not consider all the human factors. And they should have. Or, if not them, their manager should have. Or their partners in the marketing department should have. Or their public relations people should have. Heck, “someone” in their organization should have, alright? Just like we do not casually approach a woman on the street who “seems” pregnant and say “You must be pregnant.” Only socially inept people would do that.

People consider certain matters extremely private, in case some data geeks didn’t realize that. If I might add, the same goes for ailments such as erectile dysfunction or constipation, or any other personal business related to body parts that are considered private. Unless you are a doctor in an examining room, don’t say things like “You look old, so you must have a hard time having sex, right?” It is already bad enough that we can’t even watch golf tournaments on TV without those commercials that assume that golf fans need help in that department. (By the way, having “two” bathtubs “outside” the house at dusk doesn’t make any sense either, when the effect of the drug can last for hours, for heaven’s sake. Maybe the man lost interest because the tubs were too damn heavy?)

While it may vary from culture to culture, we all have some understanding of social boundaries in casual settings. When you are talking to a complete stranger on a plane ride, for example, you know exactly how much information that you would feel comfortable sharing with that person. And when someone crosses the line, we call that person inappropriate, or “creepy.” Unfortunately, that creepy line is set differently for each person who we encounter (I am sure people like George Clooney or Scarlett Johansson have a really high threshold for what might be considered creepy), but I think we can all agree that such a shady area can be loosely defined at the least. Therefore, when we deal with large amounts of data affecting a great many people, imagine a rather large common area of such creepiness/shadiness, and do not ever cross it. In other words, when in doubt, don’t go for it.

Now, as a lifelong database marketer, I am not siding with the over-the-top privacy zealots either, as most of them do not understand the nature of data work and can’t tell the difference between informed (and mutually beneficial) messages and Big Brother-like nosiness. This targeting business is never about looking up an individual’s record one at a time, but more about finding correlations between users and products and doing some good match-making in mass numbers. In other words, we don’t care what questionable sites anyone visits, and honest data players would not steal or abuse information with bad intent. I heard about waiters who steal credit card numbers from their customers with some swiping devices, but would you condemn the entire restaurant industry for that? Yes, there are thieves in any part of society, but not all data players are hackers, just like not all waiters are thieves. Statistically speaking, much like flying being the safest form of travel, I can even argue that handing over your physical credit card to a stranger is more dangerous than entering the credit card number on a website. It just looks much worse when things go wrong, as incidents like that affect a great many people all at once, just like when a plane crashes.

Years back, I used to frequent a Japanese restaurant near my office. The owner, who doubled as the head sushi chef, was not a nosy type. So he waited for more than a year to ask me what I did for a living. He had never heard anything about database marketing, direct marketing or CRM (no “Big Data” on the horizon at that time). So I had to find a simple way to explain what I do. As a sushi chef with some local reputation, I presumed that he would know the personal preferences of many frequently visiting customers (or “high-value customers,” as marketers call them). He may know exactly who likes what kind of fish and types of cuts, who doesn’t like raw shellfish, who is allergic to what, who has less of a tolerance for wasabi or who would indulge in exotic fish roes. When I asked this question, his answer was a simple “yes.” Any diligent sushi chef would care for his or her customers that much. And I said, “Now imagine that you can provide such customized services to millions of people, with the help of computers and collected data.” He immediately understood the benefits of using data and analytics, and murmured “Ah so …”

Now let’s turn the tables for a second here. From the customer’s point of view, yes, it is very convenient for me that my favorite sushi chef knows exactly how I like my sushi. The same goes for the local coffee barista who knows how I take my coffee every morning. Such knowledge is clearly mutually beneficial. But what if those business owners or service providers start asking about my personal finances or about my grown daughter in a “creepy” way? I wouldn’t care if they carried the best yellowtail in town or served the best cup of coffee in the world. I would cease all my interaction with them immediately. Sorry, they’ve just crossed that creepy line.

Years ago, I had more than a few chances to sit closely with Lester Wunderman, widely known as “The Father of Direct Marketing,” as the venture called I-Behavior in which I participated as one of the founders actually originated from an idea on a napkin from Lester and his friends. Having previously worked in an agency that still bears his name, and having only seen him behind a podium until I was introduced to him on one cool autumn afternoon in 1999, meeting him at a small round table and exchanging ideas with the master was like an unknown guitar enthusiast having a jam session with Eric Clapton. What was most amazing was that, at the beginning of the dot.com boom, he was completely unfazed about all those new ideas that were flying around at that time, and he was precisely pointing out why most of them would not succeed at all. I do not need to quote the early 21st century history to point out that his prediction was indeed accurate. When everyone was chasing the latest bit of technology for quick bucks, he was at least a decade ahead of all of those young bucks, already thinking about the human side of the equation. Now, I would not reveal his age out of respect, but let’s just say that almost all of the people in his age group would describe occupations of their offspring as “Oh, she just works on a computer all the time …” I can only wish that I will remain that sharp when I am his age.

One day, Wunderman very casually shared a draft of the “Consumer Bill of Rights for Online Engagement” with a small group of people who happened to be in his office. I was one of the lucky souls who heard about his idea firsthand, and I remember feeling that he was spot-on with every point, as usual. I read it again recently just as this Big Data hype is reaching its peak, just like the dot.com boom was moving with a force that could change the world back then. In many ways, such tidal waves do end up changing the world. But lest we forget, such shifts inevitably affect living, breathing human beings along the way. And for any movement guided by technology to sustain its velocity, people who are at the helm of the enabling technology must stay sensitive toward the needs of the rest of the human collective. In short, there is not much to gain by annoying and frustrating the masses.

Allow me to share Lester Wunderman’s “Consumer Bill of Rights for Online Engagement” verbatim, as it appeared in the second edition of his book “Being Direct”:

  1. Tell me clearly who you are and why you are contacting me.
  2. Tell me clearly what you are—or are not—going to do with the information I give.
  3. Don’t pretend that you know me personally. You don’t know me; you know some things about me.
  4. Don’t assume that we have a relationship.
  5. Don’t assume that I want to have a relationship with you.
  6. Make it easy for me to say “yes” and “no.”
  7. When I say “no,” accept that I mean not this, not now.
  8. Help me budget not only my money, but also my TIME.
  9. My time is valuable, don’t waste it.
  10. Make my shopping experience easier.
  11. Don’t communicate with me just because you can.
  12. If you do all of that, maybe we will then have the basis for a relationship!

So, after more than 15 years of the so-called digital revolution, how many of these are we violating almost routinely? Based on the look of my inboxes and sites that I visit, quite a lot and all the time. As I mentioned in my earlier article “The Future of Online is Offline,” I really get offended when even seasoned marketers use terms like “online person.” I do not become an online person simply because I happen to stumble onto some stupid website and forget to uncheck some pre-checked boxes. I am not some casual object at which some email division of a company can shoot to meet their top-down sales projections.

Oh, and good luck with that kind of mindless mass emailing; your base will soon be saturated and you will learn that irrelevant messages are bad for the senders, too. Proof? How is it that the conversion rate of a typical campaign did not increase dramatically during the past 40 years or so? Forget about open or click-through rate, but pay attention to the good-old conversion rate. You know, the one that measures actual sales. Don’t we have superior databases and technologies now? Why is anyone still bragging about mailing “more” in this century? Have you heard about “targeted” or “personalized” messages? Aren’t there lots and lots of toolsets for that?

As the technology advances, it becomes that much easier and faster to offend people. If the majority of data handlers continue to abuse their power, stemming from the data in their custody, the communication channels will soon run dry. Or worse, if abusive practices continue, the whole channel could be shut down by some legislation, as we have witnessed in the downfall of the outbound telemarketing channel. Unfortunately, a few bad apples will make things a lot worse a lot faster, but I see that even reputable companies do things just because they can. All the time, repeatedly.

Furthermore, in this day and age of abundant data, not offending someone or not violating rules aren’t good enough. In fact, to paraphrase comedian Chris Rock, only losers brag about doing things that they are supposed to do in the first place. The direct marketing industry has long been bragging about the self-governing nature of its tightly knit (and often incestuous) network, but as tools get cheaper and sharper by the day, we all need to be even more careful wielding this data weaponry. Because someday soon, we as consumers will be seeing messages everywhere around us, maybe through our retina directly, not just in our inboxes. Personal touch? Yes, in the creepiest way, if done wrong.

Visionaries like Lester Wunderman were concerned about the abusive nature of online communication from the very beginning. We should all read his words again, and think twice about social and human consequences of our actions. Google from its inception encapsulated a similar idea by simply stating its organizational objective as “Don’t be evil.” That does not mean that it will stop pursuing profit or cease to collect data. I think it means that Google will always try to be mindful about the influences of its actions on real people, who may not be in positions to control the data, but instead are on the side of being the subject of data collection.

I am not saying all of this out of some romantic altruism; rather, I am emphasizing the human side of the data business to preserve the forward-momentum of the Big Data movement, while I do not even care for its name. Because I still believe, even from a consumer’s point of view, that a great amount of efficiency could be achieved by using data and technology properly. No one can deny that modern life in general is much more convenient thanks to them. We do not get lost on streets often, we can translate foreign languages on the fly, we can talk to people on the other side of the globe while looking at their faces. We are much better informed about products and services that we care about, we can look up and order anything we want while walking on the street. And heck, we get suggestions before we even think about what we need.

But we can think of many negative effects of data, as well. It goes without saying that the data handlers must protect the data from falling into the wrong hands, which may have criminal intentions. Absolutely. That is like banks having to protect their vaults. Going a few steps further, if marketers want to retain the privilege of having ample amounts of consumer information and using such knowledge for their benefit, they must never cross that creepy line. If the Consumer Bill of Rights is too much for you to retain, just remember this one line: “Don’t be creepy.”

Not All Databases Are Created Equal

Not all databases are created equal. No kidding. That is like saying that not all cars are the same, or not all buildings are the same. But somehow, “judging” databases isn’t so easy. First off, there is no tangible “tire” that you can kick when evaluating databases or data sources. Actually, kicking the tire is quite useless, even when you are inspecting an automobile. Can you really gauge the car’s handling, balance, fuel efficiency, comfort, speed, capacity or reliability based on how it feels when you kick “one” of the tires? I can guarantee that your toes will hurt if you kick it hard enough, and even then you won’t be able to tell the tire pressure within 20 psi. If you really want to evaluate an automobile, you will have to sign some papers and take it out for a spin (well, more than one spin, but you know what I mean). Then, how do we take a database out for a spin? That’s when the tool sets come into play.

However, even when the database in question is attached to analytical, visualization, CRM or drill-down tools, it is not so easy to evaluate it completely, as such practice reveals only a few aspects of a database, hardly all of them. That is because such tools are like window treatments of a building, through which you may look into the database. Imagine a building inspector inspecting a building without ever entering it. Would you respect the opinion of the inspector who just parks his car outside the building, looks into the building through one or two windows, and says, “Hey, we’re good to go”? No way, no sir. No one should judge a book by its cover.

In the age of Big Data (you should know by now that I am not too fond of that term), everything digitized is considered data. And data reside in databases. And databases are supposed to be designed to serve specific purposes, just like buildings and cars are. Although many modern databases are just mindless piles of accumulated data, provided that the database design is decent and functional, we can still imagine many different types of databases depending on the purposes and their contents.

Now, most of the Big Data discussions these days are about the platform, environment, or tool sets. I’m sure you heard or read enough about those, so let me boldly skip all that and their related techie words, such as Hadoop, MongoDB, Pig, Python, MapReduce, Java, SQL, PHP, C++, SAS or anything related to that elusive “cloud.” Instead, allow me to show you the way to evaluate databases—or data sources—from a business point of view.

For businesspeople and decision-makers, it is not about NoSQL vs. RDB; it is just about the usefulness of the data. And the usefulness comes from the overall content and database management practices, not just platforms, tool sets and buzzwords. Yes, tool sets are important, but concert-goers do not care much about the types and brands of musical instruments that are being used; they just care if the music is entertaining or not. Would you be impressed with a mediocre guitarist just because he uses the same brand of guitar that his guitar hero uses? Nope. Likewise, the usefulness of a database is not about the tool sets.

In my past column, titled “Big Data Must Get Smaller,” I explained that there are three major types of data, with which marketers can holistically describe their target audience: (1) Descriptive Data, (2) Transaction/Behavioral Data, and (3) Attitudinal Data. In short, if you have access to all three dimensions of the data spectrum, you will have a more complete portrait of customers and prospects. Because I already went through that subject in-depth, let me just say that such types of data are not the basis of database evaluation here, though the contents should be on top of the checklist to meet business objectives.

In addition, throughout this series, I have been repeatedly emphasizing that the database and analytics management philosophy must originate from business goals. Basically, the business objective must dictate the course for analytics, and databases must be designed and optimized to support such analytical activities. Decision-makers—and all involved parties, for that matter—suffer a great deal when that hierarchy is reversed. And unfortunately, that is the case in many organizations today. Therefore, let me emphasize that the evaluation criteria that I am about to introduce here are all about usefulness for decision-making processes and supporting analytical activities, including predictive analytics.

Let’s start digging into key evaluation criteria for databases. This list would be quite useful when examining internal and external data sources. Even databases managed by professional compilers can be examined through these criteria. The checklist could also be applicable to investors who are about to acquire a company with data assets (as in, “Kick the tire before you buy it.”).

1. Depth
Let’s start with the most obvious one. What kind of information is stored and maintained in the database? What are the dominant data variables in the database, and what is so unique about them? Variety of information matters for sure, and uniqueness is often related to specific business purposes for which databases are designed and created, along the lines of business data, international data, specific types of behavioral data like mobile data, categorical purchase data, lifestyle data, survey data, movement data, etc. Then again, mindless compilation of random data may not be useful for any business, regardless of the size.

Generally, data dictionaries (the lack of one is a sure sign of trouble) reveal the depth of the database, but we need to dig deeper, as transaction and behavioral data are much more potent predictors and harder to manage in comparison to demographic and firmographic data, which are very much commoditized already. Likewise, lifestyle variables that are derived from surveys that may have been conducted a long time ago are far less valuable than actual purchase history data, as what people say they do and what they actually do are two completely different things. (For more details on the types of data, refer to the second half of “Big Data Must Get Smaller.”)

Innovative ideas should not be overlooked, as data packaging is often very important in the age of information overflow. If someone or some company transformed many data points into user-friendly formats using modeling or other statistical techniques (imagine pre-developed categorical models targeting a variety of human behaviors, or pre-packaged segmentation or clustering tools), such effort deserves extra points, for sure. As I emphasized numerous times in this series, data must be refined to provide answers to decision-makers. That is why the sheer size of the database isn’t so impressive, and the depth of the database is not just about the length of the variable list and the number of bytes that go along with it. So, data collectors, impress us—because we’ve seen a lot.

2. Width
No matter how deep the information goes, if the coverage is not wide enough, the database becomes useless. Imagine well-organized, buyer-level POS (point of sale) data coming from actual stores in “real time” (though I am sick of this word, as it is also overused). The data go down to SKU-level details and payment methods. Now imagine that the data in question are collected in only two stores—one in Michigan, and the other in Delaware. This, by the way, is not a completely made-up story, and I faced similar cases in the past. Needless to say, we had to make many assumptions that we didn’t want to make in order to make the data useful, somehow. And I must say that it was far from ideal.

Even in the age when data are collected everywhere by every device, no dataset is ever complete (refer to “Missing Data Can Be Meaningful“). The limitations are everywhere. It could be about brand, business footprint, consumer privacy, data ownership, collection methods, technical limitations, distribution of collection devices, and the list goes on. Yes, Apple Pay is making a big splash in the news these days. But would you believe that the data collected only through Apple iPhone can really show the overall consumer trend in the country? Maybe in the future, but not yet. If you can pick only one credit card type to analyze, such as American Express for example, would you think that the result of the study is free from any bias? No siree. We can easily assume that such analysis would skew toward the more affluent population. I am not saying that such analyses are useless. And in fact, they can be quite useful if we understand the limitations of data collection and the nature of the bias. But the point is that the coverage matters.

Further, even within multisource databases in the market, the coverage should be examined variable by variable, simply because some data points are really difficult to obtain even by professional data compilers. For example, any information that crosses between the business and the consumer world is sparsely populated in many cases, and the “occupation” variable remains mostly blank or unknown on the consumer side. Similarly, any data related to young children is difficult or even forbidden to collect, so a seemingly simple variable, such as “number of children,” is left unknown for many households. Automobile data used to be abundant on a household level in the past, but a series of laws made sure that the access to such data is forbidden for many users. Again, don’t be impressed with the existence of some variables in the data menu, but look into it to see “how much” is available.
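
A minimal sketch of “looking into how much is available”: compute the fill rate of each variable so that a field that exists in the data menu but is mostly blank stands out; the column names and values are hypothetical.

```python
# Minimal sketch: fill rate (non-missing share) per variable, so a field that
# appears in the data dictionary but is mostly blank stands out.
# Column names and values are hypothetical.
import numpy as np
import pandas as pd

households = pd.DataFrame({
    "age": [34, 51, np.nan, 47, 29, np.nan],
    "occupation": [np.nan, "teacher", np.nan, np.nan, np.nan, np.nan],
    "number_of_children": [np.nan, 2, np.nan, np.nan, 1, np.nan],
    "home_value": [310_000, 450_000, 275_000, np.nan, 390_000, 510_000],
})

fill_rates = households.notna().mean().sort_values(ascending=False)
print(fill_rates.round(2))   # e.g., occupation may be far sparser than age
```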

3. Accuracy
In any scientific analysis, a “false positive” is a dangerous enemy. In fact, it is worse than not having the information at all. Many folks just assume that any data coming out of a computer are accurate (as in, “Hey, the computer says so!”). But data are not completely free from human errors.

Sheer accuracy of information is hard to measure, especially when the data sources are unique and rare. And the errors can happen in any stage, from data collection to imputation. If there are other known sources, comparing data from multiple sources is one way to ensure accuracy. Watching out for fluctuations in distributions of important variables from update to update is another good practice.

Nonetheless, the overall quality of the data is not just up to the person or department who manages the database. Yes, in this business, the last person who touches the data is responsible for all the mistakes that were made to it up to that point. However, when the garbage goes in, the garbage comes out. So, when there are errors, everyone who touched the database at any point must share in the burden of guilt.

Recently, I was part of a project that involved data collected from retail stores. We ran all kinds of reports and tallies to check the data, and edited many data values out when we encountered obvious errors. The funniest one that I saw was the first name “Asian” and the last name “Tourist.” As an openly Asian-American person, I was semi-glad that they didn’t put in “Oriental Tourist” (though I still can’t figure out who decided that word is for objects, but not people). We also found names like “No info” or “Not given.” Heck, I saw in the news that this refugee from Afghanistan (he was a translator for the U.S. troops) obtained a new first name as he was granted an entry visa, “Fnu.” That would be short for “First Name Unknown” as the first name in his new passport. Welcome to America, Fnu. Compared to that, “Andolini” becoming “Corleone” on Ellis Island is almost cute.

Data entry errors are everywhere. When I used to deal with data files from banks, I found that many last names were “Ira.” Well, it turned out that it wasn’t really the customers’ last names, but they all happened to have opened “IRA” accounts. Similarly, movie phone numbers like 777-555-1234 are very common. And fictitious names, such as “Mickey Mouse,” or profanities that are not fit to print are abundant, as well. At least fake email addresses can be tested and eliminated easily, and erroneous addresses can be corrected by time-tested routines, too. So, yes, maintaining a clean database is not so easy when people freely enter whatever they feel like. But it is not an impossible task, either.

We can also train employees regarding data entry principles, to a certain degree. (As in, “Do not enter your own email address,” “Do not use bad words,” etc.). But what about user-generated data? Search and kill is the only way to do it, and the job would never end. And the meta-table for fictitious names would grow longer and longer. Maybe we should just add “Thor” and “Sponge Bob” to that Mickey Mouse list, while we’re at it. Yet, dealing with this type of “text” data is the easy part. If the database manager in charge is not lazy, and if there is a bit of a budget allowed for data hygiene routines, one can avoid sending emails to “Dear Asian Tourist.”

Numeric errors are much harder to catch, as numbers do not look wrong to human eyes. That is when comparison to other known sources becomes important. If such examination is not possible on a granular level, then median value and distribution curves should be checked against historical transaction data or known public data sources, such as U.S. Census Data in the case of demographic information.

When it’s about the companies’ own data, follow your instincts and get rid of data that look too good or too bad to be true. We all can afford to lose a few records in our databases, and there is nothing wrong with deleting the “outliers” with extreme values. Erroneous names, like “No Information,” may be attached to a seven-figure lifetime spending sum, and you know that can’t be right.
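
As a minimal sketch of that search-and-kill hygiene pass, one can flag obviously fictitious names, placeholder phone numbers and implausible spending values; the lists and thresholds below are hypothetical and would grow over time.

```python
# Minimal sketch: flag obviously bad records -- fictitious names, placeholder
# phone numbers and implausible spend values. Lists and thresholds are
# hypothetical and would grow over time.
import pandas as pd

FAKE_NAMES = {"mickey mouse", "no info", "not given", "asian tourist", "test test"}

customers = pd.DataFrame({
    "full_name": ["Jane Doe", "Mickey Mouse", "No Info", "John Smith"],
    "phone": ["212-867-5309", "777-555-1234", "", "646-555-0100"],
    "lifetime_spend": [1_250.00, 40.00, 9_999_999.00, 310.00],
})

flags = pd.DataFrame({
    "fake_name": customers["full_name"].str.lower().isin(FAKE_NAMES),
    "fake_phone": customers["phone"].str.contains("555-", na=False),
    "implausible_spend": customers["lifetime_spend"] > 1_000_000,
})
print(customers[flags.any(axis=1)])   # records to review or delete
```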

The main takeaways are: (1) Never trust the data just because someone bothered to store them in computers, and (2) Constantly look for bad data in reports and listings, at times using old-fashioned eye-balling methods. Computers do not know what is “bad,” until we specifically tell them what bad data are. So, don’t give up, and keep at it. And if it’s about someone else’s data, insist on data tallies and data hygiene stats.

4. Recency
Outdated data are really bad for prediction or analysis, and that is a different kind of badness. Many call it a “Data Atrophy” issue, as no matter how fresh and accurate a data point may be today, it will surely deteriorate over time. Yes, data have a finite shelf-life, too. Let’s say that you obtained a piece of information called “Golf Interest” on an individual level. That information could be coming from a survey conducted a long time ago, or some golf equipment purchase data from a while ago. In any case, someone who is attached to that flag may have stopped shopping for new golf equipment, as he doesn’t play much anymore. Without a proper database update and a constant feed of fresh data, irrelevant data will continue to drive our decisions.

The crazy thing is that, the harder it is to obtain certain types of data—such as transaction or behavioral data—the faster they will deteriorate. By nature, transaction or behavioral data are time-sensitive. That is why it is important to install time parameters in databases for behavioral data. If someone purchased a new golf driver, when did he do that? Surely, having bought a golf driver in 2009 (“Hey, time for a new driver!”) is different from having purchased it last May.

So-called “Hot Line Names” literally cease to be hot after two to three months, or in some cases much sooner. The evaporation period may be different for different product types, as one may stay in the market longer for an automobile than for a new printer. Part of the job of a data scientist is to defer the expiration date of data, finding leads or prospects who are still “warm,” or even “lukewarm,” with available valid data. But no matter how much statistical work goes into making the data “look” fresh, eventually the models will cease to be effective.
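
A minimal sketch of installing such time parameters: bucket a behavioral flag by how long ago the triggering purchase happened, so “hot,” “warm” and expired names can be treated differently; the window lengths are hypothetical business rules, not industry standards.

```python
# Minimal sketch: bucket a behavioral flag by recency so "hot," "warm" and
# expired names can be treated differently. Window lengths are hypothetical.
from datetime import date

def recency_bucket(purchase_date: date, as_of: date) -> str:
    days = (as_of - purchase_date).days
    if days <= 90:
        return "hot"
    if days <= 365:
        return "warm"
    return "expired"

as_of = date(2024, 5, 1)
golf_driver_buyers = {
    "cust_001": date(2024, 4, 10),
    "cust_002": date(2023, 9, 2),
    "cust_003": date(2009, 6, 15),   # the 2009 driver buyer is long expired
}
for customer, purchased in golf_driver_buyers.items():
    print(customer, recency_bucket(purchased, as_of))
```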

For decision-makers who do not make real-time decisions, a real-time database update could be an expensive solution. But the databases must be updated constantly (I mean daily, weekly, monthly or even quarterly). Otherwise, someone will eventually end up making a wrong decision based on outdated data.

5. Consistency
No matter how much effort goes into keeping the database fresh, not all data variables will be updated or filled in consistently. And that is the reality. The interesting thing is that, especially when using them for advanced analytics, we can still provide decent predictions if the data are consistent. It may sound crazy, but even not-so-accurate-data can be used in predictive analytics, if they are “consistently” wrong. Modeling is developing an algorithm that differentiates targets and non-targets, and if the descriptive variables are “consistently” off (or outdated, like census data from five years ago) on both sides, the model can still perform.

Conversely, if there is a huge influx of a new type of data, or any drastic change in data collection or in the business model that supports such data collection, all bets are off. We may end up predicting changes in business models or methodologies, not differences in consumer behavior. And that is one of the worst kinds of errors in the predictive business.

Last month, I talked about dealing with missing data (refer to “Missing Data Can Be Meaningful”), and I mentioned that data can be inferred via various statistical techniques. Such data imputation is fine, as long as it returns consistent values. I have seen many so-called professionals mess up popular models, like “Household Income,” from update to update. If the inferred values jump dramatically due to changes in the source data, no amount of effort can save the targeting models that employed such variables, short of re-developing them.
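
One way to keep inferred values from jumping between updates is to freeze the imputation parameters at model-development time and reuse them. The sketch below assumes a hypothetical “household_income” field and a simple median rule; a real imputation model would be more involved, but the principle of versioning the parameters is the same:

    # Freeze the imputation logic so inferred values stay consistent across updates.
    # Field names and the median rule are hypothetical stand-ins.
    import json
    import pandas as pd

    df = pd.read_csv("household_data.csv")

    # At model-development time: derive the imputation parameters once and save them.
    impute_params = {"household_income_fill": float(df["household_income"].median())}
    with open("impute_params.json", "w") as f:
        json.dump(impute_params, f)

    # At every later update: reload the saved parameters instead of re-deriving them,
    # so the inferred values do not shift just because the source data shifted.
    with open("impute_params.json") as f:
        impute_params = json.load(f)
    df["household_income"] = df["household_income"].fillna(impute_params["household_income_fill"])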

That is why a time-series comparison of key variables in the database is so important. Any change of more than 5 percent in the distribution of a variable, compared to the previous update, should be investigated immediately. If you are dealing with external data vendors, insist on a distribution report of key variables with every update. Consistency of data is more important in predictive analytics than sheer accuracy.
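
Here is a minimal sketch of such a time-series comparison, assuming two hypothetical update files and a handful of made-up variable names; any category whose share moves by more than five percentage points gets flagged for investigation:

    # Compare the distribution of key variables between two updates and flag big shifts.
    # File names and column names are hypothetical; the 5 percent threshold follows the rule of thumb above.
    import pandas as pd

    prev = pd.read_csv("update_previous.csv")
    curr = pd.read_csv("update_current.csv")

    for col in ["household_income_band", "golf_interest", "homeowner_flag"]:
        prev_dist = prev[col].value_counts(normalize=True, dropna=False)
        curr_dist = curr[col].value_counts(normalize=True, dropna=False)
        shift = curr_dist.subtract(prev_dist, fill_value=0).abs()
        flagged = shift[shift > 0.05]
        if not flagged.empty:
            print(f"Investigate {col}: share moved more than 5% for {list(flagged.index)}")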

6. Connectivity
As I mentioned earlier, there are many types of data, and the predictive power of data multiplies as different types are used together. For instance, demographic data, though quite commoditized, still play an important role in predictive modeling, even when the dominant predictors are behavioral data. That is partly because no single dataset is complete, and because different types of data play different roles in algorithms.

The trouble is that many modern datasets do not share any common matching keys. On the demographic side, we can easily imagine using PII (Personally Identifiable Information), such as name, address, phone number or email address, for matching. Now, if we want to add transaction data to the mix, we need some match “key” (or a magic decoder ring) by which we can link it to the base records. Unfortunately, many modern databases completely lack PII, right from the data collection stage. The result is that such a data source remains in a silo. Not all is lost in that situation, as the data can still be used for trend analysis. But to employ multisource data for one-to-one targeting, we really need to establish connections among the various data worlds.

Even if the connection cannot be made at the household, individual or email level, I would not give up entirely, as we can still target based on IP addresses, which may lead us to some geographic unit, such as a ZIP code. I’d take ZIP-level targeting over no targeting at all any day, even though many analytical and summarization steps are required to get there (more on that subject in future articles).

Not having PII or any hard match key is not a complete deal-breaker, but without it, the maneuvering space for analysts and marketers shrinks significantly. That is why the existence of PII, or even ZIP codes, is the first thing I check when looking into a new data source. I would like to free such data from isolation.
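
As an illustration of that connection step, here is a minimal sketch that links a PII-less source to a customer base where a common key (email, in this hypothetical case) exists, and falls back to ZIP-level summaries where it does not; all file and column names are assumptions:

    # Link a PII-less data source to the base one-to-one where possible,
    # and fall back to ZIP-level aggregates where no hard match key exists.
    import pandas as pd

    base = pd.read_csv("customer_base.csv")  # assumed to carry email and zip_code
    web = pd.read_csv("web_activity.csv")    # email for some visitors, IP-derived zip_code for the rest

    # One-to-one match where the common key exists.
    linked = base.merge(
        web.dropna(subset=["email"])[["email", "page_views"]],
        on="email",
        how="left",
    )

    # ZIP-level fallback for the records that never had a hard match key.
    zip_summary = (
        web[web["email"].isna()]
        .groupby("zip_code")["page_views"]
        .mean()
        .rename("zip_avg_page_views")
        .reset_index()
    )
    linked = linked.merge(zip_summary, on="zip_code", how="left")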

7. Delivery Mechanisms
Users judge databases based on the visualization or reporting tool sets that are attached to them. As I mentioned earlier, that is like judging an entire building based just on the window treatments. But for many users, that is the reality. After all, how would a casual user without a programming or statistical background even “see” the data? Through tool sets, of course.

But that is only one end of it. There are so many types of platforms and devices, and the data must flow through them all. The important point is that data are useless if they are not in the hands of decision-makers, through the device of their choice, at the right time. Such flow can be actualized via API feeds, FTP, or good, old-fashioned batch installments, and no database should stay too far away from the decision-makers. In an earlier column, I emphasized that data players must be good at (1) Collection, (2) Refinement, and (3) Delivery (refer to “Big Data is Like Mining Gold for a Watch—Gold Can’t Tell Time”). Properly delivering answers to the people who asked for them closes one iteration of the information flow. And the answers must continue to flow to the users.

8. User-Friendliness
Even when state-of-the-art (I apologize for using this cliché) visualization, reporting or drill-down tool sets are attached to the database, if the data variables are too complicated or not intuitive, users will get frustrated and eventually move away from them. If that happens after pouring a sick amount of money into a data initiative, that is a shame. But it happens all the time. I am not going to name names here, but I once saw a ridiculously hard-to-understand data dictionary from a major data broker in the U.S.; it looked like the layout was designed by robots, for robots. Please. Data scientists must try to humanize the data.

This whole Big Data movement has momentum now, and in the interest of not killing it, data players must make every aspect of this data business easier for the users, not harder. Simpler data fields, intuitive variable names, meaningful value sets, pre-packaged variables in the form of answers, and a complete data dictionary are not too much to ask after the hard work of developing and maintaining the database.
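
To illustrate what “humanizing” a layout can look like, here is a minimal sketch that renames cryptic vendor fields and decodes their value sets into plain answers; the field names and code values are invented for this example:

    # Rename cryptic fields and decode coded values into human-readable answers.
    # The cryptic names and code tables below are made up for illustration.
    import pandas as pd

    raw = pd.read_csv("vendor_extract.csv")

    rename_map = {"VAR_0417": "golf_interest", "HHI_CD": "household_income_band"}
    value_maps = {
        "golf_interest": {1: "Yes", 0: "No"},
        "household_income_band": {"A": "Under $50K", "B": "$50K-$100K", "C": "Over $100K"},
    }

    friendly = raw.rename(columns=rename_map)
    for col, mapping in value_maps.items():
        friendly[col] = friendly[col].map(mapping)  # unmapped codes become missing, which is honest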

This is why I insist that data scientists and professionals must be businesspeople first. The developers should never forget that end-users are not trained data experts. And guess what? Even professional analysts would appreciate intuitive variable sets and complete data dictionaries. So, pretty please, with sugar on top, make things easy and simple.

9. Cost
I saved this important item for last for a good reason. Yes, the dollar sign is a very important factor in all business decisions, but it should not be the sole deciding factor when it comes to databases. That means CFOs should not dictate decisions regarding data or databases without considering input from the CMOs, CTOs, CIOs or CDOs, who should, in turn, be concerned about all the other criteria listed in this article.

Playing with data costs money. And, at times, a lot of money. When you add up all the costs for hardware, software, platforms, tool sets, maintenance and, most importantly, the man-hours for database development and maintenance, the sum becomes very large very fast, even in the age of open-source environments and cloud computing. That is why many companies outsource their database work to share the financial burden of having to create the infrastructure. But even then, the quality of the database should be evaluated on all of these criteria, not just the price tag. In other words, don’t just pick the lowest bidder and hope to God that it will be all right.

These evaluation criteria also apply when you purchase external data. A test-match job with a data vendor will reveal many of the details listed here; metrics such as match rate and variable fill rate, along with the completeness of the data dictionary, should be carefully examined. In short, what good is a lower unit price per 1,000 records if the match rate is horrendous and even the matched records are filled with missing or sub-par inferred values? Also consider that, once you commit to an external vendor and start building models and analytical frameworks around its data, it becomes very difficult to switch vendors later on.
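
A minimal sketch of how a test-match file might be scored, assuming the vendor returns a match flag and that unmatched records come back with the appended fields empty; every name here is hypothetical:

    # Score a vendor test-match file: overall match rate plus per-variable fill rates.
    # Column names are assumptions about how the test file comes back.
    import pandas as pd

    test = pd.read_csv("vendor_test_match.csv")
    appended_cols = ["household_income", "homeowner_flag", "golf_interest"]

    matched = test["match_flag"] == "Y"
    print(f"Match rate: {matched.mean():.1%}")

    # Fill rate of each appended variable, measured among the matched records only.
    for col in appended_cols:
        fill_rate = test.loc[matched, col].notna().mean()
        print(f"{col}: {fill_rate:.1%} filled")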

When shopping for external data, consider the following when it comes to pricing options:

  • Number of variables to be acquired: Don’t just go for the full option. Pick the ones you actually need (involve your analysts), unless you get a fantastic deal on an all-inclusive option. Most vendors offer multiple packaging options.
  • Number of records: Processed vs. matched. Some vendors charge based on “processed” records, not just matched records. Depending on the match rate, this can make a big difference in total cost (see the quick cost comparison after this list).
  • Installment/update frequency: Real-time, weekly, monthly, quarterly, etc. Think carefully about how often you would really need to refresh “demographic” data, which do not change as rapidly as transaction data, and how big the incremental universe would be for each update. Obviously, a real-time API feed can be costly.
  • Delivery method: API vs. batch delivery, for example. Price, as well as the data menu, changes quite a bit based on the delivery option.
  • Availability of a full-licensing option: When the internal database becomes really big, a full installment becomes a good option. But you would need the internal capability for a match-and-append process that involves “soft matching” on similar names and addresses (imagine the good old name-and-address merge routines). It becomes a bit of a commitment, as the match-and-append becomes part of the internal database update process.
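
To show why the processed-versus-matched distinction matters, here is a quick, entirely hypothetical cost comparison (the prices and match rate are made up):

    # Hypothetical comparison of "per processed record" vs. "per matched record" pricing.
    records_sent = 1_000_000
    match_rate = 0.55                # only 55% of the file matches

    price_per_k_processed = 8.00     # vendor A charges on every record you send
    price_per_k_matched = 12.00      # vendor B charges only on what it matches

    cost_a = records_sent / 1_000 * price_per_k_processed
    cost_b = records_sent * match_rate / 1_000 * price_per_k_matched

    print(f"Vendor A (processed): ${cost_a:,.0f}")  # $8,000
    print(f"Vendor B (matched):   ${cost_b:,.0f}")  # $6,600, despite the higher unit price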

Business First
Evaluating a database is a project in itself, and these nine evaluation criteria provide a good guideline. Depending on the business, of course, more conditions could be added to the list. And that leads to the final point, which I did not even include in the list: The database (or any data, for that matter) must be useful in meeting the business goals.

I have been saying that “Big Data Must Get Smaller,” and this whole Big Data movement should be about (1) Cutting down on the noise, and (2) Providing answers to decision-makers. If the data sources in question do not serve the business goals, cut them out of the plan, or cut loose the vendor if they are from external sources. It would be an easy decision if you “know” that the database in question is filled with dirty, sporadic and outdated data that cost lots of money to maintain.

But if that database is needed for your business to grow, clean it, update it, expand it and restructure it to harness better answers from it. Just like the way you’d maintain your cherished automobile to get more mileage out of it. Not all databases are created equal for sure, and some are definitely more equal than others. You just have to open your eyes to see the differences.