Data Mining: Where to Dig First?

Data mining
“Big_Data_Prob,” Creative Commons license. | Credit: Flickr by KamiPhuc

In the age of abundant data, obtaining insights out of mounds of data often becomes overwhelming even for seasoned analysts. In the data-mining business, more than half of the struggle is about determining “where to dig first.”

The main job of a modern data scientist is to answer business questions for decision-makers. To do that, they have to be translators between the business world and the technology world. This in-between position often creates a great amount of confusion for aspiring data scientists, as the gaps between business challenges and the elements that make up the answers are very wide, even with all of the toolsets that are supposedly “easy to use.” That’s because insights do not come out of the toolsets automatically.

Business questions are often very high-level or even obscure, such as:

  • Let’s try this new feature with the “best” customers
  • How do we improve customer “experience”?
  • We did lots of marketing campaigns; what worked?

When someone mentions “best” customers, statistically trained analysts jump into the mode of “Yeah! Let’s build some models!” If you are holding a hammer, everything may look like nails. But we are not supposed to build models just because we can. Why should we build a model and, if we do, whom are we going after? What does that word “best” mean to you?

Breaking that word down into mathematically representable terms is indeed the first step for the analyst (along with the decision-makers). That’s because “best” can mean lots of different things.

If the users of the information are in the retail business, in a classical sense, it could mean:

  • Frequently Visiting Customers: Expressed in terms of “Number of transactions past 12 months,” “Life-to-date number of transactions,” “Average days between transactions,” “Number of Web visits,” etc.
  • Big Spenders: Expressed in terms of “Average amount per transaction,” “Average amount per customer for past four years,” “Lifetime total amount,” etc.
  • Recent Customers: Expressed in terms of “Days or weeks since last transaction.”
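As a rough illustration of how the descriptors above might be derived, here is a minimal Python sketch. It assumes transactions arrive as simple (customer ID, date, amount) rows; the sample data, the function name `summarize_customers`, and the output field names are all hypothetical, chosen only to mirror the terms in the list.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction rows: (customer_id, transaction_date, amount).
transactions = [
    ("C1", date(2023, 1, 5), 120.0),
    ("C1", date(2023, 6, 20), 80.0),
    ("C1", date(2023, 11, 2), 100.0),
    ("C2", date(2022, 3, 14), 900.0),
    ("C2", date(2023, 9, 30), 1100.0),
]

def summarize_customers(records, as_of):
    """Roll transaction rows up into one set of descriptors per customer."""
    by_customer = defaultdict(list)
    for customer_id, tx_date, amount in records:
        by_customer[customer_id].append((tx_date, amount))

    summary = {}
    for customer_id, rows in by_customer.items():
        rows.sort()  # chronological order
        dates = [d for d, _ in rows]
        amounts = [a for _, a in rows]
        # "Past 12 months" is approximated here as the last 365 days.
        last_12m = [d for d in dates if (as_of - d).days <= 365]
        gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
        summary[customer_id] = {
            # Frequency
            "num_transactions_12m": len(last_12m),
            "num_transactions_ltd": len(dates),
            "avg_days_between_transactions": sum(gaps) / len(gaps) if gaps else None,
            # Spending
            "avg_amount_per_transaction": sum(amounts) / len(amounts),
            "lifetime_total_amount": sum(amounts),
            # Recency
            "days_since_last_transaction": (as_of - dates[-1]).days,
        }
    return summary

profile = summarize_customers(transactions, as_of=date(2023, 12, 31))
for customer_id, descriptors in profile.items():
    print(customer_id, descriptors)
```

In this toy data, C1 visits more often while C2 spends far more per visit, so which one is “best” depends entirely on which descriptor the business cares about.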

I am sure most young analysts would want requesters to express these terms like I did using actual variable names, but translating these terms into expressions that machines can understand is indeed their job. Also, even when these terms are agreed upon, exactly how high is high enough to be called the “best”? Top 10 percent? Top 100,000 customers? In terms of what, exactly? A cutoff based on some arbitrary dollar amount, like $10,000 per year? Just dollars, or frequency on top of it, too?
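To make the cutoff question concrete, the following sketch applies three different definitions of “best” to the same customers. The spend figures, the 20 percent and top-3 settings, and the helper names are made up for illustration; only the $10,000 threshold echoes the example in the text.

```python
# Hypothetical annual spend per customer, in dollars.
annual_spend = {
    "C01": 15200, "C02": 9800, "C03": 450, "C04": 12100, "C05": 3300,
    "C06": 700, "C07": 10050, "C08": 2600, "C09": 8900, "C10": 120,
}

# Customers ordered from highest to lowest spend.
ranked = sorted(annual_spend, key=annual_spend.get, reverse=True)

def top_percent(ranked_ids, percent):
    """Top X percent of customers by rank (always at least one customer)."""
    count = max(1, len(ranked_ids) * percent // 100)
    return ranked_ids[:count]

def top_n(ranked_ids, n):
    """A fixed-size segment, regardless of how much anyone spent."""
    return ranked_ids[:n]

def above_threshold(spend, minimum):
    """Everyone at or above a dollar cutoff, however many that turns out to be."""
    ordered = sorted(spend, key=spend.get, reverse=True)
    return [cid for cid in ordered if spend[cid] >= minimum]

print("Top 20 percent:", top_percent(ranked, 20))
print("Top 3 customers:", top_n(ranked, 3))
print("Spend >= $10,000:", above_threshold(annual_spend, 10_000))
```

The percentage rule returns two customers here and the dollar cutoff returns three, which is exactly why the definition has to be agreed upon with the decision-makers before any modeling starts.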

The word “best” may mean multiple things to different people at the same time. A marketer who runs a loyalty program, for example, may care only about the frequency factor, with a hint of customer value as a secondary measure.

But if we dig further, she may express the value of a customer in terms of “Number of points per customer,” instead of just dollar spending. Digging even deeper, we may even have to consider ratios between accumulated points vs. points redeemed over a certain period to define what “best” means. Now we are talking about a three-dimensional matrix (spending level, points earned, and points redeemed) just to figure out what the best segment is. And we haven’t even begun to talk about the ideal size of such a target segment.
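One hedged way to picture that matrix is to place each loyalty member in a cell based on spending level and on the ratio of points redeemed to points earned. Everything in this sketch is an assumption: the member records, the two cutoff values, and the segment labels are placeholders that a real project would replace with figures agreed upon with the marketer.

```python
# Hypothetical loyalty records: member_id -> (spend, points_earned, points_redeemed).
members = {
    "M1": (5200.0, 5200, 4800),
    "M2": (4900.0, 4900, 200),
    "M3": (300.0, 300, 250),
    "M4": (250.0, 250, 0),
}

SPEND_CUTOFF = 1000.0     # assumed dollar line between low and high spenders
REDEMPTION_CUTOFF = 0.5   # assumed share of earned points that counts as "active"

def segment(spend, earned, redeemed):
    """Assign a member to a (spend band, redemption band) cell."""
    redemption_ratio = redeemed / earned if earned else 0.0
    spend_band = "high_spend" if spend >= SPEND_CUTOFF else "low_spend"
    redemption_band = (
        "active_redeemer" if redemption_ratio >= REDEMPTION_CUTOFF else "point_hoarder"
    )
    return spend_band, redemption_band

segments = {member_id: segment(*values) for member_id, values in members.items()}
for member_id, cell in segments.items():
    print(member_id, cell)
```

M1 and M2 spend about the same, yet they land in different cells because one redeems nearly everything and the other almost nothing. Whether the hoarder or the redeemer is the “better” customer is a business decision, not a statistical one.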

Understanding long- and short-term business goals, and having “blends” of these figures is the most important step in data mining. Again, knowing where to dig is the first step.

Let’s take another example. If we introduce the “continuity” element in all of this, as in the telecommunications, subscription, or travel businesses, the word “best” takes yet another turn. Now we have to think about the longevity of the relationship, in addition to transaction and loyalty elements. For example:

  • Tenure: Expressed in terms of “Years since member signup,” “Months since first transaction,” or “Number of active months since signup”
  • Engagements: “Number of contacts for customer service, trouble-shooting, complaints, or package changes/upgrades”
  • Other Activities: Such as cancelation, delinquent payment, move or reactivation

For the airline business, “best” may mean different things for each flight. Data elements to consider could be:

  • Mileage program status
  • Lifetime mileage/YTD mileage
  • Ticket class/code
  • Ticket price paid for the flight/Discount amount
  • Frequency of the flight (Number of flights in the past 12 months, average days between flights/bookings)
  • Peripheral purchases and paid upgrades

Why do I list all of these tedious details? Because analysts must be ready for any type of business challenge or situation that decision-makers may throw at them.

Another example would be that even in the same credit card company, depending on the division (such as the acquisition team or the CRM team), the word “best” may mean completely different things. Yes, they all care for “good” customers, but the acquisition team may put more weight on responsiveness, while the CRM team may care for profitability above all else.

Speaking of customer care, “customer experience” can be broken down into multiple variables, again to pose different options to decision-makers. What is the customer experience made of, and what do we need to understand about the whole customer journey? In an age when we collect every click, every word and every view, defining such parameters is very important to getting the answers out fast.

In the sea of data, basically we need to extract the following elements of “experience”:

  • The Subject Matter or Product in Question: Why is the customer contacting us? Start with issue classifications and related product and product category designations. If they are in free form, better get them tagged and categorized. Difficulty level of the issue resolution can be assigned, as well.
  • Number of Actions and Reactions: Expressed in terms of number of contacts/inbound calls per customer, number of outbound calls, chats or service visits per customer.
  • Resolution: In no obscure terms, what was the outcome? Resolved or not resolved? Satisfactory or unsatisfactory? If they are embedded in some call log, better employ text analytics, pronto.
  • How Long Did All of This Take? Expressed in terms of “Minutes between initial contact and resolution,” “Average minutes between actions,” “Average duration of engagements,” etc. Basically, the shorter the better for all of this.

Good customer experience, this way, can be measured more objectively. Reporting required for evaluation of different scenarios can be improved immensely when the building blocks (i.e., variables and metrics) are solid.
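The building blocks listed above could be computed along these lines. This is only a sketch: it assumes each service case is logged as a list of (timestamp, direction, resolved) entries, and the log layout, function name, and metric names are hypothetical.

```python
from datetime import datetime

# Hypothetical contact log for one service case: (timestamp, direction, resolved).
case_log = [
    (datetime(2024, 3, 1, 9, 0), "inbound", False),
    (datetime(2024, 3, 1, 9, 45), "outbound", False),
    (datetime(2024, 3, 1, 11, 30), "inbound", True),
]

def experience_metrics(log):
    """Turn a raw contact log into countable, comparable experience metrics."""
    log = sorted(log, key=lambda entry: entry[0])
    times = [timestamp for timestamp, _, _ in log]
    # Minutes elapsed between each pair of consecutive actions.
    gaps = [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]
    # Timestamp of the first entry flagged as resolved, if any.
    resolved_at = next((t for t, _, done in log if done), None)
    return {
        "num_inbound_contacts": sum(1 for _, d, _ in log if d == "inbound"),
        "num_outbound_contacts": sum(1 for _, d, _ in log if d == "outbound"),
        "resolved": resolved_at is not None,
        "minutes_to_resolution": (
            (resolved_at - times[0]).total_seconds() / 60 if resolved_at else None
        ),
        "avg_minutes_between_actions": sum(gaps) / len(gaps) if gaps else None,
    }

metrics = experience_metrics(case_log)
print(metrics)
```

Once every case carries the same set of metrics, comparing products, issue types, or service channels becomes a matter of grouping and averaging rather than reading call logs one by one.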

Now let’s move on to yet another common question of “what worked, or didn’t work, in various marketing efforts.” Consultants often encounter this type of question, and the biggest hurdle often isn’t the analytics process itself, but messy, disparate, and unstructured data. To understand what worked, well, we must define what that means. First off, what was the desired outcome?

  • Opens and Clicks: Traditional digital analytics metrics
  • Conversion: Now we need to dig into transaction data and attribute them to proper campaigns and channels
  • Renewal: If it is for B-to-B or continuity programs
  • Elevation of Brand Image: Tricky and subjective, so we would need to break down this obscure word, as well.

As for what marketers did to invoke responses from customers or prospects, let’s start breaking down that “what” of the “What worked?” question from that angle. Specifically:

  • Channel: A must-have in the omnichannel world.
  • Source: Where did the contact name come from?
  • Selection Criteria: How did you choose the name to contact? By what variable? If advanced analytics were employed, with what segment, what model and what model groups?
  • Campaign Type/Name/Purpose: Such as annual product push, back-to-school sale, Christmas offer, spring clearance, etc.
  • Product: What was the main product featured in the campaign?
  • Offer: What was the hook? Dollar or percentage off? Free shipping? Buy-one-get-one-free? No-payment-until? Discount for a limited period?
  • Creative Elements: Such as content version, types of pictures, font type/size, tag lines, other graphic elements.
  • Drop Day/Time: Daypart of the campaign drop, day of the week, seasonal, etc.
  • Wave: If the campaign involved multiple waves.
  • A/B Testing Elements: A/B testing may have been done in a more controlled environment, but it may be prudent to carry any A/B testing elements on a customer level throughout.

These are, of course, just some of the suggestions. Different businesses may call for vastly different sets of parameters. I tell analysts not to insist on any particular element, but to try to obtain as much clean and dirty data as possible. Nonetheless, I am pointing out that breaking the elements down this way, upfront, is a necessary first step toward answering the “what worked” question.
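To show why breaking campaigns down this way pays off, here is a small sketch that assumes contact history is kept as one row per customer touch, with each campaign element stored as its own field. The rows, field names, and the `conversion_rate_by` helper are hypothetical, and a real attribution effort would involve far more data and care.

```python
from collections import defaultdict

# Hypothetical contact-history rows, one per customer per campaign touch.
contacts = [
    {"channel": "email", "offer": "free_shipping", "wave": 1, "converted": True},
    {"channel": "email", "offer": "free_shipping", "wave": 2, "converted": False},
    {"channel": "email", "offer": "10_percent_off", "wave": 1, "converted": False},
    {"channel": "direct_mail", "offer": "10_percent_off", "wave": 1, "converted": True},
    {"channel": "direct_mail", "offer": "free_shipping", "wave": 1, "converted": True},
    {"channel": "direct_mail", "offer": "10_percent_off", "wave": 2, "converted": False},
]

def conversion_rate_by(rows, attribute):
    """Conversion rate for each value of one campaign element."""
    sent = defaultdict(int)
    converted = defaultdict(int)
    for row in rows:
        key = row[attribute]
        sent[key] += 1
        converted[key] += int(row["converted"])
    return {key: converted[key] / sent[key] for key in sent}

for element in ("channel", "offer", "wave"):
    print(element, conversion_rate_by(contacts, element))
```

Because each element is its own field, the same few lines answer “what worked” by channel, by offer, or by wave. If those elements were buried in a free-form campaign name, none of these cuts would be possible without cleanup first.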

I have been saying “Big data must get smaller” (refer to “Big Data Must Get Smaller”) for some time now. To do that, we must define the question first. Then we can narrow down the types of data elements that are necessary to (1) define the question in a way that a machine can understand, and (2) derive answers in more comprehensive and consistent ways.

True insights, often, are not a simple summary of findings out of fancy graphical charts. In fact, knowing where to dig next is indeed a valuable insight in itself, like in mining valuable minerals and gems. Understanding where to start the data mining process ultimately determines the quality of all subsequent analytics and insights.

So, when faced with an obscene amount of data and ambiguous questions, start breaking things down to smaller and more tangible elements. Even marketers without analytical training will understand data better that way.

Author: Stephen H. Yu

Stephen H. Yu is a world-class database marketer. He has a proven track record in comprehensive strategic planning and tactical execution, effectively bridging the gap between the marketing and technology world with a balanced view obtained from more than 30 years of experience in best practices of database marketing. Currently, Yu is president and chief consultant at Willow Data Strategy. Previously, he was the head of analytics and insights at eClerx, and VP, Data Strategy & Analytics at Infogroup. Prior to that, Yu was the founding CTO of I-Behavior Inc., which pioneered the use of SKU-level behavioral data. “As a long-time data player with plenty of battle experiences, I would like to share my thoughts and knowledge that I obtained from being a bridge person between the marketing world and the technology world. In the end, data and analytics are just tools for decision-makers; let’s think about what we should be (or shouldn’t be) doing with them first. And the tools must be wielded properly to meet the goals, so let me share some useful tricks in database design, data refinement process and analytics.” Reach him at

4 thoughts on “Data Mining: Where to Dig First?”

  1. Stephen, great suggestions. I like to start by looking for the 20% of customers generating 80% of the revenue and then the 4% accounting for 64%. There’s gold in data, you just have to know where to look!

    1. Absolutely! Data mining isn’t so different from mining gold. And like making a gold watch, we have to refine the data like we make something valuable out of raw material. Thank you for your kind words.

  2. One of your ‘best’ and most helpful pieces. Where we lack clear definitions of questions, we tend to get muddy answers.

    Another thing to look for is the ‘profit dynamic’ of different aspects of the data. By ‘profit dynamic’ I mean those points in the transaction sequence which, if optimized, would show the biggest dynamic: a small increase would show a large rise in profit where at some other point, a large increase might show very little additional profit.
