There are many posers in the data and analytics industry. Unfortunately, some of them occupy managerial positions, making critical decisions based on superficial knowledge and limited experience. I’ve seen companies waste loads of money and resources on projects with no substantial value — all because posers in high places bought into buzzwords or false promises. As if buzzwords have some magical power to get things done “auto-magically.”
I’ve written articles about how to identify posers and why buzzwords suck. But allow me to add a few more thoughts, as the phrase “Machine Learning” is rapidly gaining that magical power in many circles. You’d think that machines could read our minds and deliver results on their own. Sorry to break it to you, but even in the world of Star Trek, computers still wouldn’t understand illogical requests.
Beware of people who try to employ machine learning to the exclusion of every other technique. Generally, such people don’t even understand what they are trying to automate; they only care about the cost-reduction part. But the price that others end up paying for such a bad decision could be far greater than any savings. The worst-case scenario is automating inadequate practices, which takes you to the wrong place really fast. How can anyone create a shortcut without knowing how to get to the destination in the first place, or worse, where the destination is supposed to be?
The goal of any data project should never be employing machine learning for its own sake. After all, you wouldn’t respect a guitarist who can’t play a simple lick just because he has a $5,000 custom guitar slung over his shoulder.
Then, what is the right way to approach this machine learning hype? First, you must recognize that there are multiple steps in predictive modeling. Allow me to illustrate some major steps and questions to ask:
- Planning: This critical step is often the most difficult one. What are you trying to achieve through data and analytics? Building the most elegant model can’t be the sole purpose outside academia. Converting business goals into tangible solution sets is a project in itself. What kind of analytics should be employed? What would be the outcome? How will those model scores be applied to actual marketing campaigns? How will the results be measured? Prescribing proper solutions to business challenges within the limitations of systems, toolsets, and the budget is one of the most coveted skill sets. And it has nothing to do with tools like machine learning, yet.
- Data Audit: Before we chart a long analytics journey, let’s put the horse before the cart, as data is the fuel for the engine called machine learning. I’ve seen too many cases where the cart is firmly mounted before the horse. What data are we going to use? From what sources? Do we have enough data to perform the task? How far back in time do the datasets go? Are they merged in one place? Are they in usable forms? Too many datasets are disconnected, unstructured, uncategorized, and unclean. Even for the machines.
- Data Transformation: Preparing available data for advanced analytics is also a project in itself. Be mindful that you don’t have to clean everything; just deal with the elements that are essential for required analytics to meet pre-determined business goals. At this stage, you may employ machine learning to categorize, group, or reformat data variables. But note that such modules are quite different from the ones for predictions.
- Target Definition: Setting up proper model targets is half-art/half-science. If the target is hung in the wrong spot, the resultant model will never render any value. For instance, if you are targeting so-called “High Value” customers, how would you express that in mathematical terms? It could be defined by any combination of value, frequency, recency, and product categories. Targets are to be set after a long series of assumptions, profiling, and testing. No matter what modeling methodology eventually gets employed, you do NOT want targets to be unilaterally determined by a machine. Even with a simple navigator, which provides driving directions through machine-based algorithms, the user must provide the destination first. A machine cannot determine where you need to go (at least not yet).
- Universe Definition: In what universe will the resultant model be applied and used? The model comparison universe is as important as the target itself, as a model score is a mathematical expression of the differences between two dichotomous universes (e.g., buyers vs. non-buyers). Even with the same target, switching the comparison universe would render completely different algorithms. On top of that, you may want to put extra filters on by region, gender, customer type, user segment, etc. A machine may determine distinct sets of universes that require separate models, but don’t relinquish all control to the machines, either. A machine may not be aware of where you will apply the model.
- Modeling: This statistical work consists of sub-steps such as variable selection, variable transformation, binning, outlier exclusion, algorithm creation, and validation, all in multiple iterations. It is indeed laborious work, and “some” parts may be done by the machines to save time. You may have heard of terms such as Deep Learning, Neural Net, logistic regression, stepwise regression, Random Forest, CHAID analysis, tree analysis, etc. Some are to be done by machines, and some by human analysts. All of those techniques ultimately exist to create algorithms. In any case, some human touch is inevitable regardless of the employed methodology, as nothing should be released without continuous testing, validation, and tweaking. Don’t blindly subscribe to terms like “unsupervised learning.”
- Application: An algorithm may have been created in a test environment, but to be useful, the model score must be applied to the entire universe. Some toolsets provide “in-database-scoring”, which is great for automation. Let me remind you that most errors happen before or after the modeling step. Again, humans should not be out of the loop until everything becomes a routine, all the way to campaign execution and attribution.
- Maintenance: Models deteriorate and require scheduled reviews. Even self-perpetuating algorithms should be examined periodically, as business environments, data quality, and assumptions may take drastic turns. The auto-pilot switch shouldn’t stay on forever.
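To make the target-definition step above concrete, here is a minimal sketch in plain Python of one hypothetical way a team might flag “High Value” customers by combining value, frequency, and recency. The field names and thresholds are assumptions for illustration only; in practice they would come out of the profiling and testing the article describes, not be picked by a machine.

```python
# Hypothetical "High Value" target definition combining value,
# frequency, and recency. The cutoffs below are illustrative
# assumptions, not a prescription.

def is_high_value(customer, min_spend=500.0, min_orders=3, max_days_since=90):
    """Return 1 if the customer clears all three thresholds, else 0."""
    return int(
        customer["total_spend"] >= min_spend
        and customer["order_count"] >= min_orders
        and customer["days_since_last_order"] <= max_days_since
    )

customers = [
    {"id": "A", "total_spend": 800.0,  "order_count": 5, "days_since_last_order": 30},
    {"id": "B", "total_spend": 1200.0, "order_count": 2, "days_since_last_order": 10},
    {"id": "C", "total_spend": 300.0,  "order_count": 7, "days_since_last_order": 200},
]

targets = {c["id"]: is_high_value(c) for c in customers}
print(targets)  # {'A': 1, 'B': 0, 'C': 0}
```

Note that customer B spends the most yet misses the frequency threshold: exactly the kind of judgment call a human should make deliberately, not inherit silently from an algorithm.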
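The universe-definition point can be sketched the same way. Below, the same population is filtered into two different comparison universes; a model built on one would see very different data than a model built on the other. All field names and filter rules are hypothetical.

```python
# Universe definition: one population, two comparison universes.
# Field names and filters are illustrative assumptions.

population = [
    {"id": 1, "region": "NE", "is_customer": True,  "bought_product": True},
    {"id": 2, "region": "NE", "is_customer": True,  "bought_product": False},
    {"id": 3, "region": "SW", "is_customer": False, "bought_product": False},
    {"id": 4, "region": "NE", "is_customer": False, "bought_product": False},
]

# Universe A: buyers vs. everyone else in the database.
universe_a = population

# Universe B: buyers vs. non-buying *customers*, NE region only.
universe_b = [p for p in population
              if p["region"] == "NE" and (p["bought_product"] or p["is_customer"])]

print(len(universe_a), len(universe_b))  # 4 2
```

Same target variable, two different modeling datasets; the resulting algorithms would diverge accordingly.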
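As for the maintenance step, one common way to schedule those reviews is a drift check on the score distribution. The sketch below implements the Population Stability Index (PSI), a widely used diagnostic; it is my example of such a check, not something the toolsets mentioned here necessarily provide, and the bin count and thresholds are conventional rules of thumb.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between two score samples.

    Rule of thumb: below 0.1 is stable, 0.1 to 0.25 bears watching,
    above 0.25 means the model likely needs a review.
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against a zero-width range

    def shares(sample):
        counts = [0] * bins
        for x in sample:
            i = min(int((x - lo) / width), bins - 1)
            counts[i] += 1
        # Floor each share to avoid log(0) on empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [i / 100 for i in range(100)]  # scores at model build time
current = [i / 100 for i in range(100)]   # identical distribution today
assert psi(baseline, current) < 1e-9      # no drift detected
```

A scheduled job running a check like this flags when the “auto-pilot” should be switched off for a human review.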
So, out of this outline for simple target modeling (for 1:1 marketing applications), which parts do you think can be fully automated without any human intervention? I’d say some parts of data transformation, large parts of modeling, and some application steps could go the hands-free route.
The most critical step of all, of course, is the planning and goal-setting part. Humans must breathe their intention into any project. Once things are running smoothly, then sure, we can carve out the parts that can be automated in a step-wise fashion (i.e., never in one shot).
Now, would you still believe sales pitches that claim all your marketing dreams will come true if you just purchase some commercial machine-learning modules? Even if decent toolsets are tuned up properly, don’t forget that you are supposed to be the one who puts them in motion, just like self-driving cars.