In the data business, the ability to fine-tune database structure and toolsets to meet unique business requirements is key to success, not just flashy features and functionalities. Beware of technology providers who insist on a “one-size-fits-all” customer data solution, unless the price of entry is extremely low. Always check the tech provider’s exception management skills and their determination to finish the last mile. Too many of them just freeze at the thought of any customization.
The goal of any data project is to create monetary value out of available data. Whether it is about increasing revenue or reducing cost, data activities through various types of basic and advanced analytics must yield tangible results. Marketers are not doing all this data-related work to entertain geeks and nerds (no offense); no one is paying for data infrastructure, analytics toolsets, and most importantly, human cost to support some intellectual curiosity of a bunch of specialists.
Therefore, when it comes to evaluating any data play, the criteria that CEOs and CFOs bring to the table matter the most. Yes, I shared a long list of CDP evaluation criteria from the users’ and technical points of view last month, but let me emphasize that, like any business activity, data work is ultimately about the bottom line.
That means we have to maintain a balance between the cost of doing business and the usability of data assets. Unfortunately, these two goals work against each other: to make customer data more useful, one must put more time and money into it. Most datasets are unstructured, unrefined, uncategorized, and plain dirty. And the messiness level is not uniform.
Start With the Basics
Now, there are many commoditized toolsets out in the market to clean the data and weave them together to create a coveted Customer-360 view. In fact, if a service provider or a toolset isn’t even equipped to do the basic part, I suggest working with someone who can.
For example, a service provider must know the definition of dirty data. They may have to ask the client to gauge the tolerance level (for messy data), but basic parameters must be in place already.
What is a good email address, for instance? It should have all the proper components like @ signs and .com, .net, .org, etc. at the end. Permission flags must be attached properly. Primary and secondary email must be set by predetermined rules. They must be tagged properly if delivery fails, even once. The list goes on. I can think of similar sets of rules when it comes to name, address, company name, phone number, and other basic data fields.
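To make the idea of “basic parameters” concrete, here is a minimal sketch of such email rules in Python. The pattern, the field names (`opted_in`, `bounce_count`, `is_primary`), and the default tolerance are all assumptions for illustration, not any vendor’s actual rule set; a real hygiene routine would go well beyond a format check.

```python
import re
from dataclasses import dataclass

# Deliberately simple format check: one "@", a non-empty local part,
# and a domain ending in a dot plus at least two letters.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")


@dataclass
class EmailRecord:
    address: str
    opted_in: bool = False    # permission flag (hypothetical field name)
    bounce_count: int = 0     # delivery failures recorded so far
    is_primary: bool = False  # set by whatever primary/secondary rule applies


def is_well_formed(address: str) -> bool:
    """Return True if the address passes the basic format check."""
    return bool(EMAIL_PATTERN.match(address.strip()))


def is_mailable(rec: EmailRecord, max_bounces: int = 0) -> bool:
    """Mailable = well-formed, permissioned, and within the bounce tolerance.

    max_bounces is the client's tolerance level; the default of 0 flags
    an address after a single failed delivery.
    """
    return (
        is_well_formed(rec.address)
        and rec.opted_in
        and rec.bounce_count <= max_bounces
    )
```

Note that the tolerance is a parameter, not a constant. That is the whole point: the provider supplies sensible defaults, and the client decides how strict to be.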
Why are these important? Because it is not possible to create that Customer-360 view without properly cleaned and standardized Personally Identifiable Information (PII). And anyone who is in this game must be a master of that. The ability to clean basic information and to match seemingly unmatchable entities is just a prerequisite in this game.
Even Basic Data Hygiene and Matching Routines Must Be Tweaked
Even with basic match routines, users must be able to dictate the tightness and looseness of the matching logic. If the goal of customer communication involves legal notifications (as for banking and investment industries), one should not merge any two entities just because they look similar. If the goal is mainly to maximize campaign effectiveness, one may merge similar-looking entities using various “fuzzy” matching techniques, employing Soundex, nickname tables, and abbreviated or hashed match keys. If the database is filled with business entities for B2B marketing, then so-called commoditized merge rules become more complicated.
The first sign of trouble often becomes visible at this basic stage. Beware of providers that insist on “one-size-fits-all” rules, in the name of some universal matching routine. There was no such thing even in the age of direct marketing (i.e., really old days). How are we going to navigate a complex omnichannel marketing environment with just a few hard-set rules that can’t be modified?
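Here is a rough sketch of what “dictating tightness” could look like in Python: one match-key function with a strict mode and a loose mode. The loose mode uses a standard American Soundex code for the last name and a tiny nickname table for the first name. The `NICKNAMES` entries, the `match_key` function, and the choice of ZIP code as the third key component are illustrative assumptions; production routines use far larger tables and more fields.

```python
# Letter-to-digit table for standard American Soundex.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code, or '' if name has no letters."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code counts again
    return (result + "000")[:4]


# Tiny nickname table, for illustration only.
NICKNAMES = {"BOB": "ROBERT", "BILL": "WILLIAM", "LIZ": "ELIZABETH"}


def match_key(first: str, last: str, zip_code: str, mode: str = "strict") -> str:
    """Build a match key; 'strict' requires exact names, 'loose' is fuzzy."""
    first_u, last_u = first.strip().upper(), last.strip().upper()
    if mode == "strict":
        return f"{first_u}|{last_u}|{zip_code}"
    first_u = NICKNAMES.get(first_u, first_u)
    return f"{first_u}|{soundex(last_u)}|{zip_code}"
```

With this sketch, “Bob Smyth” and “Robert Smith” in the same ZIP code share a key in loose mode but not in strict mode. A bank sending legal notices would stay strict; a campaign team chasing reach might go loose.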
Simple matching logic only with name, address, and email becomes much more complex when you add new online and offline channels, as they all come with different types of match keys. Just in the offline world, the quality of customer names collected in physical stores vastly differs from that of self-entered information from a website along with shipping addresses. For example, I have seen countless invalid names like “Mickey Mouse,” “Asian Tourist,” or “No Name Provided.” Conversely, no one who wants to receive the merchandise at their address would create an entry “First Name: Asian” and “Last Name: Tourist.”
Sure, I’m providing simple examples to illustrate the fallacy of “one-size-fits-all” rules. But by definition, a CDP is an amalgamation of vastly different data sources, online and offline. Exceptions are the rules.
Dissecting Transaction Elements
Up to this point, we are still in the realm of “basic” stuff, which is mostly commoditized in the technology market. Now, let’s get into more challenging parts.
Once data weaving is done through PII fields and various proxies of individuals across networks and platforms, then behavioral, demographic, geo-location, and movement data must be consolidated around each individual. Now, demographic data from commercial data compilers are already standardized (one would hope), regardless of their data sources. Every other customer data type varies depending on your business.
The simplest form of transaction records would be from retail businesses, where you would sell widgets for set prices through certain channels. And what is a transaction record in that sense? “Who” bought “what,” “when,” for “how much,” through “what channel.” Even from such a simplified viewpoint, things are not so uniform.
Let’s start with an easy one, such as a common date/time stamp. Is it in the form of a UTC timestamp? That would be simple. Do we need to know the day-part of the transaction? Eventually, but by what standard? Do we need to convert it into the local time of the transaction? Yes, because we need to tell evening buyers and daytime buyers apart, and we can’t use Coordinated Universal Time for that (unless you only operate in the U.K.).
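The local-time conversion can be sketched in a few lines of Python. The day-part boundaries in `DAY_PARTS` are my own placeholders, which is exactly the “by what standard?” question the client has to answer. The function takes any `tzinfo` object; in practice you would probably pass something like `zoneinfo.ZoneInfo("America/New_York")` per store so that daylight saving time is handled, while the fixed offset in the usage lines below is only for demonstration.

```python
from datetime import datetime, timedelta, timezone, tzinfo

# Hypothetical day-part boundaries: (local start hour, label).
# Hours before the first boundary belong to the previous night.
DAY_PARTS = [(5, "morning"), (12, "afternoon"), (17, "evening"), (22, "night")]


def day_part(utc_ts: datetime, store_tz: tzinfo) -> str:
    """Convert a timezone-aware timestamp to store-local time and label it."""
    if utc_ts.tzinfo is None:
        raise ValueError("expected a timezone-aware timestamp")
    local_hour = utc_ts.astimezone(store_tz).hour
    label = "night"
    for start_hour, name in DAY_PARTS:
        if local_hour >= start_hour:
            label = name
    return label


# Usage: the same instant is "night" in UTC but "evening" at UTC-5.
ts = datetime(2024, 1, 15, 23, 30, tzinfo=timezone.utc)
utc_label = day_part(ts, timezone.utc)
local_label = day_part(ts, timezone(timedelta(hours=-5)))
```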
“How much” isn’t so bad. It is made of net price, tax, shipping, discount, coupon redemption, and finally, total paid amount (for completed transactions). Sounds easy? Let’s just say that out of thousands of transaction files that I’ve encountered in my lifetime, I couldn’t find any “one rule” that governs how merchants would handle returns, refunds, or coupon redemptions.
Some create multiple entries for each action, with or without a common transaction ID (crazy, right?). Many customer data sources contain mathematical errors all over. Inevitable file cutoff dates would create orphan records where only return transactions are found without any linkage to the original transaction record. Yes, we are not building an accounting system out of a marketing database, but no one should count canceled and returned transactions as valid transactions for any analytics. “One-size-fits-all?” I laugh at that notion.
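As a minimal sketch, assuming the friendly case where sales and returns do share a transaction ID, netting them out and isolating orphan returns might look like this in Python. The row layout (`txn_id`, `type`, `amount`) is hypothetical, and every merchant’s file would need its own variation of these rules.

```python
from collections import defaultdict


def net_transactions(rows):
    """Net returns against sales and separate out orphan returns.

    rows: iterable of dicts with 'txn_id', 'type' ('sale' or 'return'),
    and a positive 'amount'. Returns (valid, orphans), where valid maps
    txn_id to the net amount still standing, and orphans maps txn_id to
    returned amounts that have no original sale in the file.
    """
    sales = defaultdict(float)
    returns = defaultdict(float)
    for r in rows:
        if r["type"] == "sale":
            sales[r["txn_id"]] += r["amount"]
        elif r["type"] == "return":
            returns[r["txn_id"]] += r["amount"]
        else:
            raise ValueError(f"unknown transaction type: {r['type']!r}")

    orphans = {t: amt for t, amt in returns.items() if t not in sales}
    valid = {}
    for txn_id, sold in sales.items():
        net = round(sold - returns.get(txn_id, 0.0), 2)
        if net > 0:  # fully returned or canceled transactions are dropped
            valid[txn_id] = net
    return valid, orphans
```

Orphans are reported rather than silently discarded, since someone still has to decide what to do with them (for example, waiting for an older file that contains the original sale).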
“Channel” may not be so bad. But at what level? What if the client has over 1,000 retail store locations all over the world? Should there be a subcategory under “Retail” as a channel? What about multiple websites with different brand names? How would we organize all that? If this type of basic – but essential – data isn’t organized properly, you won’t even be able to share store level reports with the marketing and sales teams, who wouldn’t care for a minute about “why” such basic reports are so hard to obtain.
The “what” part can be really complicated. Or, very simple if product SKUs are well-organized with proper product descriptions, and more importantly, predetermined product categories. A good sign would be the presence of a multi-level product category table, where you see entries like an apparel category broken down into Men, Women, Children, etc., and Women’s Apparel is broken down further into Formalwear, Sportswear, Casualwear, Underwear, Lingerie, Beachwear, Fashion, Accessories, etc.
For merchants with vast arrays of products, three to five levels of subcategories may be necessary even for simple BI reports, or further, advanced modeling and segmentation. But I’ve seen too many cases of incongruous and inconsistent categories (totally useless), recycled category names (really?), and weird categories such as “Summer Sales” or “Gift” (which are clearly for promotional events, not products).
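For illustration, here is a small Python sketch of a multi-level category lookup that rolls SKUs up to a chosen level and refuses to treat promotional labels as product categories. The SKUs, the `CATEGORY_TABLE` contents, and the `PROMO_LABELS` set are all made up for the example.

```python
# Hypothetical SKU-to-category-path table (top level first).
CATEGORY_TABLE = {
    "SKU-1001": ("Apparel", "Women", "Sportswear"),
    "SKU-1002": ("Apparel", "Men", "Formalwear"),
    "SKU-2001": ("Summer Sales",),  # a promotion, not a product category
}

# Labels that describe promotional events, not products (illustrative).
PROMO_LABELS = {"summer sales", "gift"}


def rollup(sku: str, level: int) -> str:
    """Return the category path for a SKU, truncated to `level` levels.

    Unknown SKUs and paths containing promotional labels are reported as
    'UNCATEGORIZED' so they can be counted and fixed at the source.
    """
    path = CATEGORY_TABLE.get(sku)
    if not path or any(p.lower() in PROMO_LABELS for p in path):
        return "UNCATEGORIZED"
    return " > ".join(path[:level])
```

A report grouped by `rollup(sku, 2)` has one row per second-level category instead of one row per SKU, and the size of the “UNCATEGORIZED” bucket tells you how much cleanup is left.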
All these items must be fixed and categorized properly, if they are not adequate for analytics. Otherwise, the gatekeepers of information are just dumping the hard work on poor end-users and analysts. Good luck creating any usable reports or models out of uncategorized product information. You might as well leave it as an unknown field, as product reports will have as many rows as the number of SKUs in the system. It will be a challenge finding any insights out of that kind of messy report.
Behavioral Data Are Complex and Unique to Your Business
Now, all this was about the relatively simple “transaction” part. Shall we get into the online behavior data? Oh, it gets much dirtier, as any “tag” data are only as good as the person or department that tagged the web pages in question. Let’s just say I’ve seen all kinds of variations of one channel (or “Source”) called “Facebook.” Not from one place either, as they show up in “Medium” or “Device” fields. Who is going to clean up the mess?
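One common way to start cleaning this up is an alias table applied after light normalization. The Python sketch below is a minimal example; the alias entries are invented, and a real table would be built from the values actually found in your own tag data.

```python
import re

# Hypothetical alias table: normalized raw value -> canonical source name.
SOURCE_ALIASES = {
    "facebook": "Facebook",
    "fb": "Facebook",
    "facebookcom": "Facebook",
    "mfacebookcom": "Facebook",
    "facebookads": "Facebook",
}


def normalize_source(raw: str) -> str:
    """Map a messy source tag to a canonical name, or 'UNMAPPED'."""
    # Lowercase and drop everything except letters and digits, so that
    # "m.facebook.com", " FaceBook " and "Facebook-Ads" all become
    # plain keys that can be looked up in the alias table.
    key = re.sub(r"[^a-z0-9]", "", raw.lower())
    return SOURCE_ALIASES.get(key, "UNMAPPED")
```

Values that come back as “UNMAPPED” are the exception queue: someone reviews them and extends the table, which is the kind of ongoing customization work this article is about.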
I don’t mean to scare you, but these are just common examples in the retail industry. If you are in any subscription, continuity, travel, hospitality, or credit business, things get much more complicated.
For example, there isn’t any one “transaction date” in the travel industry. There would be Reservation Date, Booking Confirmation Date, Payment Date, Travel Date, Travel Duration, Cancellation Date, Modification Date, etc., and all these dates matter if you want to figure out what the traveler is about. If you get all these down properly and calculate distances from one another, you may be able to tell if the individual is traveling for business or for leisure. But only if all these data are in usable forms.
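To show what “calculating distances” between dates might mean, here is a rough Python sketch that derives lead time and a weekend flag, then makes a naive business-versus-leisure guess. The thresholds (14 days of lead time, 3 days of duration) are purely my assumptions for the example, not an established rule; a real model would be trained on your own bookings.

```python
from datetime import date, timedelta


def trip_features(reservation: date, travel: date, duration_days: int) -> dict:
    """Derive simple features from travel dates and guess the trip purpose."""
    lead_time = (travel - reservation).days
    # True if any day of the stay falls on a Saturday (5) or Sunday (6).
    includes_weekend = any(
        (travel + timedelta(days=i)).weekday() >= 5
        for i in range(duration_days)
    )
    # Naive heuristic with hypothetical thresholds: short notice, short
    # stay, weekdays only looks like business; everything else, leisure.
    likely_business = lead_time <= 14 and duration_days <= 3 and not includes_weekend
    return {
        "lead_time_days": lead_time,
        "duration_days": duration_days,
        "includes_weekend": includes_weekend,
        "purpose_guess": "business" if likely_business else "leisure",
    }
```

None of this works unless each date arrives as a real, consistently defined date field, which is the point of the paragraph above.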
Always Consider Exception Management Skills
Some of you may be in businesses where turn-key solutions may be sufficient. And there are plenty of companies that provide automated, but simpler and cheaper options. The proper way to evaluate your situation would be to start with specific objectives and prioritize them. What are the functionalities you can’t live without, and what is the main goal of the data project? (Hopefully not hoarding the customer data.)
Once you set the organizational goals, try not to deviate from them so casually in the name of cost savings and automation. Your bosses and colleagues (i.e., mostly the “bottom line” folks) may not care much about the limitations of toolsets and technologies (i.e., geeky concerns).
Omnichannel marketing that requires a CDP is already complicated. So, beware of sales pitches like “All your dreams will come true with our CDP solution!” Ask some hard questions, and see if they balk at the word “customization.” Your success may depend more on their ability to handle exceptions than on their ability to execute some commoditized functions that they acquired a long time ago. Unless you really believe that you will safely get to your destination in “autopilot” mode.