MDM: Big Data-Slayer

There’s quite a bit of talk about Big Data these days across the Web … it’s the meme that just won’t quit. The reasons why are pretty obvious. Besides a catchy name, Big Data is a real issue faced by virtually every firm in business today.

But what’s frequently lost in the shuffle is the fact that Big Data is the problem, not the solution. Big Data is what marketers are facing—mountains of unstructured data accumulating on servers and in stacks, across various SaaS tools, in spreadsheets and everywhere else you look in the firm and on the cloud. In fact, the actual definition of Big Data is simply a data set that has grown so large it becomes awkward or impossible to work with, or make sense out of, using standard database management tools and techniques.

The solution to the Big Data problem is to implement a system that collects and sifts through those mountains of unstructured data from different buckets across the organization, combines them together into one coherent framework, and shares this business intelligence with different business units, all of which have varying delivery needs, mandates, technologies and KPIs. Needless to say, it’s not an easy task.

The usual refrain most firms offer when it comes to tackling Big Data is a bold plan to hire a team of data scientists—essentially a bunch of database administrators or statisticians who have the technical skills to sift through the 1s and 0s and make sense out of them.

This approach is wrong, however, as it misses the forest for the trees. Sure, having the right technology team is essential to long-term success in the data game. But truth be told, if you’re going to go to battle against the Big Data hydra, you need a much more formidable weapon in your arsenal. Your organization needs a Master Data Management (MDM) strategy in order to succeed.

A concept still unknown to many marketers, MDM comprises a set of tools and processes that manage an organization’s information on a macro scale. Essentially, MDM’s objective is to provide processes for collecting, aggregating, matching, consolidating, quality-assuring and distributing data throughout the organization to ensure consistency and control in the ongoing maintenance and application use of this information. No, I didn’t make up that definition myself. Thanks, Wikipedia.
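To make that mouthful a little more concrete, here is a minimal sketch of what the matching and consolidation steps might look like in practice. It is written in Python with made-up source systems, field names and a deliberately naive matching rule; a real MDM platform would apply far more sophisticated matching, survivorship and data-quality rules.

```python
from collections import defaultdict

# Toy customer records pulled from two hypothetical source systems.
crm_records = [
    {"email": "jane@acme.com", "name": "Jane Doe", "phone": None},
    {"email": "bob@globex.com", "name": "Bob Smith", "phone": "555-0100"},
]
billing_records = [
    {"email": "JANE@ACME.COM", "name": "J. Doe", "phone": "555-0199"},
]

def consolidate(*sources):
    """Match records on a normalized key and merge non-empty fields,
    letting earlier sources win conflicts (a stand-in for survivorship rules)."""
    master = defaultdict(dict)
    for source in sources:
        for record in source:
            key = record["email"].strip().lower()  # crude matching rule
            for field, value in record.items():
                if value and field not in master[key]:
                    master[key][field] = value
    return dict(master)

golden_records = consolidate(crm_records, billing_records)
# golden_records now holds one consolidated record per customer,
# ready to be quality-checked and distributed to the business units.
```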

The reason why the let’s-bring-in-the-developers approach is wrong is that it gets it backwards. Having consulted in this space for quite some time, I can tell you that technology is one of the least important pieces in the puzzle when it comes to implementing a successful MDM strategy.

In fact, when listing out priorities for MDM, I put technology near the end of the decision tree, after Vision, Scope, Data Governance, Workflow/Process, and the definition of Business Unit Needs. As such, besides the CTO or CIO, IT staff should not be brought in until after many preliminary decisions have been made. To support this view, I point to John Radcliffe’s groundbreaking ‘The Seven Building Blocks of MDM: A Framework for Success’, published by Gartner in 2007. If you haven’t read it yet and you’re interested in MDM, it’s worth a look; it includes an excellent chart illustrating the framework.

You see, Radcliffe places MDM Technology Infrastructure near the end of the process, following Vision, Strategy, Governance and Processes. The crux of the argument is that technology decisions cannot be made until the overall strategy has been mapped out.

The rationale is that at a high-level, MDM architecture can be structured in different ways depending on the underlying business it is supporting. Ultimately, this is what will drive the technology decisions. Once the important strategic decisions have been made, a firm can then bring in the development staff and pull the trigger on any one of a growing number of technology providers’ solutions.

At the end of 2011, Gartner put out an excellent report on the Magic Quadrant for Master Data Management of Customer Data Solutions. This detailed paper identified solutions by IBM, Oracle (Siebel) and Informatica as the clear-cut industry leaders, with SAP, Tibco, DataFlux and VisionWare receiving honorable mention. Though these solutions vary in capability, cost and other factors, I think it’s fair to say that they all present a safe and robust platform for any company that wishes to implement an MDM solution, as all boast strong technology, brand and financial resources, not to mention thousands of MDM customers already on board.

Interestingly, regarding technology there’s been an ongoing debate about whether MDM should be single-domain or multi-domain—a “domain” being a framework for data consolidation. This is important because MDM requires that records be merged or linked together, usually necessitating some kind of master data format as a reference. The diversity of the data sets that are being combined together, as well as the format (or formats) of data outputs required, both drive this decision-making methodology.

For companies selling products, a product-specific approach is usually called for, featuring a data framework built around product attributes, while service businesses tend to gravitate toward a customer-specific architecture. Following that logic, an MDM for a supply chain database would contain records aligned to supplier attributes.
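To illustrate the point (and this is purely an illustration, not any particular vendor's data model), the "master format" behind each domain often boils down to nothing more than a record schema with its own natural key, against which every other record in the organization gets matched or linked:

```python
from dataclasses import dataclass

# Hypothetical master schemas; the "domain" decision largely determines
# which entity, and which natural key, everything else links back to.

@dataclass
class CustomerMaster:      # customer-centric domain, typical for service firms
    customer_id: str       # key used to link CRM, billing and support records
    legal_name: str
    country: str

@dataclass
class ProductMaster:       # product-centric domain, typical for product firms
    sku: str               # key used to link catalog, inventory and pricing records
    description: str
    category: str

@dataclass
class SupplierMaster:      # supplier-centric domain, e.g. a supply chain database
    supplier_id: str       # key used to link purchasing and logistics records
    company_name: str
    region: str
```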

While it is most certainly true that MDM solutions are architected differently for different types of firms, I find the debate to be a red herring. On that note, a fantastic article by my colleague Steve Jones in the UK dispels the entire single-versus-multi domain debate altogether. I agree wholeheartedly with Jones in that, by definition, an MDM is an MDM regardless of scope. The breadth of data covered is simply a decision that needs to be made by the governance team when the project is in the planning stages—well before a single dollar has been spent on IT equipment or resources. If anything, this reality serves to strengthen the hypothesis of this piece, which is that vision, more than technology, drives the MDM implementation process.

Now, of course, an organization may discover that it’s simply not feasible (or desirable) to combine together customer, product and supplier information in one centralized place, and in one master format. But it’s important to keep in mind that the stated goal of any MDM solution is to make sense out of and standardize the organization’s data—and that’s it.

Of course there’s much more I can cover on this topic, but I realize this is a lot to chew on, so I’ll end this post here.

Has your firm implemented, or are you in the process of implementing, an MDM solution? If so, what process did you follow, and what solution did you decide upon? I’d love to hear about it, so please let me know in your comments.

Updating Your Marketing Database

It’s amazing how quickly things go obsolete these days. For those of us in the business of customer data, the tools of the trade have changed along with the times. Some of that has to do with the advent of new technologies; some of it has to do with changing expectations. Let’s take a look at how the landscape has changed and what it means for marketers.

For marketing departments, maintaining up-to-date customer data has always been a major headache. One way to update data is by relying on sales team members to make the updates themselves as they go about their jobs. For lack of a better term, let’s call this method internal crowd-sourcing, and there are two reasons why it has its limitations.

The first reason is technology. Typically, customer data is stored in a data hub or data warehouse, which is usually a home-grown and oftentimes proprietary database built using one of many popular database architectures. Customer databases tend to be proprietary because each organization sells different products and services, to different types of firms, and consequently collects different data points. Additionally, customer databases are usually grown organically over many years, and as a result tend to contain disparate information, often collected from different sources during different timeframes, of varying degrees of accuracy.

It’s one thing having data stored in a data warehouse somewhere. It’s quite another altogether to give salespeople access to a portal where the edits can be made—that’s been the real challenge. The database essentially needs to be integrated with or housed in some kind of tool, such as enterprise resource planning (ERP) or customer relationship management (CRM) software, that gives sales teams the capability to update customer records on the fly with front-end read/write/edit access.

Cloud-based CRM technology (such as SalesForce.com) has grown by leaps and bounds in recent years to fill this gap. Unlike purpose-built customer databases, however, out-of-the-box cloud-based CRM tools are developed for a mass market, and without customizations contain only a limited set of standard data fields plus a finite set of “custom fields.” Without heavy customizations, in other words, a cloud-based CRM solution holds only a subset of a company’s customer data file, and is typically used only by salespeople and customer service reps. Moreover, data in the CRM is usually not connected to that of other business units, such as marketing or finance, which require a more complete data set to do their jobs.

The second challenge to internal crowd-sourcing has more to do with the very nature of salespeople themselves. Anyone who has worked in marketing knows firsthand that it’s a monumental challenge to get salespeople to update contact records on a regular basis—or do anything else, for that matter, that doesn’t involve generating revenue or commissions.

Not surprisingly, this gives marketers fits. Good luck sending out effective (and hopefully highly personalized) CRM campaigns if customer records are either out of date or flat out wrong. Anyone who has used Salesforce.com has seen the “Stay in Touch” function, which gives salespeople an easy and relatively painless method for scrubbing contact data by sending an email to contacts in the database inviting them to “update” their contact details. The main problem with this tool is that it requires a correct email address in the first place.

Assuming your salespeople are diligently updating data in the CRM, another issue with this approach is that it essentially limits your data updates to whatever the sales team happens to know or glean from each customer. It assumes, in other words, that your people are asking the right questions in the first place. If your salesperson does not ask a customer how many employees they have globally or at a particular location, that information won’t get entered into the CRM. Nor, for that matter, will data on recent mergers and acquisitions or financial statements—unless your sales team is extremely inquisitive and is speaking with the right people in your customers’ organizations.

The other way to update customer data is to rely on a third-party data provider to do it for you—to cleanse, correct, append and replace the data on a regular basis. This process usually involves taking the entire database and uploading it to an FTP site somewhere. The file is then grabbed by the third party, who works their magic on it—comparing it against a central database that is presumably updated quite regularly—before returning it so it can be resubmitted and merged back into the database on the data hub or residing in the CRM.
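For the technically curious, the round trip is simple enough to sketch in a few lines of Python. The host, credentials and file names below are placeholders, and a production job would add scheduling, retries, checksums and logging around it:

```python
from ftplib import FTP

HOST, USER, PASSWORD = "ftp.dataprovider.example", "acme", "secret"  # placeholders

def upload_extract(local_path: str, remote_path: str) -> None:
    """Push the full customer extract to the provider's FTP site."""
    with FTP(HOST) as ftp, open(local_path, "rb") as f:
        ftp.login(USER, PASSWORD)
        ftp.storbinary(f"STOR {remote_path}", f)

def download_cleansed(remote_path: str, local_path: str) -> None:
    """Pull back the cleansed and appended file once the provider has processed it."""
    with FTP(HOST) as ftp, open(local_path, "wb") as f:
        ftp.login(USER, PASSWORD)
        ftp.retrbinary(f"RETR {remote_path}", f.write)

# Typical monthly batch: upload the extract, wait for the provider's
# processing window, then download the returned file and merge it back
# into the data hub or CRM.
```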

Because this process involves technology, has a lot of moving parts and requires several steps, it’s generally set up as an automated process and allowed to run on a schedule. Moreover, because the process involves overwriting an entire database (even though it is automated), it requires having IT staff around, at best to supervise the process, and at worst to jump in if something goes wrong and it blows up completely. Not surprisingly, because we’re dealing with large files, multiple stakeholders and room for technology meltdowns, most marketers tend to shy away from running a batch update more than once per month. Some even run them quarterly. Needless to say, given the current pace of change many feel that’s not frequent enough.

It’s interesting to note that not very long ago, sending database updates quarterly via FTP file dump was seen as state-of-the-art. Not any longer; FTP, you see, is soooo 2005. What’s replaced FTP is what we call a “transactional” database update system. Unlike an FTP set-up, which requires physically transferring a file from one server to another, transactional data updates rely on an Application Programming Interface, or API, to get the data from one system to another.

For those of you unfamiliar with the term, an API is a pre-established set of rules that different software programs can use to communicate with each other. An apt analogy might be the way a User Interface (UI) facilitates interaction between humans and computers. Using an API, data can be updated in real time, either on a record-by-record basis or in bulk. If Company A wants to update a record in its CRM with fresh data from Company B, for instance, all it needs to do is transmit a unique identifier for the record in question over to Company B, who will then return the updated information to Company A via the API.
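Here is a minimal sketch of that handshake from Company A's side. The endpoint, identifier and response fields are hypothetical, since every provider's API is different, but the pattern of sending a unique key and getting back fresh attributes is the same:

```python
import requests

API_BASE = "https://api.dataprovider.example/v1"  # hypothetical provider endpoint
API_KEY = "your-api-key"                          # issued by the data provider

def refresh_record(record_id: str) -> dict:
    """Send a unique identifier and receive the provider's latest attributes."""
    response = requests.get(
        f"{API_BASE}/companies/{record_id}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()  # e.g. name, employee count, address, etc.

# The returned attributes can then be written back to the matching CRM record,
# either one at a time as reps pull them up or in a bulk loop run by the admin.
```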

Perhaps the best part of the transactional update architecture is that it can be set up to connect with the data pretty much anywhere it resides—in a cloud-based CRM solution or in a purpose-built data warehouse sitting in your data center. For those using a cloud-based solution, a huge advantage of this architecture is that once a data provider builds hooks into popular CRM solutions, there are usually no additional costs for integration, and transactional updates can be initiated in bulk by the CRM administrator or on a transaction-by-transaction basis by salespeople themselves. It’s quite literally plug and play.

For those with an on-site data hub, integrating with the transactional data provider is usually pretty straightforward as well, because most APIs not only rely on standard Web technology, but also come with API keys and easy-to-follow documentation. Setting up the integration, in other words, can usually be handled by a small team in a short timeframe and on a surprisingly small budget. And once it’s set up, it will pretty much run on its own. Problem solved.