Election Polls and the Price of Being Wrong 

The thing about predictive analytics is that the quality of a prediction is eventually exposed as clearly right or clearly wrong. There are casually incorrect outcomes, like a weather report failing to predict what time the rain will start, and then there are total shockers, like the outcome of the 2016 presidential election.

In my opinion, the biggest losers in this election cycle are pollsters, analysts, statisticians and, most of all, so-called pundits.

I am saying this from a concerned analyst’s point of view. We are talking about a colossal and utter failure of prediction on every level here. Except for one or two publications, practically every source missed the mark by more than a mile, not just by a couple of points here and there. Even the ones who achieved “guru” status by predicting the 2012 election outcome perfectly called the wrong winner this time, boldly posting a confidence level of more than 70 percent just a few days before the election.

What Went Wrong? 

The losing party, pollsters and analysts must be in the middle of some deep soul-searching now. In all fairness, let’s keep in mind that no prediction can overcome serious sampling errors and data collection problems. Especially when we deal with sparsely populated areas, where the winner was decisively determined in the end, we must be really careful with the raw numbers of respondents, as errors easily get magnified by incomplete data.

Some of us saw that type of over- or under-projection when the Census Bureau cut the sample size for budgetary reasons during the last survey cycle. For example, in a sparsely populated area, a few migrants from Asia may affect simple projections like “percent Asians” rather drastically. In large cities, conversely, the size of such errors is generally within a more manageable range, thanks to large sample sizes.
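To put rough numbers on how sample size drives that kind of error, here is a minimal sketch using the textbook margin-of-error formula for a proportion from a simple random sample. The function name and the sample sizes are my own illustrative choices, not anything taken from the Census Bureau or a particular pollster, and real surveys with weighting and clustering will behave worse than this idealized case.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sample proportion.

    Assumes a simple random sample of n respondents, which real polls
    only approximate. p is the observed share (0 to 1).
    """
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical sample sizes: a sparse rural area versus a large city.
for n in (50, 500, 5000):
    moe_points = 100 * margin_of_error(0.5, n)
    print(f"n={n}: +/- {moe_points:.1f} points")
```

With only 50 respondents the uncertainty is roughly ten times what it is with 5,000, which is why a handful of unusual responses can swing a small-area estimate so drastically.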

Then there are the human inconsistency elements that many pundits are talking about. Basically, everyone got so sick of all these survey calls about the election that many started to ignore them completely. I think pollsters must learn that at times, less is more. I don’t even live in a swing state, and I started to hang up on unknown callers long before Election Day. Can you imagine what the folks in swing states must have gone through?

Many are also claiming that respondents were not honest about how they were going to vote. But if that were the case, there are other techniques that surveyors and analysts could have used to project the answer based on “indirect” questions. Instead of simply asking “Whom are you voting for?”, how about asking what their major concerns were? Combined with modeling techniques, a few innocuous probing questions regarding specific issues — such as environment, gun control, immigration, foreign policy, entitlement programs, etc. — could have led us to much more accurate predictions, reducing the shock factor.

In the middle of all this, I’ve read that artificial intelligence without any human intervention predicted the election outcome correctly, by using abundant data coming out of social media. If that is true, machines are already outperforming human analysts. It helps that machines have no opinions or feelings about the outcome one way or another.

Dystopian Future?

Maybe machine learning will start replacing human analysts and other decision-making professions sooner than expected. That means a disenfranchised population will grow even further, dipping into highly educated demographics. The future, regardless of politics, doesn’t look all that bright for the human collective, if that trend continues.

In the predictive business, there is a price to pay for being wrong. Maybe that is why in some countries, there are complete bans on posting poll numbers and result projections days — sometimes weeks — before the election. Sometimes observation and prediction change behaviors of human subjects, as anthropologists have been documenting for years.

Winner of the 2012 Presidential Election: Data

Now that the contentious 2012 election has finally ended, we get a chance to look back and assess what happened and why. Regardless of who you voted for, it’s impossible not to acknowledge that the real winner of the 2012 election was data.

For the first time in history, this election demonstrated the power of using analytics and number crunching for politics. What I find most remarkable is the rapid evolution of this change. If you look back just a few years, Karl Rove was widely regarded as the political mastermind of the universe. Rove’s primary innovation was the use of highly targeted direct mail campaigns to get out the evangelical and rural vote to win the 2004 election for George W. Bush. Fast-forward a few short years, and not only did Rove’s candidate lose, but the master strategist was reduced to challenging his network’s numbers geeks live on the air, only to be rebuffed.

In every way, the old guard was bested by a new generation of numbers crunchers, nerds and data geeks who leveraged data science, analytics, predictive modeling and a highly sophisticated online marketing campaign to poll, raise money and get out the vote in an unprecedented manner.

On the subject of polling, I was intrigued by Nate Silver’s incredibly accurate FiveThirtyEight blog, which used a sophisticated system to synthesize dozens of national polls in a rolling average to predict the actual election results. In the run-up to the election, he even received a lot of flak from various pundits who claimed he was wrong based on their perception of voter “enthusiasm,” “momentum” and other non-scientific observations. At the end of the day, however, data won out over hot air and punditry big time. Silver’s final tally was absolutely dead on, crushing most other national polls by a wide margin.
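For readers curious what a rolling average of polls looks like in its simplest form, here is a rough sketch. To be clear, this is not Silver’s actual model, which is far more elaborate; it is just a sample-size-weighted average over a recent window, and the poll figures and the function name are made up purely for illustration.

```python
from datetime import date, timedelta

# Made-up polls for illustration only:
# (end date, sample size, candidate A share in percent).
polls = [
    (date(2012, 10, 28), 800, 48.0),
    (date(2012, 10, 30), 1200, 50.0),
    (date(2012, 11, 2), 1000, 49.0),
    (date(2012, 11, 4), 1500, 51.0),
]

def rolling_average(polls, as_of, window_days=7):
    """Sample-size-weighted mean of polls ending in the window before as_of.

    Returns None when no poll falls inside the window.
    """
    start = as_of - timedelta(days=window_days)
    recent = [(n, share) for d, n, share in polls if start < d <= as_of]
    if not recent:
        return None
    total_n = sum(n for n, _ in recent)
    return sum(n * share for n, share in recent) / total_n

print(rolling_average(polls, date(2012, 11, 5)))
```

The point of pooling is that no single poll’s quirks dominate: larger samples count for more, and older polls drop out of the window as new ones arrive.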

I especially love his Nov. 10 post in which Silver analyzes the various polls and shows which ones fared the best and which ones weren’t worth the paper they were printed on. It’s shocking to see that the Gallup Poll, in many people’s minds the oldest and most trusted name in polling, was skewed Republican by a whopping 7.2 points when averaged across all 11 of its polls. Ouch. For an organization that specializes in polling, its long-term viability must be called into question at this point.

When you examine the methodologies behind the various polls, it’s not too surprising that Gallup fell flat on its face, relying on live phone surveys as its primary polling method. Considering that many young, urban and minority voters don’t have a landline and only have a cellphone, it doesn’t take a rocket scientist to conclude that any poll that doesn’t include a large number of cellphones in its sample is going to skew wildly Republican … which is exactly what happened to Gallup, Rasmussen and several other prominent national polls.
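The landline problem can be shown with a bit of arithmetic. The sketch below compares a raw poll result with one reweighted to match the population, which is the basic idea behind post-stratification weighting. Every number here is hypothetical, chosen only to show the direction of the effect, and the function name is mine rather than any pollster’s.

```python
def weighted_share(groups):
    """Overall candidate share given (group weight, support within group) pairs.

    Weights are fractions that should sum to 1; support is in percent.
    """
    return sum(weight * support for weight, support in groups)

# Hypothetical support for candidate A within each phone group (percent).
landline_support = 46.0
cell_only_support = 58.0

# Hypothetical poll that reaches mostly landlines: 90% landline, 10% cell-only.
raw = weighted_share([(0.90, landline_support), (0.10, cell_only_support)])

# Assumed true population mix: 65% landline, 35% cell-only.
reweighted = weighted_share([(0.65, landline_support), (0.35, cell_only_support)])

print(f"raw poll: {raw:.1f}%  reweighted: {reweighted:.1f}%")
```

In this made-up case the unweighted poll understates candidate A by about three points simply because cell-only voters are underrepresented, even though every individual answer was recorded correctly.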

Turning to the Obama campaign’s incredible Get Out The Vote (GOTV) machine that turned out more people in more places than anyone could have ever predicted, there’s no doubt in anyone’s mind that for data-driven marketers, the 2012 U.S. election victory was a watershed moment in history.

According to a recent article in Time titled “Inside the Secret World of the Data Crunchers Who Helped Obama Win,” the secret sauce behind Obama’s big win was a massive data effort that helped him raise $1 billion, remade the process of targeting TV ads, and created detailed models of swing-state voters that could be used to increase the effectiveness of everything from phone calls and door-knocks to direct mailings and social media.

What’s especially interesting is that, much like a tech company, Obama’s campaign actually had a large in-house team of geeks, data scientists and online marketers. Composed of elite and senior tech talent from Twitter, Google, Facebook, Craigslist and Quora, the team enabled the campaign to turn out more volunteers and donors than it had in 2008, mostly by making it simpler and easier for anyone to engage with the President’s reelection effort. If you’d like to read more about it, there’s a great article recently published in The Atlantic titled “When the Nerds Go Marching In” that describes the initiative in great detail.

Well, looks like I’m out of space. One thing’s for sure though, I’m going to be very interested to see what happens in coming elections as these practices become more mainstream and the underlying techniques are further refined.

If you have any observations about the use of data and analytics in the election you’d like to share, please let me know in your comments.

—Rio