Over the next several months, the news media will report on poll after poll that shows either presidential candidate Donald Trump gaining on opponent Hillary Clinton or Clinton surging against Trump. There will be polls on what’s happening in different swing states and among different demographic groups. Their accuracy depends on the methodology used, how the sample was drawn and the margin of error associated with the sample size, not to mention how today’s events in the 24/7 news cycle can throw the results of yesterday’s poll into turmoil.
Many years ago, I played with a low-tech exhibit at the Franklin Institute in Philadelphia that remains the best illustration I have seen of how sample size can affect the outcome of a poll.
My memory may not be entirely accurate on this, but the concept is simple. There was a box that contained 100 marbles: 45 red marbles and 55 blue marbles. You would tilt the box so that all the marbles ran to the top. Then, you would tilt the box the other way. As the marbles rolled to the bottom, 10 were captured in little cups while the rest fell to the bottom. Sometimes, the cups captured more red marbles than blue marbles. Other times, the blue marbles far exceeded the number of red marbles. Any single tilt could mislead you, but do it enough times and the blue marbles come out ahead on average, just as their 55-to-45 majority in the box predicts.
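For readers who want to try the marble box at home, here is a rough sketch of it in Python. It assumes the setup as I remember it (100 marbles, 45 red and 55 blue, 10 cups) and treats each tilt as a random draw of 10 marbles without replacement. The function names, the number of trials and the random seed are my own choices for illustration, not anything taken from the exhibit.

```python
import random


def draw_sample(rng, red=45, blue=55, cups=10):
    """Tilt the box once: capture `cups` marbles at random, without
    replacement, and return how many of them are red."""
    box = ["red"] * red + ["blue"] * blue
    return rng.sample(box, cups).count("red")


def simulate(trials=10_000, cups=10, seed=0):
    """Tilt the box `trials` times and tally which color led in the cups."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    red_ahead = blue_ahead = tied = 0
    total_red = 0
    for _ in range(trials):
        reds = draw_sample(rng, cups=cups)
        blues = cups - reds
        total_red += reds
        if reds > blues:
            red_ahead += 1
        elif blues > reds:
            blue_ahead += 1
        else:
            tied += 1
    return {
        "red_ahead": red_ahead,
        "blue_ahead": blue_ahead,
        "tied": tied,
        # Share of red among all captured marbles across every tilt
        "avg_red_share": total_red / (trials * cups),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(name, value)
```

Run it and red should lead in a fair number of individual tilts even though it is the minority color in the box, while the average share of red across all the tilts should settle close to 45 percent. A single tilt is one small poll; the running average is what you get from many of them.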
Nate Cohn draws a comparison between polls and the national pastime in the New York Times:
It’s a lot like baseball. Even great baseball players go 0 for 4 in a game — or have rough stretches for weeks on end. On the other end might be a few multi-hit nights with extra-base hits, or a spectacular few weeks.
Sometimes, these rough stretches or hot streaks really do indicate changes in the underlying ability of a player. More often, they are just part of the noise inevitable with small samples. Taking more polls is like watching more at-bats, and you need many if you want to be confident about whether a candidate is ahead or tied.
That’s why baseball is a statistician’s favorite sport; it has a large sample size. Thirty teams each play 162 games in the regular season for a total of 2,430 contests. Over that many games, hot streaks and slumps wash out and records settle toward the mean: the best teams win about 60 percent of their games and the worst teams win about 40 percent.
So be wary of placing your faith and trust in the poll du jour. It’s a long season.