Wednesday, September 24, 2008

Political Polling 101


Like many political junkies, I love political polling, and it has come a long way since Dewey "defeated" Truman. Back in 1948, it was mainly Gallup and Roper, and they blew the election call badly. One often-cited explanation is that the polling was done by telephone and many of Truman's supporters didn't have telephones, so they were undercounted. A similar problem occurs today for another telephone reason: cell phones. Pollsters try to overcome a problem like this by building polling models that weight the raw data from their telephone surveys.
One of my favorite web sites is realclearpolitics.com. It has a section listing the latest polls, which is updated regularly throughout the day. Today's national polls, all taken over roughly the same period, range from McCain leading by two percentage points to Obama leading by nine. McCain's support ranges from 42 percent to 48 percent, and Obama's from 46 percent to 52 percent. Polling in the individual states varies as well. Why is there such variance in polls taken at the same time? Part of it is ordinary sampling error, but much of the answer lies in the methodology used to weight the samples.
Anyone who has perused a statistics book or suffered through a statistics class in high school or college knows that you can take a sample of a larger group and get a fairly good idea of what makes up the whole group. Sampling is how polling works. It is impossible to survey everyone who is going to vote in an election, so pollsters survey a sample instead. Polls in Presidential contests usually survey between 600 and 1,000 voters. The larger the sample, the smaller the margin of error: a sample of 600 gives a margin of error of about 4 percent, whereas a sample of around 1,000 gives about 3 percent (both at the usual 95 percent confidence level).
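Those two margins follow from the standard formula for a simple random sample, z * sqrt(p(1 - p) / n). Here is a minimal sketch, assuming the usual 95 percent confidence level (z = 1.96) and the worst case of an evenly split electorate (p = 0.5). The function name is mine, for illustration only.

```python
import math

def margin_of_error(sample_size, share=0.5, z=1.96):
    """Approximate margin of error for a simple random sample.

    Assumes a 95 percent confidence level (z = 1.96) and, by default,
    the worst case of a 50/50 split (share = 0.5).
    """
    return z * math.sqrt(share * (1 - share) / sample_size)

for n in (600, 1000):
    print(f"n = {n}: about +/- {margin_of_error(n) * 100:.1f} points")
```

Because the margin shrinks with the square root of the sample size, quadrupling the sample only halves the margin, which helps explain why pollsters rarely go far beyond 1,000 respondents.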
I saw a great visual example of this on a now-defunct children's television show called Mathnet about 20 years ago. The people on the show had a giant container into which they placed about 10,000 marbles, 9,000 of one color and 1,000 of another. After mixing up the marbles, they blindly took out a few hundred and counted the colors. Invariably the sample would show about a 90 to 10 advantage of the one color over the other. It wasn't always exactly 90 percent to 10 percent, but it was always within a few percentage points either way. This is the basis for the samples used by pollsters.
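The marble experiment is easy to repeat on a computer. This is only a sketch: the sample size of 300 is my guess at "a few hundred," and the function name and seed are arbitrary.

```python
import random

def draw_marbles(total=10_000, majority=9_000, sample_size=300, rng=None):
    """Blindly draw sample_size marbles and return the majority color's share."""
    rng = rng or random.Random()
    jar = ["majority"] * majority + ["minority"] * (total - majority)
    sample = rng.sample(jar, sample_size)  # draw without replacement
    return sample.count("majority") / sample_size

rng = random.Random(2008)  # fixed seed so the run is repeatable
shares = [draw_marbles(rng=rng) for _ in range(10)]
for share in shares:
    print(f"{share:.1%} of the sample was the majority color")
```

Each draw should land close to 90 percent without hitting it exactly every time, which is the same behavior the show demonstrated with real marbles.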
The major obstacle pollsters face is that, unlike the well-mixed marbles, the electorate is not randomly dispersed. Utah will undoubtedly vote for John McCain (Bush won there in 2000 and 2004 by margins of 40 and 45 points, respectively), just as Obama will win Washington, D.C. (Republican candidates have not broken the 10 percent barrier there in years). Much of this disparity has to do with party affiliation. Voters generally stick with their party in Presidential elections (80 percent or more do). Independents and other non-affiliated voters tend to split within ten percentage points of 50/50.
Unlike the marble counters, pollsters do not translate the raw data directly into percentages. For example, if a sample of 1,000 produced raw numbers of 485 for Obama, 445 for McCain, and 70 undecided or other, the pollster would not announce the results as 48.5 percent for Obama and 44.5 percent for McCain. Instead, the pollster would "weight" the raw data through a formula that takes many factors into account, such as party affiliation. Because each pollster's model is different, these models go a long way toward explaining the differences among polls taken in the same time period.
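To make the idea concrete, here is a rough sketch of one simple kind of weighting: adjusting the sample so that each party makes up the share of the electorate the pollster expects. Real polling models are more involved and are usually not published in full. The breakdown by party and the turnout model below are invented for illustration; only the 485 / 445 / 70 totals come from the example above.

```python
# Hypothetical raw sample of 1,000, broken down by party identification.
# The totals match the example in the text (485 / 445 / 70); the split
# by party is invented for illustration.
raw = {
    "Democrat":    {"Obama": 340, "McCain": 40,  "Other": 20},
    "Republican":  {"Obama": 30,  "McCain": 285, "Other": 15},
    "Independent": {"Obama": 115, "McCain": 120, "Other": 35},
}

# Hypothetical turnout model: the share of the electorate the pollster
# expects each party to make up on election day.
target_share = {"Democrat": 0.37, "Republican": 0.35, "Independent": 0.28}

def weight_by_party(raw, target_share):
    """Rescale each party group so it matches its target share of the sample."""
    total = sum(sum(group.values()) for group in raw.values())
    weighted = {}
    for party, group in raw.items():
        group_size = sum(group.values())
        weight = target_share[party] * total / group_size
        for candidate, count in group.items():
            weighted[candidate] = weighted.get(candidate, 0.0) + count * weight
    return weighted

weighted = weight_by_party(raw, target_share)
total = sum(weighted.values())
for candidate, count in weighted.items():
    print(f"{candidate}: {count / total:.1%}")
```

With these made-up numbers the sample contains more Democrats than the turnout model expects, so weighting pulls Obama's share down from the raw 48.5 percent and pushes McCain's up from 44.5 percent. A pollster assuming a different party mix would report a different result from the very same interviews.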
The two great unknowns in this Presidential election are 1) whether the pollsters are properly compensating for the growing percentage of people who have only cell phones, and 2) how much weight they have given to all the newly registered voters. This year, the Democratic Party in Nevada has registered almost 80,000 new voters, shifting the registration balance from a 6,000-voter Republican advantage in 2006 to a 70,000-voter Democratic edge. Bush won Nevada in 2004 by slightly more than 21,000 votes.
Political polling is actually quite accurate, especially when polls are averaged. For example, in 2004 the final RCP average on election morning had Bush garnering 296 electoral votes to Kerry's 242. The only state the RCP average got wrong was Wisconsin: Kerry won it by 0.4 percentage points, while the election-morning average had Bush ahead by 0.9.
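Here is a small simulation of why averaging helps. It is a sketch under simplifying assumptions: the "true" level of support, the number of polls, and the sample size are all hypothetical, and each simulated poll is an unweighted simple random sample, which real polls are not.

```python
import random

def simulate_poll(true_share, sample_size, rng):
    """Simulate one poll: each respondent backs the candidate with probability true_share."""
    supporters = sum(rng.random() < true_share for _ in range(sample_size))
    return supporters / sample_size

rng = random.Random(2004)  # fixed seed so the run is repeatable
true_share = 0.51          # hypothetical true support for one candidate
polls = [simulate_poll(true_share, 800, rng) for _ in range(8)]

average = sum(polls) / len(polls)
worst_error = max(abs(p - true_share) for p in polls)
average_error = abs(average - true_share)

print(f"worst single poll missed by {worst_error * 100:.1f} points")
print(f"the average missed by {average_error * 100:.1f} points")
```

The average can never miss by more than the worst single poll, and it usually does much better, because random errors in opposite directions cancel. Averaging does not help with an error that every pollster shares, such as a bad turnout model.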