Monday, February 24, 2014

pants on fire

Consider this Gallup poll from October 2012, just a few weeks before the election.  What conclusions would you draw from looking at the graphic?  Does it seem like Romney has a good lead?  Does it seem Romney is pulling ahead?  The graphic leaves a lot of questions unanswered, and they are really important if you want to understand what the data is telling you.

First, what is the sample size?  Is it 20 people or 1,000?
Who was polled?  Registered voters, likely voters, random folks?
How sure are the pollsters of their results?  What if I said they were 80% confident?  What if they were 99% confident?

All of these questions are important and because you don't know the answers, the pollsters can draw whatever conclusion they want from the data.

Here's what I mean.  What if the accompanying text told you that 1,000 random registered voters were surveyed?  I hope that would make you more confident than if 20 people in Chicago were selected.  It is Gallup, so let's assume the sample really was 1,000 random registered voters.
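To put rough numbers on why 1,000 voters beats 20, here is a minimal sketch using the standard textbook margin-of-error formula for a proportion. It assumes a simple random sample, the worst-case split of 50/50, and a 95% confidence level. Those are my assumptions for illustration, not details taken from the poll, and the function name is made up.

```python
import math
from statistics import NormalDist

def margin_of_error(n, confidence=0.95, p=0.5):
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample of size n and uses the normal
    approximation; p=0.5 is the worst case (widest margin).
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(p * (1 - p) / n)

for n in (20, 1000):
    print(f"n = {n:>4}: +/- {margin_of_error(n) * 100:.1f} points")
# n =   20: +/- 21.9 points
# n = 1000: +/- 3.1 points
```

With only 20 people, the result could be off by more than 20 points in either direction, which tells you almost nothing about a close race.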

What if you knew that the margin of error was plus or minus 4%?  What conclusion would you draw about the October 15 poll?  I hope you can see that if the error could be 4%, then the October 15 poll was pretty much a statistical dead heat.  What if I told you the margin of error was plus or minus 2%?  Then you might conclude that Romney was ahead on October 15. 

The sneaky part is that Gallup and anyone using this poll can actually make both of those claims just by choosing the confidence level they report.  If they are 99% sure of the result, then the margin of error is a whopping 4%.  If they are only 80% confident, the margin shrinks to about 2%, and Romney takes the lead.  And the average reader will never know the difference.
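If you want to check the arithmetic, here is a rough sketch of how the margin of error moves with the confidence level for the same 1,000 people. It assumes a simple random sample, a worst-case 50/50 split, and the usual normal approximation. Gallup's real methodology is more involved, so treat this as a classroom illustration with a made-up function name.

```python
import math
from statistics import NormalDist

def margin_for_confidence(confidence, n=1000, p=0.5):
    """Margin of error at a given confidence level (normal approximation).

    n=1000 matches the sample size assumed in the post; p=0.5 is the
    worst-case proportion. Both are illustrative assumptions.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return z * math.sqrt(p * (1 - p) / n)

for level in (0.80, 0.95, 0.99):
    points = margin_for_confidence(level) * 100
    print(f"{level:.0%} confident: +/- {points:.1f} points")
# 80% confident: +/- 2.0 points
# 95% confident: +/- 3.1 points
# 99% confident: +/- 4.1 points
```

Same people, same answers, and the margin doubles just by asking for more confidence.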

So why am I talking about these ideas?  Well, I am teaching my statistics students this material right now, but more importantly, in these kinds of political scenarios, where races are fairly close, it is possible to make the numbers say almost anything you want simply by choosing the sample size and the level of confidence you prefer.  This means that a particular media source can call a winner in any close race simply by bending the statistics.  It also helps explain why two polls can have completely different results.

I'd like to think that the media is acting with integrity when they report on polls, but to be honest, recent events lead me to believe otherwise.  If you really want to use mathematics to call political races, I would lean toward Nate Silver's blog, FiveThirtyEight.  The man is a statistical genius (yes, I have a mad crush on him), and he correctly predicted nearly every race in the 2012 election.  For the sake of harmony in my own life, I'm going to head there during the next election and consider his efforts the most credible source for predicting results.  Then I can skip all the heated arguments on Facebook.
