## Friday, August 7, 2015

### Fox News was right

I wasn't going to post again this soon, but I couldn't resist the opportunity to use this title. Justin Wolfers objects to Fox's statement that "Given the over 2,400 interviews contained within the five polls, from a purely statistical perspective it is at least 90% likely that the tenth place Kasich is ahead of the eleventh place Perry." The shares in the polls were 3.2% for Kasich and 1.8% for Perry. It's easier to work with counts than with small percentages, and given 2,400 total respondents, that's about 77 and 43 people. The observed difference is 34, and the standard deviation of that difference is about 11.
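The arithmetic can be reproduced in a few lines. This is only a sketch: it assumes the five polls can be pooled into one simple random sample of 2,400 (ignoring weighting and design effects) and treats the two counts as multinomial, so the variance of their difference is n[p1 + p2 - (p1 - p2)^2]. The variable names are mine.

```python
import math

n = 2400            # pooled interviews across the five polls
p_kasich = 0.032    # Kasich's share in the polls
p_perry = 0.018     # Perry's share in the polls

# Convert shares to (rounded) numbers of respondents
count_kasich = round(n * p_kasich)   # 76.8 -> 77
count_perry = round(n * p_perry)     # 43.2 -> 43
observed_diff = count_kasich - count_perry

# For multinomial counts: Var(X1 - X2) = n * (p1 + p2 - (p1 - p2)**2)
# (assumption: one simple random sample, no design effects)
sd_diff = math.sqrt(n * (p_kasich + p_perry - (p_kasich - p_perry) ** 2))

print(count_kasich, count_perry, observed_diff, round(sd_diff, 1))
```

Under those assumptions the standard deviation comes out at roughly 10.9, in line with "about 11."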

Wolfers's basic point is that to calculate a probability that one candidate is ahead of another, you need to start with some assumptions about the way that things might be (a prior distribution, in Bayesian terms). He illustrates this with what he acknowledges is a contrived example: "Either Republican voters like Mr. Kasich so much that he is beating Mr. Perry by 20 points, or alternatively they find them both pretty likable, but Perry’s new glasses are sufficient to give him a 0.1 percentage point lead. A small poll in which Mr. Kasich edges out Mr. Perry by a mere 1.4 percentage points is more consistent with the latter scenario than the former." So if we start by assuming the two scenarios are equally likely, we will end by concluding that Perry is probably (in fact, almost certainly) ahead.
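Wolfers's two-scenario example can be worked through with Bayes' rule. The sketch below makes simplifying assumptions that are mine, not his: the observed gap is treated as normally distributed around the true gap, everything is put on the scale of 2,400 respondents, and the same standard deviation of 11 is used under both scenarios. Log-likelihoods are used because the first scenario is so far from the data that its likelihood underflows to zero otherwise.

```python
import math

n = 2400
sd = 11.0         # assumed sampling sd of the gap, same in both scenarios
observed = 34.0   # Kasich's observed lead in respondents (1.4 points)

# The two hypothetical true gaps (Kasich minus Perry), in respondents
scenarios = {
    "Kasich ahead by 20 points": 0.20 * n,
    "Perry ahead by 0.1 points": -0.001 * n,
}
prior = {name: 0.5 for name in scenarios}   # equally likely to start

def log_normal_pdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2 * math.pi))

# Unnormalized log posterior = log prior + log likelihood
log_post = {
    name: math.log(prior[name]) + log_normal_pdf(observed, gap, sd)
    for name, gap in scenarios.items()
}

# Normalize with the log-sum-exp trick to stay numerically stable
top = max(log_post.values())
total = sum(math.exp(v - top) for v in log_post.values())
posterior = {name: math.exp(v - top) / total for name, v in log_post.items()}

for name, p in posterior.items():
    print(f"{name}: {p:.6f}")
```

With these inputs essentially all of the posterior probability lands on the scenario in which Perry is narrowly ahead, which is the point of the example: the answer is driven by the two-point prior, not by the poll.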

However, Fox didn't start by saying that they would use polling results to decide between Perry and Kasich; they started by saying that they would use polling results to pick the ten participants. So we're talking about comparing two unspecified candidates who are 10th and 11th in the polls, which may be what they mean by "from a purely statistical perspective." Since the candidates are now just "one guy" and "another guy", it seems the prior distribution representing possible values of the difference in support for them has to be symmetrical.

Once you restrict your attention to symmetrical prior distributions, it's hard not to conclude that someone who leads in the poll by 34 has at least a 90% probability of being ahead. For example, suppose the prior distribution for the true difference is normal with mean zero and a standard deviation of 100 respondents. This means that you think there could be a pretty big gap between them. By definition, the #10 and #11 candidates have less than 10% of the vote each, so a difference of 100 out of 2,400 (about 4%) is big. Then the probability that the candidate who leads by 34 is really ahead is 0.999.

What about normal with mean zero and a standard deviation of 25? This would amount to saying that you expect the #10 and #11 candidates to be pretty close: it's unlikely that they're as much as 2% (48 out of 2,400) apart. The probability falls to 0.998. What if the distribution is normal with mean zero and a standard deviation of 10? This means that you have a strong expectation that the #10 and #11 candidates are very close. Now the probability falls to 0.98. What if the distribution is normal with mean zero and a standard deviation of 5? Now it's down to 0.88. But this distribution amounts to saying that you're almost sure that the #10 and #11 candidates are almost tied. It seems hard to justify that assumption when you just start by knowing that you're talking about the #10 and #11 candidates in a field of about 16.
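These probabilities can be checked with the standard normal-normal update. With a prior of mean zero and standard deviation t, and an observed difference d whose sampling standard deviation is s, the posterior for the true difference is normal with mean d·t²/(t² + s²) and variance t²s²/(t² + s²), and the probability that the leader is really ahead is the normal CDF evaluated at mean/sd. The sketch below assumes a normal sampling distribution and uses the rounded values d = 34 and s = 11; the function name is hypothetical.

```python
import math

def prob_leader_ahead(observed_diff, sampling_sd, prior_sd):
    """P(true difference > 0 | data) under a N(0, prior_sd**2) prior
    and a normal likelihood with standard deviation sampling_sd."""
    # Fraction of the observed difference retained after shrinking toward 0
    shrink = prior_sd**2 / (prior_sd**2 + sampling_sd**2)
    post_mean = shrink * observed_diff
    post_sd = math.sqrt(shrink) * sampling_sd
    z = post_mean / post_sd
    # Standard normal CDF written with the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Prior standard deviations discussed in the text, in respondents
for prior_sd in (100, 25, 10, 5):
    p = prob_leader_ahead(observed_diff=34, sampling_sd=11, prior_sd=prior_sd)
    print(f"prior sd {prior_sd:>3}: P(leader really ahead) = {p:.3f}")
```

With these rounded inputs the sketch gives about 0.999, 0.998, and 0.98 for the first three priors and about 0.90 for the last, a little above the 0.88 quoted; the gap may come from a slightly different standard deviation or calculation. Either way, the probability only drops to around 90% when the prior is concentrated very tightly around a tie.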

You could fault Fox for not including a qualifier like "under any reasonable assumptions."  But the conclusion that there was at least a 90% chance that Kasich was really ahead was justified.  In fact, I'd commend them for putting it that way rather than giving an exact number.