My last post suggested that the central result of a paper published in the American Economic Review was sensitive to the specification of the model: specifically, that the evidence was weaker (and would just scrape in at "significant at the 10% level") with a negative binomial model than with the models the authors fit: a least-squares regression on the log of a ratio and a Poisson regression. The negative binomial fits substantially better than the Poisson. It can't be compared directly with the least-squares regression, but there are several reasons to prefer it (I won't go into them here). The AER has a rigorous review process and the acknowledgments thank sixteen people by name, plus "other participants at numerous seminars for many constructive comments"--so why didn't someone suggest (or insist) that they try a negative binomial regression? My ideas:
1. A tendency to put too much faith in a combination of robust standard errors and "large" sample sizes at the expense of trying to find the right model, or something close to the right model.
2. Taking the number of cases at face value. The analysis includes about 35,000 municipalities, but many of them are very small: 80% have fewer than 1,000 people. On average, there is about one collaborator per 1,000 people, so small villages (that is, most of them) generally don't provide much information. Moreover, the analysis included a control for a larger geographical unit, the department. There were 95 of those, but in about half of them, every (or almost every) municipality had the same assignment in terms of service under Pétain. Those departments provide no information on the central question. So you could regard the data as a (roughly) 50-by-2 table: about 50 departments where troops from some municipalities served under Pétain and troops from others didn't. You would lose something by analyzing it that way--the ability to adjust for other qualities of the municipalities. But you would also gain something: it would be easier to notice outliers or influential cases, and perhaps some unanticipated geographical patterns.
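
To make the second point concrete, here is a minimal Python sketch of what collapsing to a department-by-assignment table might look like. The records, the function names, and the column layout are all invented for illustration. None of it comes from the paper's data or code. The sketch sums population and collaborators within each department for the two assignment groups, keeps only the departments that contain both groups, and computes how likely a village of 500 is to show zero collaborators if counts were Poisson with a rate of one per 1,000.

```python
import math
from collections import defaultdict

# Hypothetical toy records, NOT the paper's data:
# (department, served_under_petain, population, collaborators)
records = [
    ("A", 1, 400, 0),
    ("A", 1, 900, 1),
    ("A", 0, 700, 0),
    ("A", 0, 5000, 4),
    ("B", 1, 300, 0),     # department B: every municipality assigned
    ("B", 1, 12000, 15),
    ("C", 0, 800, 1),     # department C: no municipality assigned
    ("C", 0, 600, 0),
]


def collapse_by_department(rows):
    """Sum population and collaborators in each (department, assignment) cell."""
    cells = defaultdict(lambda: [0, 0])
    for dept, petain, pop, collab in rows:
        cells[(dept, petain)][0] += pop
        cells[(dept, petain)][1] += collab
    return {key: tuple(total) for key, total in cells.items()}


def informative_departments(cells):
    """Departments that contain both assigned and unassigned municipalities."""
    groups = defaultdict(set)
    for dept, petain in cells:
        groups[dept].add(petain)
    return sorted(d for d, seen in groups.items() if seen == {0, 1})


def rate_per_1000(pop, collab):
    """Collaborators per 1,000 residents."""
    return 1000.0 * collab / pop


cells = collapse_by_department(records)
informative = informative_departments(cells)

# One row per informative department: the two rates side by side.
for dept in informative:
    pop1, c1 = cells[(dept, 1)]
    pop0, c0 = cells[(dept, 0)]
    print(dept, round(rate_per_1000(pop1, c1), 2), round(rate_per_1000(pop0, c0), 2))

# Probability of observing zero collaborators in a village of 500,
# assuming a Poisson count with a rate of one per 1,000 people.
p_zero = math.exp(-500 / 1000)
print(round(p_zero, 3))
```

In this toy example only department A survives, because B and C have no within-department variation in assignment. With real data, a table like this has few enough rows to look over one at a time, which is what makes outliers and geographical patterns easier to spot.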