Thursday, April 18, 2024

The problem is you?, part 1

The Atlantic recently published a critical review of the new book by Tom Schaller and Paul Waldman, White Rural Rage: The Threat to American Democracy. The review, by Tyler Austin Harper, concluded that the authors were not just wrong but had it backwards, and that the threat comes from the cities and suburbs:

"Schaller and Waldman are right: There are real threats to American democracy, and we should be worried about political violence. But by erroneously pinning the blame on white rural Americans, they’ve distracted the public from the real danger. The threat we must contend with today is not white rural rage, but white urban and suburban rage.

Instead of reckoning with the ugly fact that a threat to our democracy is emerging from right-wing extremists in suburban and urban areas, the authors of White Rural Rage contorted studies and called unambiguously metro areas 'rural' so that they could tell an all-too-familiar story about scary hillbillies. Perhaps this was easier than confronting the truth: that the call is coming from inside the house. It is not primarily the rural poor, but often successful, white metropolitan men who imperil our republic."

The report that Harper links to says: "the more rural a county, the lower its rate of sending insurrectionists, a finding which is significant with a p-value <.01%." A just-published paper by Robert A. Pape, Kyle D. Larson, and Keven G. Ruby in PS: Political Science and Politics gives a more detailed analysis. The results are from a negative binomial regression in which the dependent variable is the number of people from a county who were charged with crimes related to the January 6 attack on the Capitol. Controlling for other factors, the number is estimated to be 2.88 times as large in urban as in rural counties.
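For readers who want the model written out, here is a minimal sketch of a log-link negative binomial regression of the kind described above. The NB2 variance form and the single metro indicator are assumptions for illustration; the paper's exact specification may differ. The point is that each coefficient enters on the log scale, so a "2.88 times" effect corresponds to a coefficient of about ln 2.88 ≈ 1.06, holding the other predictors fixed.

```latex
\begin{aligned}
x_i &\sim \operatorname{NegBin}(\mu_i, \alpha), \qquad \operatorname{Var}(x_i) = \mu_i + \alpha \mu_i^2 \\
\log \mu_i &= \beta_0 + \beta_{\text{metro}}\,\text{metro}_i + \beta_{\text{pop}}\,\text{pop}_i + \cdots \\
\frac{\mu_i(\text{metro}_i = 1)}{\mu_i(\text{metro}_i = 0)} &= e^{\beta_{\text{metro}}}, \qquad e^{\beta_{\text{metro}}} = 2.88 \iff \beta_{\text{metro}} = \ln 2.88 \approx 1.06
\end{aligned}
```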

Of course, the population of the county is one of the other factors. But a negative binomial regression models the logarithm of the expected value of the dependent variable, and their control is population itself (in 100,000s), not its logarithm. The estimated coefficient for population is .148, meaning that the natural log of the predicted number of insurrectionists goes up by .148 for every 100,000 increase in county population. If the natural log of the predicted number goes up by .148, the predicted number goes up by about 16% (e^.148 ≈ 1.16).* But an increase of 100,000 means very different things in counties of different sizes. If you're starting from a population of 1,000, it multiplies the population by about 100; if you're starting from 1,000,000, it raises it by 10%; if you're starting from 5,000,000, by only 2%. So the model controlling for population builds in a relationship between county population and the chance that a person will be an insurrectionist: declining and then increasing. In small counties, population grows much faster than the predicted count as you add people, so the rate falls; in very large counties, population grows more slowly than the predicted count, so the rate rises. The figure shows the nature and size of the relationship using their estimate:
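The declining-then-increasing shape follows from a line of calculus. Here n is county population in 100,000s and b is the population coefficient, with the other predictors held fixed (notation introduced here for illustration). The predicted count is then proportional to e^{bn}, so the per-person rate is proportional to e^{bn}/n:

```latex
\begin{aligned}
\frac{\mu(n+1)}{\mu(n)} &= e^{b} = e^{0.148} \approx 1.16 \\
r(n) = \frac{\mu(n)}{n} &\propto \frac{e^{bn}}{n}, \qquad \frac{d}{dn}\ln r(n) = b - \frac{1}{n} \\
\frac{d}{dn}\ln r(n) &< 0 \text{ for } n < \tfrac{1}{b}, \qquad \frac{d}{dn}\ln r(n) > 0 \text{ for } n > \tfrac{1}{b}
\end{aligned}
```

With b = .148, the turning point is at n = 1/.148 ≈ 6.76, roughly 676,000 people; a different estimate of b moves it.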


The value 1 on the y-axis represents the rate in a county of average size (about 100,000). In a county with a population of 10,000, the predicted rate is about 8.5 times that; in a county of 500,000, it's about .4 times, and in one of 5,000,000, it's about 80 times. The biggest county in the United States (Los Angeles) has a population of about 10,000,000, but I don't extend the x-axis that far because it would make the figure too hard to read. Of course, there is no reason to expect that there really is a relationship of this form.
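The curve can be reproduced approximately with a few lines of Python. This is a sketch under stated assumptions. It holds every other predictor fixed and takes a county of 100,000 as the reference, as the figure does. It uses the .148 coefficient reported above, which comes from the author's own analysis of the replication data, rather than the paper's estimate used in the figure. Its outputs will therefore differ from the values read off the figure, especially for very large counties, where a small difference in the coefficient is multiplied by a large change in population. The function name `relative_rate` is hypothetical.

```python
import math

# Population coefficient (per 100,000 people) as reported above.
# Assumption: all other predictors in the model are held fixed.
B_POP = 0.148
REF_POP = 100_000  # reference county size; its relative rate is 1


def relative_rate(population: float, b: float = B_POP, ref: float = REF_POP) -> float:
    """Per-person rate in a county of `population`, relative to a county of `ref`.

    With a linear population term, the expected count is proportional to
    exp(b * n), where n is population in 100,000s, so the per-person rate
    is proportional to exp(b * n) / n.
    """
    n = population / 100_000
    n_ref = ref / 100_000
    return math.exp(b * (n - n_ref)) * (n_ref / n)


if __name__ == "__main__":
    # Print the implied rate, relative to a 100,000-person county, at several sizes.
    for pop in (10_000, 100_000, 500_000, 700_000, 1_000_000, 5_000_000, 10_000_000):
        print(f"{pop:>12,}  {relative_rate(pop):12.2f}")
```

With this coefficient, a county the size of Los Angeles comes out more than 20,000 times the reference rate.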

A straightforward alternative would be to model the rate: the number of insurrectionists (x) divided by county population (n). Since log(x/n) = log(x) - log(n), you could express that by a regression with log(x) as the dependent variable and log(n) as one of the predictors. Then a coefficient of 1.0 on log(n) would mean that the rate was the same across counties of different sizes; a coefficient of less than 1.0 would mean it was higher in counties with smaller populations, and a coefficient of greater than 1.0 would mean it was higher in counties with larger populations.
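Written out, with z standing for the other predictors (notation introduced here for illustration), the log-population specification implies a per-person rate that scales as a power of population:

```latex
\begin{aligned}
\log \mathrm{E}[x] &= \alpha + \beta \log n + \gamma^{\top} z \\
\frac{\mathrm{E}[x]}{n} &= e^{\alpha + \gamma^{\top} z}\, n^{\beta - 1}
\end{aligned}
```

So β = 1 makes the rate constant in n, β < 1 makes it fall as n grows, and β > 1 makes it rise.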

What happens if you use log(population) rather than population as a control variable?

                                  Population           Log(population)

% white population decline        .111***              .035
                                  (.019)               (.020)

Manufacturing employment decline  .011                 -.006
                                  (.0054)              (.006)

Extra Trump %                     -.039***             .003
                                  (.0081)              (.0082)

% non-Hispanic white              .009***              .014***
                                  (.0033)              (.003)

Metro county                      1.095***             .326*
                                  (.1335)              (.135)

Distance to DC                    -.304***             -.210***
                                  (.0623)              (.051)

(Log) population                  .148***              .999***
                                  (.0210)              (.056)

Negative binomial regression estimates; standard errors in parentheses.

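The fit of the model with the logarithm as control is better. Several of the estimates for the other variables change substantially: the coefficient for % white population decline falls from .111 to .035 and is no longer statistically significant, and the coefficient for extra Trump % goes from -.039 to .003. The coefficient for log population is .999, essentially 1.0, which means that once the other factors are controlled, the rate hardly differs between small and large counties. The estimate for metro counties is still statistically significant, but not overwhelmingly so (p=.019), and is much smaller than when using population (.326 versus 1.095). So I don't think that the evidence justifies sweeping condemnation of urban and suburban men.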
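To put the two metro estimates on the same multiplicative scale as the paper's "2.88 times," you can exponentiate them. The .999 coefficient on log population can likewise be turned into the implied rate ratio between a very large and a very small county. Below is a minimal sketch using only the point estimates in the table. It ignores standard errors, and the helper names are hypothetical.

```python
import math

# Point estimates copied from the table above.
METRO_COEF = {"population": 1.095, "log(population)": 0.326}
LOG_POP_COEF = 0.999


def rate_ratio(coef: float) -> float:
    """Multiplicative change in the expected count for a one-unit change in a predictor."""
    return math.exp(coef)


def size_rate_ratio(n_large: float, n_small: float, beta: float = LOG_POP_COEF) -> float:
    """Per-person rate in a county of n_large relative to one of n_small.

    With beta * log(population) in the model, the rate is proportional
    to population ** (beta - 1).
    """
    return (n_large / n_small) ** (beta - 1)


if __name__ == "__main__":
    for spec, coef in METRO_COEF.items():
        print(f"metro vs. non-metro, {spec} control: {rate_ratio(coef):.2f}")
    print(f"5,000,000 vs. 10,000 county, log control: {size_rate_ratio(5_000_000, 10_000):.3f}")
```

With these point estimates, the metro multiplier drops from about 3 to about 1.4 when log population is the control, and the predicted rate in a county of 5,000,000 is within 1% of that in a county of 10,000.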

I have experimented with other specifications of the model, but this is enough for one post.  

*My figures are from my analyses using their replication data file, which are slightly different from the numbers implied by their tables.  
  

