Tuesday, October 17, 2017

Raise the bar?

A paper published in Nature Human Behaviour proposes changing "the default P-value threshold for statistical significance for claims of new discoveries from 0.05 to 0.005"--in terms of t-ratios, from about 2 to 2.8.  The paper seems to have been written with experimental social psychology in mind, but its 72 listed authors include economists, political scientists, and sociologists.  They are a distinguished group--the sociologists are from the University of Pennsylvania, the University of North Carolina, Michigan, Harvard, Princeton, and Stanford. 

The core of the argument is about the chance of "false positives."  The great majority of the hypotheses proposed in the social sciences are of the form "x is associated with y" (controlling for other factors relevant to y).  If the observed data would be unlikely under the "null hypothesis" that "x is not associated with y" (controlling for other factors), you count it as support for the hypothesis that "x is associated."

Suppose that for every ten proposed hypotheses that are true, there are 100 that are false.  Using a .05 level means that we can expect a statistically significant association for five of the false ones.  Suppose a statistically significant association is found for 80% of the true hypotheses, which is the level of power that people usually aim for in designing experiments; that gives 8 true and 5 spurious significant associations, so 5 out of 13, or almost 40%, of the statistically significant associations will represent false hypotheses.  The authors' proposal is that researchers should change the standard of statistical significance to .005 and continue to aim for 80% power (which would mean bigger experiments).  Then there would continue to be 8 statistically significant associations that represent real ones, but only 0.5 (6% of the total) that are spurious.
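To make the arithmetic concrete, here is a small Python sketch (my own, not from the paper) that computes the share of significant results that are spurious under each threshold:

```python
# Share of statistically significant results that represent false hypotheses,
# given counts of true and false hypotheses, statistical power, and the
# significance threshold.  Numbers follow the example in the text.
def false_discovery_rate(n_true, n_false, power, alpha):
    true_hits = n_true * power    # real associations detected
    false_hits = n_false * alpha  # spurious "significant" results
    return false_hits / (true_hits + false_hits)

print(round(false_discovery_rate(10, 100, 0.80, 0.05), 2))   # 0.38 at the .05 level
print(round(false_discovery_rate(10, 100, 0.80, 0.005), 2))  # 0.06 at the .005 level
```

The same function shows how much the answer depends on the assumed ratio of true to false hypotheses: with 1:1 odds rather than 1:10, the .05 level already gives only about 6% spurious results.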

The ratio of true to false proposed hypotheses is crucial here.  If it's 1:1, then with 80% power and a 5% significance level, only about 6% of the significant associations are spurious.  The authors offer some evidence that the ratio is about 1:10 for psychology experiments, and say that a "similar number has been suggested in cancer clinical trials, and the number is likely to be much lower in biomedical research."  They also address the possible objection that the "threshold for statistical significance should be different for different research communities."  They say that they agree, and note that genetics and high-energy physics have gone for a higher standard--a t-ratio of about 5--but they don't even address the possibility that a lower standard might be appropriate.  That is, they seem to take a 10:1 ratio of false to true hypotheses as the minimum, and recommend the .005 standard as a baseline suitable for all fields.  They return to this point in the concluding remarks, where they say that since the .05 level was established, "a much larger pool of scientists are now asking a much larger number of questions, possibly with much lower prior odds of success."

This isn't convincing to me.  In the papers I read (published or for review), most of the hypotheses about relations between variables seem pretty plausible.  Even if I don't find the reasoning that leads to the prediction convincing, and often I don't, it's not hard to think of an alternative argument (or several) that leads to the same prediction.  The idea that more scientists asking more questions means lower prior odds of success isn't compelling either.  In some fields, theory has developed, and that should let you make reasonable predictions on more questions.  In others, there's at least more evidence, meaning more examples to draw on in making predictions.  So I doubt there is a tendency for the prior odds in favor of proposed hypotheses to decline. 

If they were just making a suggestion about how to interpret the .05 significance level, I would not object, and in fact would generally agree (see my book Hypothesis Testing and Model Selection in the Social Sciences).  But realistically, a "default" of .005 would mean it would become difficult to publish work in which the key parameter estimates were not statistically significant at that level, just as it's now difficult to publish work in which the key parameter estimates aren't significant at the .05 level.*  That would be a loss, not a gain, especially with non-experimental data, where a bigger sample is usually not an option.


*They say results that didn't reach .005 "can still be important and merit publication in leading journals if they address important research questions with rigorous methods,"  but I'm confident that the great majority of reviewers and editors would say that about the .05 level today.  Importance and rigor are matters of judgment, so there's usually disagreement among reviewers; the "default" level of significance is objective, so it takes on outsize importance.

Wednesday, October 11, 2017

Then and now

Ta-Nehisi Coates has a piece called "Civil-Rights Protests Have Never Been Popular," in which he notes that in 1966, 63% of people polled had a negative opinion of Martin Luther King.  The Gallup Poll asked the question five times, having people rate him on a scale of +5 to -5.  A summary of the results, plus some historical events:

                 Positive   Negative   (Rating -5)
May 1963            41%       37%        (20%)
                        March on Washington, 8/1963
Aug 1964            44%       38%        (22%)
                        Selma march, 3/1965
May 1965            45%       46%        (27%)
                        Chicago open housing movement, mid-1966
Aug 1966            33%       63%        (39%)
Aug 2011            95%        4%         (1%)

During King's life, there was always a significant number giving him the lowest possible rating, which I show in parentheses.

There are some complications, which I will discuss in a future post, but Coates is right on his general point--King was not particularly popular when he was alive, and among whites negative views probably always outnumbered positive views.  Strongly negative views were definitely more common than strongly positive views.

[Data from the Roper Center for Public Opinion Research]

Friday, October 6, 2017

A hypothesis

For some reason that I don't recall, I looked at Edward Banfield's The Unheavenly City Revisited (1974) the other day and ran across this passage, about what he thought was an increasing influence of the middle and upper classes in political life:

"The upper-class ideal . . . requires that issues be settled on their merits, not by logrolling, and that their merits be conceived of in terms of general moral principles that may not, under any circumstances, be compromised.  In the smoke-filled room, it was party loyalty and private interests that mainly moved men; these motives always permitted 'doing business.'  In the talk-filled room, righteous indignation is the main motive, and therefore the longer the talk continues, the clearer it becomes to each side that the other must either be shouted down or knocked down."

Except for the "knocked down," this seems like a good description of the direction of change in American politics since the time he wrote. On the other hand, there is an argument, backed by a good deal of evidence, that increasing levels of education promote stable democracy:  education increases openness to new ideas and ability to see the other person's point of view (see this article for references and more discussion).  So it doesn't seem that Banfield's hypothesis could work as a general rule, but maybe it applies under some circumstances.  One obvious possibility is that the effect of education changes direction--up to a point, increases lead to more willingness to compromise, but beyond that point they reduce it.  There's no systematic evidence of this at the individual level, but it fits with some claims about the politics of intellectuals (see the article referenced above).  Another possibility, which I think is more likely, is that there is some kind of interaction between social conditions and the political system.  That is very vague, but it seems worth thinking about.

 

Saturday, September 30, 2017

The secret of his success

Donald Trump's decision to end the DACA program has been unpopular, and reaction to the Cotton-Perdue plan to change immigration law has been lukewarm.  The lack of enthusiasm is not surprising:  surveys show strong support for allowing people who were brought here as children (or even adults who have been here for a while) to stay, and a fairly even division of opinion on whether the number of legal immigrants should be reduced.  Immigration was Trump's signature issue--did it actually help him?  And if so, how?

I think the answer can be found in a survey sponsored by CNBC and conducted in late October 2016.  It asked "If Donald Trump/Hillary Clinton is elected president, do you think the number of illegal immigrants who come to the United States will increase, stay about the same, or decrease?"  The results:

                        Clinton         Trump
Increase            42%              6%
Same                45%             31%
Decrease          10%             61%

In 2009, a CNN/ORC poll asked "Would you like to see the number of illegal immigrants currently in this country increased, decreased, or remain the same?"  Only 3% wanted to see it increased, and 73% wanted a decrease.  So Trump had a big advantage on this issue.  By comparison, here is what people expected on some other things that Trump had talked about.

"If ... do you think your federal income taxes would increase, stay about the same, or decrease? "

                        Clinton         Trump
Increase            43%             29%
Same                42%             42%
Decrease            6%             19%


"If ... do you think that our trade agreements with other countries will become more favorable to US interests, stay about the same, or become less favorable to US interests?"

                          Clinton         Trump
More favorable   19%             32%
Same                   45%             18%
Less favorable    28%             41%

There was also a question on "which candidate for president would you say has the better policies and approaches to ...Increase your wages," and 46% said Clinton, against 32% for Trump.

It seems that most people thought that Trump would vigorously enforce existing immigration law and Clinton would not.  The Republican platform talked a lot about the need to enforce the law--"our highest priority, therefore, must be to secure our borders and all ports of entry and to enforce our immigration laws"--and said nothing about changing them.  Trump frequently talked about how we had "open borders" and "people pouring across the border."  Clinton and the Democrats did little to counter this picture.  The Democratic platform spoke of "our broken immigration system" and talked about the need for "comprehensive immigration reform," but their only comment on enforcement was that it "must be humane and consistent with our values."  This raises a question of why they didn't point to the substantial rise in deportations under the Obama administration.  I will take that up in a future post.

[Data from the Roper Center for Public Opinion Research]


Monday, September 25, 2017

The owl of Minerva, part 3

In May I had a post about factors associated with support for Donald Trump in the presidential election.  This post elaborates on one of those factors, income.  I used American National Election Studies data to run a series of binary logistic regressions of Trump vote on income, controlling for various factors.  Here are the estimated effects of income, with a positive sign meaning that higher income goes with a greater chance of voting for Trump:

Controls                     estimate     se
1. none                        .007      .005
2. black, white,
   Hispanic, other            -.014      .006
3. plus gender                -.016      .006
4. plus education              .000      .006
5. plus married               -.013      .007

So conclusions about the effect of income depend on what you control for.  If you just compare people with higher incomes to people with lower incomes, it seems those with higher incomes were more likely to vote for Trump.  But if you compare people of the same ethnicity, gender, education, and marital status, it seems those with higher incomes were less likely to vote for Trump.  I think that the second comparison is more meaningful, because we know that ethnicity, education, gender, and marital status made a difference in voting.  However, income doesn't make much difference either way, and is not statistically significant in 1, 4, and 5 (which is why I just say "it seems").  The income variable had 28 categories, and an estimate of -.013 means that going from an income of 25-27,000 (category 8) to 100-109,000 (category 23) would change the probability of supporting Trump vs. Clinton from .5 to .452.
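For readers who want to check the probability calculation, here is a quick Python sketch (my own, not part of the analysis) that converts the logit coefficient into the change in probability.  The coefficient (-.013 per income category) and the category numbers (8 and 23) are taken from the text above:

```python
import math

# Convert a shift in the logit into a probability, starting from a .5 baseline.
def prob_after_shift(coef, steps, base_prob=0.5):
    base_logit = math.log(base_prob / (1 - base_prob))
    return 1 / (1 + math.exp(-(base_logit + coef * steps)))

# Moving 15 income categories (8 to 23) at -.013 per category:
print(round(prob_after_shift(-0.013, 23 - 8), 3))  # ~0.451
```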
    By comparison, here are the estimates for the other control variables:

White          .73
Black        -2.32
Hispanic      -.90
Female        -.19
Education     -.15
Married        .63

Education had 16 categories, and the impact of going from a high school graduate with no college (9) to a college graduate (13) was 4*.15=.6, which is bigger than the impact of going from the lowest to the highest income category (27*.013=.35).

The basic conclusion is that income was not an important factor in the choice between Trump and Clinton; education was.  This is not surprising, given what is known about the relationship between education and political opinions.  What is surprising for me is that marital status was also an important factor--the difference between married and unmarried people was about the same as the difference between college graduates and people with just a high school diploma.  I knew that marital status was a factor in Democratic vs. Republican support in recent elections, but thought that it was on the same order as gender.

PS:  Data from exit polls shows some increase in support for Trump as income increases.  The difference between the ANES and exit poll data is statistically significant.  My guess is that the ANES estimates are more accurate, partly because the response rate is probably higher, and partly because the exit poll sample is not designed to be representative with respect to anything except which candidate people voted for.  The practical reason I use ANES data is that the individual-level data for the exit polls hasn't been released yet.  But it's safe to say that controlling for the factors discussed here would push the exit poll estimates towards zero.

Tuesday, September 19, 2017

They did it their way

Since my last post was long and complicated, I thought I should follow with something short and simple.  In 1986, the Roper Organization asked "Thinking about the way your own life has turned out so far, would you say it has been primarily a matter of luck or fate, or has it been more a matter of factors which are within your control?"  The same question was asked in CBS News polls in 1996 and 2016.  The results:

              Luck   Your control   Both    DK
1986           22%        66%         9%     3%
1996           18%        72%         6%     4%
2016           27%        60%         9%     4%

The differences in the relative frequencies of "luck" and "your own control" are statistically significant.  It seems possible that opinions on this are affected by economic conditions--when people experience bad things like unemployment or reduced income, they may be more likely to say it's luck.  However, as I recall, economic conditions in 1986 and 1996 were roughly like they were in 2016--pretty good but not outstanding.
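As a rough check on the significance claim, here is a Python sketch of a two-sample test for the difference in the "luck" percentages between the first and last polls.  The sample sizes are my assumption (about 1,000 respondents each, typical for national polls); they aren't given above:

```python
import math

# Two-sample z statistic for a difference in proportions (pooled estimate).
# Sample sizes of 1,000 per poll are assumed, not taken from the post.
def two_prop_z(p1, n1, p2, n2):
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se

z = two_prop_z(0.22, 1000, 0.27, 1000)
print(round(z, 2))  # ~2.6, beyond the conventional 1.96 cutoff
```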

[Data from the Roper Center for Public Opinion Research]

Sunday, September 17, 2017

More old news

About six months ago, I saw several stories saying that "Having just one black teacher can keep black kids in school," to quote NPR's summary.  They all noted the magnitude of the effect:  almost a 40% reduction in dropout rates for low-income black boys.  I located the paper on which the stories were based and thought about posting on it, but it was a long paper, and by the time I got around to reading it, the attention seemed to have passed.  However, last week's NY Times magazine had a list of statistics on education, and one of them was "exposure to at least one black teacher in Grades 3 to 5 reduced the probability of low-income black male students dropping out of school by almost 40%."  So that led me back to the paper.

The thing that originally attracted my attention was not the general idea that having a black teacher would help to keep black children in school, which seemed plausible, but that it could reduce dropouts by 40% for any group.  There is a lot of data on basic educational outcomes like finishing school, and by the standards of social science it's high quality data.  Moreover, there are a lot of people who have studied the issue, so it seems that any simple and straightforward way to dramatically reduce dropout rates would have been discovered long ago.

The paper reports that the estimated effect on dropout rates is -.04 for all black students, -.06 for persistently low-income black students, and -.12 for persistently low-income black male students.  Since about half of students are boys, that suggests that the estimated effect on persistently low-income black female students would be about zero, and indeed they report an estimate of 0.00 for that group.  So the issue is that only the big estimate was treated as worthy of interest.  If you believe that there are differences in the effects on boys and girls (and the difference appears to be statistically significant), both of the estimates are equally important; if you don't, you should just report the estimate for boys and girls combined.  The differences between persistently low-income students and other students don't appear to be statistically significant (it's hard to tell from the tables), so maybe you should just report the estimate for all students.
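The implied estimate for girls follows from simple averaging, assuming boys and girls are each about half of the group, so the overall effect is the mean of the male and female effects.  A one-line Python check:

```python
# If overall = (male + female) / 2, then female = 2 * overall - male.
# The -.06 and -.12 estimates are taken from the paper as quoted above.
overall, male = -0.06, -0.12
female = 2 * overall - male
print(female)  # 0.0, matching the paper's reported estimate of 0.00
```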

There's also a more complex issue which relates to the way that they got the estimate.  The simple approach would be to do a regression with dropping out as the dependent variable, and having a black teacher plus some other variables as independent variables.  But the authors say that those estimates "are likely biased by unobserved student characteristics that jointly predict classroom assignments and long-run outcomes, even after conditioning on the basic socio-demographic controls in X and school FE (Rothstein 2010). For example, students with lower achievement (Clotfelter, Ladd & Vigdor, 2006) and greater exposure to school discipline (Lindsay & Hart, 2017) are more likely to be matched to black teachers, and these factors likely affect long-run outcomes as well."  That is, black teachers tend to be given the kind of students who are at higher risk of dropping out.

The authors had an idea on how to eliminate this potential bias.  They had multiple students from each school, which means that they could include a dummy variable for each school.  That's a reasonable thing to do, since it's generally agreed that some schools are more effective than others.  They also had five different classes of students:  those who started third grade in 2000, 2001, 2002, 2003, and 2004.  Because of new hires, departures, and leaves, the percent of the teaching staff that was black could change from year to year.  Those personnel changes would depend on idiosyncratic individual factors--getting pregnant, reaching retirement age, having a spouse get a job offer in another state--so they would be random from the point of view of the students.  So you can use within-school variation in the racial composition of the teaching staff over time as a substitute ("instrument") for the original variable (having a black teacher or not) and get unbiased estimates.
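To illustrate the logic, here is a toy Python simulation of two-stage least squares in which the school-year share of black teachers serves as the instrument for having a black teacher.  This is my own sketch, not the authors' code or data; the variable names and the data-generating process are invented, with the selection problem the authors describe (higher-risk students matched to black teachers) built in:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
# Instrument: share of the staff that is black, varying across school-years.
share_black_staff = rng.uniform(0.0, 0.5, n)
# Unobserved student characteristic that raises dropout risk.
risk = rng.normal(0, 1, n)
# Assignment to a black teacher depends on the staff share AND on student risk
# (the source of bias in a simple regression).
black_teacher = (rng.uniform(0, 1, n) < share_black_staff + 0.1 * (risk > 1)).astype(float)
# True effect of a black teacher on dropping out is -0.12 in this simulation.
dropout = 0.3 - 0.12 * black_teacher + 0.1 * risk + rng.normal(0, 0.1, n)

# Stage 1: predict the treatment from the instrument.
X1 = np.column_stack([np.ones(n), share_black_staff])
fitted = X1 @ np.linalg.lstsq(X1, black_teacher, rcond=None)[0]
# Stage 2: regress the outcome on the predicted treatment.
X2 = np.column_stack([np.ones(n), fitted])
iv_est = np.linalg.lstsq(X2, dropout, rcond=None)[0][1]
print(round(iv_est, 2))  # close to the built-in effect of -0.12
```

Because the instrument varies independently of student risk, the two-stage estimate recovers the built-in effect even though teacher assignment itself is correlated with risk.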

This approach strikes me as clever but not very convincing.  Teachers' decisions to stay or go will depend partly on how rewarding it is to work in a school.  That could depend on student performance (teachers like it when their students do well) or on things that might affect student performance, like discipline problems, or how well teachers get along with the administration.  Things get more complicated because what matters is differential effects on black and white teachers, but I can think of possibilities here too:  for example, black teachers may be particularly interested in how the black students are doing.  I think I might trust the simple results more than the results from their method--at any rate, I'd like to see them, but they aren't reported in the paper.

This isn't a straightforward mistake, but the sort of difference of judgment that often comes up with research, and the authors could probably say more in defense of their approach.  But I will stick with my original feeling that a 40% reduction in dropout rates for anyone is too big to be believed.