Thursday, August 15, 2013

Good times, bad times

A few days ago, the New York Times had a piece by Seth Stephens-Davidowitz, who has a PhD in economics and is now an intern at Google.  He looked at searches involving "depression" that Google's algorithms classified as "health related," and found some clear patterns.  They were more common in winter months and in colder places.  They also varied within the week, being most common on Monday and least common on the weekends.

The Behavioral Risk Factor Surveillance System (BRFSS), a very large telephone survey, asks "in general, how satisfied are you with your life?"  The great majority of people say that they are either "very satisfied" or "satisfied," but it seems reasonable to interpret "dissatisfied" or "very dissatisfied" as representing something similar to what the average Google user would mean by "depression."  I used the 2009 data, because I happened to be working with it for another purpose.

The differences by month in the percent saying they are dissatisfied or very dissatisfied are not statistically significant at conventional levels (P=.082) even though the BRFSS sample has about 400,000 cases.  But for what it's worth, it's highest in August (6.26%) and lowest in December (5.66%).  There are significant differences by day:  dissatisfaction is lowest on Monday (5.7%) and highest on Friday (6.4%), which is pretty much the opposite of the Google pattern.
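To give a feel for why a gap of less than one percentage point can be statistically significant in a sample this size, here is a rough sketch of a two-proportion z-test on the Monday and Friday figures.  It is not the test behind the numbers above.  The per-day sample size is an assumption (about 400,000 interviews split evenly over seven days, which the real survey almost certainly does not do), and the sketch ignores survey weights and design effects.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Return (z, two-sided p-value) for H0: the two proportions are equal."""
    # Pooled proportion under the null hypothesis of no difference
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# ASSUMPTION: roughly 400,000 interviews spread evenly over seven days.
n_per_day = 400_000 // 7

# Friday (6.4% dissatisfied) versus Monday (5.7% dissatisfied)
z, p = two_proportion_z(0.064, n_per_day, 0.057, n_per_day)
print(f"z = {z:.2f}, p = {p:.2g}")
```

Under these assumed sample sizes the Friday-Monday gap is several standard errors wide, so it would count as significant even though it is small in absolute terms.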

In short, the patterns in the BRFSS are totally different from those found in the Google searches.  I haven't taught research methods in several years, but textbooks always used to warn students that a very big sample was not necessarily a representative sample.  The favorite example was the 1936 Literary Digest survey, with millions of respondents, which predicted that Alfred Landon would win a solid victory over FDR.  The Google searches are essentially Literary Digest-type data:  they represent a lot of "volunteers," not anything designed to produce a representative sample.  I wouldn't dismiss the Google results--if searches for "depression" are more common in winter or on Mondays, that means something.  But the straightforward interpretation that Stephens-Davidowitz offers--people feel worse in winter and on Mondays--isn't borne out by the data from a random sample.
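The textbook warning can be illustrated with a small simulation.  Everything in it is hypothetical: the population size, the 6% true rate, and especially the volunteering rates are made up for illustration and are not estimates of how Google users or Literary Digest respondents actually behaved.  It only shows that a large self-selected sample can land further from the truth than a much smaller random one.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# HYPOTHETICAL population: 200,000 people, about 6% dissatisfied (True)
TRUE_SHARE = 0.06
population = [rng.random() < TRUE_SHARE for _ in range(200_000)]

# HYPOTHETICAL self-selection: dissatisfied people "volunteer" at twice
# the rate of everyone else (0.50 versus 0.25).
volunteers = [person for person in population
              if rng.random() < (0.50 if person else 0.25)]

# A much smaller simple random sample from the same population
random_sample = rng.sample(population, 2_000)

def share(sample):
    """Fraction of the sample that is dissatisfied."""
    return sum(sample) / len(sample)

print(f"population:    {share(population):.3f}  (n={len(population)})")
print(f"volunteers:    {share(volunteers):.3f}  (n={len(volunteers)})")
print(f"random sample: {share(random_sample):.3f}  (n={len(random_sample)})")
```

With these made-up rates the volunteer sample is many times larger than the random sample, but its estimate is pulled well away from the population figure, while the random sample stays close to it.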

