The stereotypes of the gawky nerd and the dumb jock may result in part from a selection bias known as Berkson’s paradox. To see how, consider the admission criteria of a hypothetical school.
Let’s say the school automatically accepts applicants whose academic achievements are in the top 10 percent, regardless of any other factors. Applicants with lower grades are only accepted if they have notable extracurricular achievements. (To simplify matters, we’ll just consider sports.) The school also offers sports scholarships, and standout athletes may be accepted even with low grades. In fact, if students’ athletic performance is in the top 10 percent of applicants, the school finds a way to admit them, regardless of grades. These simplified criteria result in the chart above, representing all applicants, with green smiley faces for admitted students and grey dots for those who were turned down.
Let’s assume that athletic and academic performance are not actually correlated among applicants. If we visited this school, what would we observe in the student body? First of all, the students would tend to be above average in academics, athletics, or both. This is not surprising, but it creates an unexpected effect. By admitting students on the strength of either sports talent or high grades, we create a group in which those two factors are negatively correlated with each other, even though they are unrelated among applicants as a whole.
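A quick simulation can make this effect concrete. The sketch below is only an illustration under stated assumptions: scores are drawn independently and uniformly between 0 and 1 (so a score of 0.9 marks the top 10 percent), and the admission rule is reduced to "top 10 percent in grades or top 10 percent in sports", which is simpler than the school's full criteria. The names `pearson`, `simulate`, and `cutoff` are hypothetical helpers, not part of the original article.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_applicants=20_000, cutoff=0.9, seed=0):
    """Return (applicants, admitted) as lists of (grades, sports) pairs.

    Assumption: grades and sports are independent uniform scores in [0, 1),
    and an applicant is admitted if either score reaches the cutoff.
    """
    rng = random.Random(seed)
    applicants = [(rng.random(), rng.random()) for _ in range(n_applicants)]
    admitted = [(g, s) for g, s in applicants if g >= cutoff or s >= cutoff]
    return applicants, admitted


applicants, admitted = simulate()

# zip(*pairs) splits the list of pairs into a grades tuple and a sports tuple.
r_all = pearson(*zip(*applicants))
r_admitted = pearson(*zip(*admitted))

print(f"correlation among all applicants: {r_all:+.3f}")
print(f"correlation among admitted students: {r_admitted:+.3f}")
```

Under these assumptions the first figure should land close to zero and the second should be clearly negative, even though nothing links the two scores in the applicant pool.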
When we look at only the students who rank in the top 10 percent athletically, we see what we would expect: their academic achievement varies, and the average matches the average of all applicants. But when we look at the students who are terrible at sports, they are all high academic achievers, because grades were their only way in. This gives us a skewed picture and could lead people to associate smarts with physical awkwardness.
The same thing happens when we look at grades. The highest academic achievers include all levels of sports talent, and their average matches the average for all applicants. But the worst academic performers in school all happen to be excellent athletes. This selection bias reinforces the stereotype of the dumb jock.
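The two group comparisons can be checked with the same kind of toy model. Again, this is a sketch rather than the school's real procedure: it assumes independent uniform scores between 0 and 1, admits anyone at or above a hypothetical `CUTOFF` of 0.9 in either grades or sports, and treats a score below 0.5 as "weak". All variable names are illustrative.

```python
import random

CUTOFF = 0.9  # assumed: top 10 percent on a 0-to-1 scale
WEAK = 0.5    # assumed: below the applicant median counts as "weak"

rng = random.Random(1)
applicants = [(rng.random(), rng.random()) for _ in range(20_000)]

# Simplified admission rule: top 10 percent in grades OR in sports.
admitted = [(g, s) for g, s in applicants if g >= CUTOFF or s >= CUTOFF]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Grades of admitted students, split by athletic level.
grades_of_top_athletes = [g for g, s in admitted if s >= CUTOFF]
grades_of_weak_athletes = [g for g, s in admitted if s < WEAK]

# Sports scores of admitted students, split by academic level.
sports_of_top_students = [s for g, s in admitted if g >= CUTOFF]
sports_of_weak_students = [s for g, s in admitted if g < WEAK]

print(f"mean grades, top athletes:     {mean(grades_of_top_athletes):.2f}")
print(f"lowest grades, weak athletes:  {min(grades_of_weak_athletes):.2f}")
print(f"mean sports, top students:     {mean(sports_of_top_students):.2f}")
print(f"lowest sports, weak students:  {min(sports_of_weak_students):.2f}")
```

In this model the top athletes' grades average out near the applicant average of 0.5, while every weak athlete on campus has grades at or above the cutoff; the mirror image holds for grades and sports.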
Berkson’s paradox (also known as Berkson’s fallacy or Berkson’s bias) was first observed in a medical context. In 1929, Johns Hopkins biologist Raymond Pearl found that only 6.6 percent of people who died of cancer had tuberculosis lesions, as opposed to 16.3 percent of people who died of other causes. This led to the hypothesis that tuberculosis protected people from cancer, and even prompted an attempt to treat cancer with tuberculin. Joseph Berkson identified the fallacy in 1946, six years after Pearl’s death.
Great demonstration – nice one! Just shows how easy it is to draw false inferences from “rock solid” data if the sample has some bias in it. Big Data analysts beware!