We often evaluate the success of medical treatments or social programs by how much of the population
they help.
Like, suppose we're treating a disease that afflicts both people and cats, and among 1
cat and 4 people we treat, the cat and 1 person recover and 3 people die.
And of 4 cats and 1 person we don't treat, three of the cats recover while the person
and 1 cat die.
In the real world, these numbers might be more like 300 and 100, or whatever, but we'll
keep them small so they're easier to keep track of.
So, in our sample, 100% of treated cats survive while only 75% of untreated cats do, and 25%
of treated humans survive while 0% of untreated humans do.
Which makes it seem like the treatment improves chances of recovery.
Except that if we aggregate the data, among all people and cats treated, only 40% survive,
while among all people and cats left on their own, 60% recover.
Which makes it seem like the treatment reduces chances of recovery.
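If you want to check that arithmetic yourself, here's a minimal sketch in Python. The counts come straight from the example above; the `counts` table and the `recovery_rate` helper are just names made up for this illustration.

```python
from fractions import Fraction

# (recovered, total) counts taken directly from the example above
counts = {
    ("cat", "treated"): (1, 1),
    ("human", "treated"): (1, 4),
    ("cat", "untreated"): (3, 4),
    ("human", "untreated"): (0, 1),
}

def recovery_rate(species=None, group=None):
    """Recovery rate over every cell matching the given species and/or group.

    Leaving an argument as None aggregates over it.
    """
    recovered = total = 0
    for (s, g), (r, n) in counts.items():
        if species in (None, s) and group in (None, g):
            recovered += r
            total += n
    return Fraction(recovered, total)

# Split by species: the treated group does better in both species
for species in ("cat", "human"):
    for group in ("treated", "untreated"):
        rate = recovery_rate(species, group)
        print(f"{species:5} {group:9} {float(rate):.0%}")

# Aggregated over species: the treated group does worse
for group in ("treated", "untreated"):
    rate = recovery_rate(group=group)
    print(f"all   {group:9} {float(rate):.0%}")
```

The only thing that changes between the two loops is whether we add up the cats and people before dividing, and that alone is enough to flip the comparison.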
So which is it?
This is an illustration of Simpson's paradox, a statistical paradox where it's possible
to draw two opposite conclusions from the same data depending on how you divide things
up, and statistics alone cannot help us solve it – we have to go outside statistics and
understand the causality involved in the situation at hand.
For example, if we know that humans get the disease more seriously and are therefore more
likely to be prescribed treatment, then it can make sense that fewer individuals that
get treated survive, even if the treatment increases the chances of recovery, since the
individuals that got treated were more likely to die in the first place.
On the other hand, if we know that humans, regardless of how sick they are, are more
likely to get treated than cats because no one wants to pay for kitty healthcare, then
the fact that 4 out of 5 humans died while only 1 in 5 cats died suggests that, indeed,
the treatment may be a bad choice.
So if you're doing a controlled experiment, you need to make sure to not let anything
causally related to the experiment influence how you apply your treatments, and if you
have an uncontrolled experiment, you have to be able to take those outside biases into
account.
As a more tangible example, Wisconsin has repeatedly had higher overall 8th grade standardized
test scores than Texas, so you might think Wisconsin is doing a better job teaching than
Texas.
However, when broken down by race – which, via entrenched socioeconomic differences, is
a major factor in standardized-test scores – Texas students performed better than Wisconsin
students on all fronts: black Texas students scored higher than black Wisconsin students,
and likewise with Hispanic and white students.
The difference in the overall ranking is because Wisconsin has proportionally far fewer black
and Hispanic students and proportionally more white students than Texas – so the takeaway
should not be that Wisconsin has better education than Texas!
Just that it has (proportionally) more socioeconomically advantaged people.
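The mechanism here is just a weighted average: a state's overall score is each group's score weighted by that group's share of students. Here's a rough sketch with completely made-up numbers (these are NOT real test scores or demographics for any state, and `state_a`, `state_b`, and `overall` are hypothetical names) showing how one state can win in every group and still lose overall.

```python
# HYPOTHETICAL numbers, invented only to show the mechanism; not real test data.
# Each entry: group -> (percent of students in that group, average score)
state_a = {"group_x": (90, 290), "group_y": (10, 250)}  # mostly the higher-scoring group
state_b = {"group_x": (40, 295), "group_y": (60, 255)}  # higher score within every group

def overall(state):
    """Overall average: each group's score weighted by its percent share."""
    return sum(percent * score for percent, score in state.values()) / 100

for group in state_a:
    print(group, "A:", state_a[group][1], "B:", state_b[group][1])

print("overall A:", overall(state_a))
print("overall B:", overall(state_b))
```

State B scores higher within each group, but State A has far more of its students in the higher-scoring group, so its overall average comes out ahead.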
In some situations there's also a nice graphical way to picture Simpson's paradox: as two separate
trends that each go one way, while the overall trend across both populations goes the other
way.
Like, maybe more money makes people sadder, and more money makes cats sadder, but if cats
are both much happier and richer than people to start with, the overall trend appears,
incorrectly, to be that more money makes you happier.
Of course, you can also misinterpret this graph to show that, overall, more money makes
you a cat, which I think helps illustrate very well the ability to lie or reach incorrect
conclusions by blindly using statistics without context!
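To make the picture with two downward trends and one upward trend concrete, here's a small sketch that fits a straight line to each population and then to everyone lumped together. The (money, happiness) points are invented for illustration, and `slope` is a plain least-squares helper written for this example, not anything from a particular library.

```python
def slope(points):
    """Least-squares slope of y against x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return cov / var

# HYPOTHETICAL (money, happiness) points: within each group, more money means
# less happiness, but cats start out both richer and happier than people.
people = [(1, 4), (2, 3), (3, 2)]
cats = [(7, 9), (8, 8), (9, 7)]

print("people slope:  ", slope(people))         # negative
print("cats slope:    ", slope(cats))           # negative
print("combined slope:", slope(people + cats))  # positive
```

Each group's line slopes down, but the line through all six points slopes up, because it's mostly tracking the gap between the two groups rather than what happens inside either one.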
That said, this is not to say that statistics are always going to be paradoxical or confusing
– it's quite possible that everything will just make sense from the get-go, like if people
and cats both get sadder when you give them more money, and cats are both poorer and happier
than people, then the overall trend is no longer paradoxical: more money = more sadness.
But it's important to be aware that paradoxes like Simpson's paradox are possible, and we
often need more context to understand what a statistic actually means.
Given the mathiness of my videos, it may not surprise you to hear that I get a lot of practice
with math & physics problems while working on them, and this video’s sponsor, Brilliant.org,