This picture was rigged, so the issue is obvious. But, of course, when
the data is multidimensional, you can't always draw such a simple
picture.
In this example, we'd say aspirin-taking is a confounder. We'll talk
more about this in a bit, but for now we're saying that aspirin-taking
wasn't randomly distributed among the people in the study, and it made
a huge difference in the apparent effect of the drug.
Note that, if you think of the original line as a predictive model, it's
actually still the best model you can obtain knowing nothing more
about the aspirin-taking habits or genders of the patients involved.
The issue here is really that you're trying to assign causality.
It's a general problem with regression models on observational data.
You have no idea what's going on. As Madigan described it, “it's the
Wild West out there.”
It could be the case that within each group there are males and females,
and if you partition by those, you see that the more drugs they take,
the better again. Because a given person is either male or female, and
either takes aspirin or doesn't, this kind of thing really matters.
This illustrates the fundamental problem in observational studies: a
trend that appears in different groups of data can disappear or reverse
when these groups are combined, or vice versa. This is sometimes called
Simpson's Paradox.
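To make the reversal concrete, here is a minimal sketch in Python. The
doses and outcomes are invented for illustration (they are not the data
behind the picture above), and the helper name ols_slope is
hypothetical. Within each aspirin group the fitted slope is positive,
yet the slope fitted to the pooled data is negative.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical (dose, outcome) data, made up for this example only.
# The two groups differ in both typical dose and baseline outcome,
# which is what lets the pooled trend point the other way.
no_aspirin = {"dose": [1, 2, 3, 4], "outcome": [8, 9, 10, 11]}
aspirin = {"dose": [6, 7, 8, 9], "outcome": [2, 3, 4, 5]}

slope_no_aspirin = ols_slope(no_aspirin["dose"], no_aspirin["outcome"])
slope_aspirin = ols_slope(aspirin["dose"], aspirin["outcome"])

# Pool the groups, ignoring the aspirin label entirely.
pooled_dose = no_aspirin["dose"] + aspirin["dose"]
pooled_outcome = no_aspirin["outcome"] + aspirin["outcome"]
slope_pooled = ols_slope(pooled_dose, pooled_outcome)

print("within no-aspirin group:", slope_no_aspirin)
print("within aspirin group:   ", slope_aspirin)
print("pooled:                 ", slope_pooled)
```

The pooled line is still the best single-line predictor if you don't
know who takes aspirin. It just can't be read as the effect of the drug.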
The Rubin Causal Model
The Rubin causal model is a mathematical framework for understanding
what information we know and don't know in observational studies.
It's meant to investigate the confusion when someone says something
like, “I got lung cancer because I smoked.” Is that true? If so, you'd have
to be able to support the statement, “If I hadn't smoked, I wouldn't
have gotten lung cancer,” but nobody knows that for sure.
Define $Z_i$ to be the treatment applied to unit $i$ (0 = control,
1 = treatment), $Y_i^1$ to be the response for unit $i$ if $Z_i = 1$,
and $Y_i^0$ to be the response for unit $i$ if $Z_i = 0$.
Then the unit-level causal effect, the thing we care about, is
$Y_i^1 - Y_i^0$, but we only see one of $Y_i^0$ and $Y_i^1$.
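The notation can be sketched in code to show where the missing
information sits. This is an illustration only: the class name Unit,
its methods, and all the numbers are hypothetical, and storing both
potential responses for one unit is possible only because the data is
simulated. In a real study the observed view is all you get.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Unit:
    z: int     # treatment indicator Z_i (0 = control, 1 = treatment)
    y0: float  # potential response Y_i^0, the response if Z_i = 0
    y1: float  # potential response Y_i^1, the response if Z_i = 1

    def causal_effect(self) -> float:
        """Unit-level causal effect Y_i^1 - Y_i^0 (knowable only in simulation)."""
        return self.y1 - self.y0

    def observed(self) -> Tuple[Optional[float], Optional[float]]:
        """(Y_i^0, Y_i^1) as an analyst sees them: one entry is always missing."""
        if self.z == 1:
            return (None, self.y1)
        return (self.y0, None)


# Made-up units, for illustration only.
units = [
    Unit(z=1, y0=3.0, y1=5.0),
    Unit(z=0, y0=4.0, y1=4.5),
    Unit(z=1, y0=2.0, y1=2.0),
]

for i, unit in enumerate(units):
    seen_y0, seen_y1 = unit.observed()
    print(f"unit {i}: Z={unit.z} observed Y0={seen_y0} observed Y1={seen_y1} "
          f"true effect={unit.causal_effect()}")
```

Every row of the observed data has exactly one blank, so the unit-level
effect can never be computed directly from what was observed.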