The causal effect is sometimes defined as the ratio of these
two numbers instead of the difference.
But we don't have God's knowledge, so instead we choose another
population to compare this one to, and we see whether they get cancer
or not, while not taking the drug. Say they have a natural cancer rate
of 0.10. Then we would conclude, using them as a proxy, that the
increase in the cancer rate is the difference between 0.30 and 0.10,
or 20 percentage points.
This is of course wrong, because the two populations
have some underlying differences that we don't account for.
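The proxy calculation above can be written out in both forms, as a difference and as a ratio. This is only a sketch of the arithmetic: the variable names are illustrative, and 0.30 is assumed from context to be the cancer rate in the population taking the drug.

```python
# Rates from the text. 0.30 is assumed to be the cancer rate in the
# population taking the drug; 0.10 is the proxy population's natural rate.
treated_rate = 0.30
proxy_rate = 0.10

# Naive "causal effect" under the proxy assumption, in both definitions.
naive_difference = treated_rate - proxy_rate  # absolute increase in rate
naive_ratio = treated_rate / proxy_rate       # relative increase in rate

print(f"difference: {naive_difference:.2f}")  # prints "difference: 0.20"
print(f"ratio: {naive_ratio:.1f}")            # prints "ratio: 3.0"
```

Both numbers inherit the same flaw: they are only as good as the assumption that the proxy population resembles the treated one.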
If these were the “same people,” down to the chemical makeup of each
other's molecules, this proxy calculation would work perfectly. But of
course they're not.
So how do we actually select these people? One technique is to use
what is called propensity score matching or modeling. Essentially what
we're doing here is creating a pseudo-random experiment: we build
a synthetic control group by selecting people who were just as likely
to have been in the treatment group but weren't. How do we do this?
See the word in that sentence, “likely”? Time to break out the logistic
regression. There are two stages to propensity score modeling.
The first stage is to use logistic regression to model each person's
probability of having received the treatment; in the second, we might
pair people up so that one person received the treatment and the other
didn't, but they had been equally likely (or close to equally likely) to
have received it. Then we can proceed as we would if we had a random
experiment on our hands.
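The two stages can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the data is synthetic, there is a single covariate `x`, the helper names `fit_logistic` and `match_pairs` are hypothetical, and greedy one-to-one matching within a caliper is just one of several ways to do the pairing.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic data (made up for illustration): one covariate x drives both
# the chance of being treated and the chance of the outcome.
people = []
for _ in range(300):
    x = random.gauss(0.0, 1.0)
    treated = 1 if random.random() < sigmoid(1.5 * x) else 0
    outcome = 1 if random.random() < sigmoid(-1.0 + x + 0.8 * treated) else 0
    people.append((x, treated, outcome))

def fit_logistic(xs, ys, lr=1.0, steps=500):
    """Stage 1: fit P(y=1 | x) = sigmoid(b0 + b1*x) by gradient descent."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # gradient of the log loss
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

def match_pairs(scores, treated, caliper=0.05):
    """Stage 2: greedily pair each treated person with the closest unused
    control whose propensity score is within the caliper."""
    controls = [j for j, t in enumerate(treated) if t == 0]
    used = set()
    pairs = []
    for i, t in enumerate(treated):
        if t != 1:
            continue
        best, best_gap = None, caliper
        for j in controls:
            if j in used:
                continue
            gap = abs(scores[i] - scores[j])
            if gap <= best_gap:
                best, best_gap = j, gap
        if best is not None:
            used.add(best)
            pairs.append((i, best))
    return pairs

xs = [p[0] for p in people]
ts = [p[1] for p in people]
ys = [p[2] for p in people]

b0, b1 = fit_logistic(xs, ts)
scores = [sigmoid(b0 + b1 * x) for x in xs]  # propensity scores
pairs = match_pairs(scores, ts)

# Naive comparison: all treated versus all untreated.
treated_outcomes = [y for y, t in zip(ys, ts) if t == 1]
control_outcomes = [y for y, t in zip(ys, ts) if t == 0]
naive = (sum(treated_outcomes) / len(treated_outcomes)
         - sum(control_outcomes) / len(control_outcomes))

# Matched comparison: average outcome difference within matched pairs.
matched = sum(ys[i] - ys[j] for i, j in pairs) / len(pairs)

print(f"matched pairs: {len(pairs)}")
print(f"naive difference: {naive:.3f}")
print(f"matched difference: {matched:.3f}")
```

Treated people with no control inside the caliper are simply dropped, so a tighter caliper trades sample size for closer matches.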
For example, if we wanted to measure the effect of smoking on the
probability of lung cancer, we'd have to find people who shared the
same probability of smoking. We'd collect as many covariates of people
as we could (age, whether or not their parents smoked, whether or not
their spouses smoked, weight, diet, exercise, hours a week they work,
blood test results), and we'd use as an outcome whether or not they
smoked. We'd build a logistic regression that predicted the probability
of smoking. We'd then use that model to assign each person a
probability, which would be called their propensity score, and then
we'd use that score to match. Of course we're banking on the assumption
that we figured out and were able to observe all the covariates
associated with likelihood of smoking, which we probably didn't. And that's the