Pr(R | T)). Call this first example 'the medical example'. In the second example,
call it 'the agricultural example', we are asked to consider the same data, but
now T and ∼T are replaced by the varieties of plants (white [W] or black variety
[∼W]), R and ∼R by the yield (high [Y] or low yield [∼Y]), and M and ∼M by
the height of plants (tall [T] or short [∼T]).
Table 7. Simpson's Paradox (Agricultural Example)

Two            T              ∼T         Yield Rates     Overall
Groups      Y     ∼Y       Y     ∼Y       T      ∼T      Yield Rates
W           18    12       2     8        60%    20%     50%
∼W          7     3        9     21       70%    30%     40%
Given this new interpretation, the overall yield rate suggests that planting
the white variety is preferable since it is 10 percentage points better overall
(50% versus 40%), although the white variety is 10 percentage points worse among
both tall and short plants (sub-population statistics). Which statistics should
one follow in choosing which variety to plant in the future? The standard
recommendation is to take the combined statistics and thus recommend the white
variety for planting (since Pr(Y | W) > Pr(Y | ∼W)), which is in stark contrast
with the recommendation given in the medical example. In short, the medical and
agricultural examples yield different responses to the “what-to-do” question.
There is no unique response regarding which statistics, subpopulation or whole,
to follow in every case of SP. We agree with the standard recommendations with a
proviso: we need to use substantial background information, which is largely
causal in nature, to answer “what-to-do” questions, as doing something means
causing something to happen.
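The reversal in Table 7 can be checked with a few lines of arithmetic. What follows is only a minimal sketch in Python, not part of the original argument: the names `counts`, `rate`, and `rates_for` are hypothetical helpers introduced for illustration, and `~` stands in for the negation sign ∼. The counts themselves are taken directly from Table 7.

```python
from fractions import Fraction

# Counts from Table 7 as (high yield, low yield), keyed by variety and height.
# "~W" is the black variety; "T" is tall and "~T" is short.
counts = {
    "W": {"T": (18, 12), "~T": (2, 8)},
    "~W": {"T": (7, 3), "~T": (9, 21)},
}


def rate(high, low):
    """Proportion of high-yield plants, kept exact as a fraction."""
    return Fraction(high, high + low)


def rates_for(variety):
    """Return (tall rate, short rate, overall rate) for one variety."""
    groups = counts[variety]
    tall = rate(*groups["T"])
    short = rate(*groups["~T"])
    total_high = sum(high for high, _ in groups.values())
    total_low = sum(low for _, low in groups.values())
    return tall, short, rate(total_high, total_low)


w_tall, w_short, w_all = rates_for("W")
b_tall, b_short, b_all = rates_for("~W")

# The paradox: W does worse within each height group, yet better overall.
reversal = w_tall < b_tall and w_short < b_short and w_all > b_all

for label, value in [
    ("Pr(Y | W, T)", w_tall), ("Pr(Y | ~W, T)", b_tall),
    ("Pr(Y | W, ~T)", w_short), ("Pr(Y | ~W, ~T)", b_short),
    ("Pr(Y | W)", w_all), ("Pr(Y | ~W)", b_all),
]:
    print(f"{label}: {float(value):.0%}")
print("reversal:", reversal)
```

The sketch only reproduces the rates and confirms that the sub-population and combined comparisons point in opposite directions. It says nothing about which statistic to act on, since that depends on the background information discussed above.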
6 Truths about SP: An Evaluation of Causal Accounts
We argued that to understand the significance of SP as a whole, we need to
distinguish three types of questions (first-level truth) as well as divorce the first
two questions from the third to show that causality is irrelevant both in unlocking
the paradoxical nature of SP and providing conditions for its emergence (second-
level truth). Based on our discussion of the causal accounts, one realizes that
causal theorists have in fact addressed the “what-to-do” question. We don't
deny that causal inference plays a crucial role in choosing the right statistic when
confronted with the paradox. Hence we agree with both Pearl and SGS about the
third question. However, as far as we know, SGS have not distinguished the three
questions about SP, and have thereby failed to appreciate the first-level truth
about SP. Pearl, on the other hand, does distinguish the three questions. But both
causal accounts fail to understand the second-level truth about the paradox.
Notice that one may, like Pearl, recognize the first-level truth and yet fail to
recognize the second-level truth. An examination of his responses to the first
two questions will reveal the reason behind this, showing how his causal account