Assuming that the order of the variables in the mosaicplot is $X_1, \ldots, X_p$, we can
make the above equations a bit more specific: if variable $X_i$ is presented in a horizontal
split of the plot, the width of each tile represents
$$P(h(i) \mid v(i)) = P(X_i \mid v(i)) \cdot P(h(i-1) \mid v(i)),$$
i.e., the height is given by the conditional probability of $X_i$ given previous variables
with splits in the vertical direction, while the overall width of the tile is given by
previous splits in the horizontal direction. If we consider what can actually be seen
in a mosaicplot, the way to specify the order in which variables should be addressed in
a mosaic becomes clear. Since we are working in this highly conditioned framework,
stratum variables should be addressed in a mosaicplot first, while the most important
variables should be addressed last.
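The conditioning idea can be illustrated for the simplest case of two variables. The following is a minimal sketch, assuming the first variable X is split horizontally and the second variable Y is split vertically within each X category; the function name `tile_dimensions` and the counts are hypothetical and not taken from the text.

```python
def tile_dimensions(counts):
    """Return {(i, j): (width, height)} for a two-variable mosaic.

    counts[i][j] is the count for category i of X and category j of Y.
    Assumption: X is split horizontally first, Y vertically second.
    """
    total = sum(sum(row) for row in counts)
    tiles = {}
    for i, row in enumerate(counts):
        row_total = sum(row)
        # Width: marginal proportion of category i of X.
        width = row_total / total
        for j, cell in enumerate(row):
            # Height: conditional proportion of category j of Y given X = i.
            height = cell / row_total if row_total else 0.0
            tiles[(i, j)] = (width, height)
    return tiles


# Hypothetical 2 x 2 table of counts.
counts = [[30, 10], [20, 40]]
tiles = tile_dimensions(counts)
for (i, j), (w, h) in sorted(tiles.items()):
    print(f"tile ({i}, {j}): width={w:.3f}, height={h:.3f}, area={w * h:.3f}")
```

In this sketch the area of each tile equals the joint proportion of its cell, which is why the variable that is split last is the one whose conditional distribution is easiest to compare across tiles.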
13.2.2 Visualizing Interaction Effects
One way to measure the strength of association between two binary variables X and
Y is to compute the odds ratio: denote the cell probabilities of a $2 \times 2$ table with
variables X and Y by $\pi_{11}$, $\pi_{12}$, $\pi_{21}$ and $\pi_{22}$, such that $\pi_{ij} = P(X = X_i, Y = Y_j)$, where
$X_i$ is the $i$th category of X and $Y_j$ is the $j$th category of Y.
The odds ratio (cross-product ratio) θ between two binary variables X and Y provides
a measure of the association between them. It is defined as
$$\theta = \frac{\pi_{11}\,\pi_{22}}{\pi_{12}\,\pi_{21}}.$$
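The cross-product definition can be sketched in a few lines. This is a minimal illustration, not a reference implementation; the function name `odds_ratio` and the example tables are hypothetical. Counts can be used in place of probabilities because the table total cancels in the ratio.

```python
def odds_ratio(table):
    """Cross-product ratio of a 2 x 2 table [[a11, a12], [a21, a22]].

    Entries may be cell probabilities or cell counts; all off-diagonal
    entries are assumed to be non-zero.
    """
    (a11, a12), (a21, a22) = table
    return (a11 * a22) / (a12 * a21)


# Hypothetical tables of counts.
positive = [[40, 10], [10, 40]]      # mass concentrated on the diagonal
independent = [[20, 30], [40, 60]]   # rows are proportional to each other
negative = [[10, 40], [40, 10]]      # mass concentrated off the diagonal

for name, table in [("positive", positive),
                    ("independent", independent),
                    ("negative", negative)]:
    print(name, odds_ratio(table))
```

The table with proportional rows gives an odds ratio of exactly 1, and swapping the two rows of a table inverts its odds ratio.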
Values for the odds ratio range between 0 and ∞; values of θ close to 0 mean a strong
negative association of X and Y, and large positive values indicate a strong positive
association. When θ = 1 the variables X and Y are independent. This
asymmetry in values causes many people to work with the logarithm of the odds ratio
instead, which ensures symmetric behavior, i.e., a value of log θ = q is just as strong
an association as log θ = −q, with log θ = 0 indicating independence.
Figure . shows several examples of mosaicplots of $2 \times 2$ contingency tables. The
amount of interaction between the variables increases from left to right. Visually, this
is indicated by an increase in the measure d, where d is the difference in conditional
probabilities
$$d = \frac{m_{11}}{m_{11} + m_{12}} - \frac{m_{21}}{m_{21} + m_{22}}.$$
It can be shown (see Hofmann, ) that this difference d is approximately linear in log θ,
the logarithm of the odds ratio for the four cells, or more specifically:
$$d \approx \frac{1}{4} \cdot \log\theta.$$
The approximation holds so long as either m m .
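The relationship between d and log θ can be checked numerically. The sketch below assumes that d is the difference of the two row-wise conditional proportions and that the proportionality constant is 1/4, which is what a first-order expansion of the logit around 1/2 gives; the function names and the example counts are hypothetical.

```python
import math


def conditional_difference(m):
    """d = m11 / (m11 + m12) - m21 / (m21 + m22) for a 2 x 2 table of counts."""
    (m11, m12), (m21, m22) = m
    return m11 / (m11 + m12) - m21 / (m21 + m22)


def log_odds_ratio(m):
    """Natural logarithm of the cross-product ratio of a 2 x 2 table."""
    (m11, m12), (m21, m22) = m
    return math.log((m11 * m22) / (m12 * m21))


# Hypothetical tables with increasing association; the conditional
# proportions move further away from 1/2 from one table to the next.
tables = [
    [[50, 50], [50, 50]],
    [[55, 45], [45, 55]],
    [[60, 40], [40, 60]],
    [[70, 30], [30, 70]],
]

for m in tables:
    d = conditional_difference(m)
    approx = log_odds_ratio(m) / 4  # assumed constant of 1/4
    print(f"d = {d:.4f}, log(theta) / 4 = {approx:.4f}")
```

In these examples the two quantities agree closely while the conditional proportions stay near 1/2 and drift apart as the association grows. Multiplying every count by the same factor leaves both d and log θ unchanged.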
Statements about whether or when d indicates a significant interaction between
the variables cannot be drawn directly from a diagram, since only the odds ratio
is visualized, which is independent of the underlying sample size. Comparisons of
odds ratios, however, can be made directly. From the two mosaicplots on the right of
Fig. ., we can see that the odds ratio for the plot on the right is about twice the size
m
m
m or m
m
m