Table . . The hospital data - expected values

                     Length of stay (in years)
Visit frequency        -         -         +         Σ
Regular              27.24     21.14     13.62      62
Less than monthly    11.86      9.20      5.93      27
Never                18.89     14.66      9.45      43
Σ                    58.00     45.00     29.00     132
It is easy to compute the expected table under either of these hypotheses. To fix notation, in the following we consider a two-way contingency table with $I$ rows and $J$ columns, cell frequencies $\{n_{ij}\}$ for $i = 1, \ldots, I$ and $j = 1, \ldots, J$, and row and column sums $n_{i+} = \sum_j n_{ij}$ and $n_{+j} = \sum_i n_{ij}$, respectively. For convenience, the number of observations is denoted $n = n_{++}$. Given an underlying distribution with theoretical cell probabilities $\pi_{ij}$, the null hypothesis of independence of the two categorical variables can be formulated as

$$H_0: \pi_{ij} = \pi_{i+}\,\pi_{+j}.$$ ( . )
Now, the expected cell frequencies in this model are simply $\hat{n}_{ij} = n_{i+}\, n_{+j} / n$. The expected table for our sample data is given in Table . . It could again be visualized using a mosaicplot, this time applied to the table of expected frequencies. If we cross-tabulate each tile to fill it with a number of squares equal to the corresponding number of observed frequencies, we get a sieve plot (see Fig. . ). This implicitly compares expected and observed values, since the density of the grid will increase with the deviation of the observed from the expected values. This allows the detection of general association patterns (for nominal variables) and of linear association (for ordinal variables), the latter producing tiles of either very high or very low density along one of the diagonals. For our data, the density of the rectangles is most pronounced along the secondary diagonal, indicating a negative association of the two variables. This provides evidence that visit frequency decreases with the length of stay for these patients.
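As a minimal sketch of the computation described above, the expected frequencies under independence, n_i+ n_+j / n, can be reproduced from the row and column sums printed in the table of expected values. The function name `expected_table` and the variable names are illustrative only. The observed counts are an assumption: they are taken to be those of the hospital table referenced earlier in the chapter, which is not shown in this section, and they are checked only against the margins given here. The length-of-stay categories are addressed by position because their labels are not legible in the table header.

```python
# Margins as printed in the table of expected values.
row_sums = {"Regular": 62, "Less than monthly": 27, "Never": 43}
col_sums = [58, 45, 29]  # three length-of-stay categories, in table order


def expected_table(row_sums, col_sums):
    """Expected counts under independence: n_i+ * n_+j / n for every cell."""
    n = sum(row_sums.values())  # total number of observations
    return {
        label: [r * c / n for c in col_sums]
        for label, r in row_sums.items()
    }


expected = expected_table(row_sums, col_sums)

# ASSUMPTION: observed counts of the hospital data (not part of this section).
observed = {
    "Regular": [43, 16, 3],
    "Less than monthly": [6, 11, 10],
    "Never": [9, 18, 16],
}

# Raw deviations observed - expected; a sieve plot conveys these through
# the density of the grid drawn inside each tile.
deviation = {
    label: [o - e for o, e in zip(observed[label], expected[label])]
    for label in observed
}

for label in expected:
    exp_row = "  ".join(f"{e:6.2f}" for e in expected[label])
    dev_row = "  ".join(f"{d:+7.2f}" for d in deviation[label])
    print(f"{label:<18} expected: {exp_row}   deviation: {dev_row}")
```

Under these assumed counts, the positive deviations sit in the cells pairing regular visits with the shortest stay and no visits with the longest stay, which is the pattern the text reads as a negative association between visit frequency and length of stay.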
12.2.3 Association Plots
In the last section, we described how to compare observed and expected values of a contingency table using sieve plots. We can do this more straightforwardly by using a plot that directly visualizes the residuals. The most widely used residuals are the Pearson residuals

$$r_{ij} = \frac{n_{ij} - \hat{n}_{ij}}{\sqrt{\hat{n}_{ij}}}$$ ( . )

which are standardized raw residuals. In an association plot (Cohen, ), each cell is represented by a rectangle that has a (signed) height that is proportional to the