gives approximations to the contingency tables scaled, as in CA itself, by the square roots
of the inverse row and column frequencies. For this reason, this type of MCA may be
deemed more acceptable than the version based on $p^{-1/2} G L^{-1/2}$. Note that the diagonal
blocks of the normalized Burt matrix are units, the off-diagonal blocks being the scaled
two-way contingency tables (and their transposes) of CA.
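
The block structure just described can be checked numerically. The following is a minimal NumPy sketch under stated assumptions: the toy data, the helper `indicator`, and all variable names are hypothetical and not taken from the text; `freq` holds the diagonal of $L$ (the category frequencies).

```python
import numpy as np

def indicator(codes, n_levels):
    """Return the n x n_levels 0/1 indicator block for 0-based category codes."""
    block = np.zeros((len(codes), n_levels))
    block[np.arange(len(codes)), codes] = 1.0
    return block

# Hypothetical toy data: n = 6 units, p = 3 variables with 2, 3 and 2 levels.
codes = [np.array([0, 0, 1, 1, 1, 0]),
         np.array([0, 1, 2, 2, 0, 1]),
         np.array([1, 0, 0, 1, 1, 0])]
levels = [2, 3, 2]

G = np.hstack([indicator(c, k) for c, k in zip(codes, levels)])
freq = G.sum(axis=0)                               # diagonal of L
burt = G.T @ G                                     # Burt matrix G'G
norm_burt = burt / np.sqrt(np.outer(freq, freq))   # L^{-1/2} G'G L^{-1/2}

# Column ranges occupied by the levels of each variable.
edges = np.cumsum([0] + levels)
blocks = [slice(a, b) for a, b in zip(edges[:-1], edges[1:])]

# Every diagonal block should be an identity matrix.
diag_ok = all(np.allclose(norm_burt[s, s], np.eye(s.stop - s.start))
              for s in blocks)

# The (1, 2) off-diagonal block is the contingency table of variables 1 and 2,
# scaled by the inverse square roots of its row and column frequencies.
X12 = G[:, blocks[0]].T @ G[:, blocks[1]]
scaled_X12 = X12 / np.sqrt(np.outer(freq[blocks[0]], freq[blocks[1]]))
off_ok = np.allclose(norm_burt[blocks[0], blocks[1]], scaled_X12)
```

Both `diag_ok` and `off_ok` should evaluate to true for any data in which every category level occurs at least once.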
The least-squares approximation to the symmetric normalized Burt matrix is given by
$V \Sigma^2 J V'$, which would indicate a plot of $V \Sigma J$. This is a monoplot (Chapter 10) in which
the pairwise inner products of the L plotted points give approximations to the normalized
Burt matrix. Alternatively, we may regard $G'G$ as a giant two-way contingency table
and use any of the approximations discussed in Chapter 7. Focusing on approximating
chi-squared distance, we note that, corresponding to the argument leading to (7.17),
$$L^{-1} G' G L^{-1/2} \tag{8.10}$$
generates all the row chi-squared distances arising from the Burt matrix. This bland
statement merits closer attention. Firstly, we note that (8.10) includes contributions from
the unit diagonal matrices (see below). Secondly, each contingency table enters twice,
first as $L_i^{-1} G_i' G_j L_j^{-1/2}$ and then in transposed form $L_j^{-1} G_j' G_i L_i^{-1/2}$, the second of which
generates row chi-squared distances that are column chi-squared distances of the first.
Thus (8.10) involves not only all the row chi-squared distances but also all the column
chi-squared distances of the two-way contingency tables. Apart from the diagonal block,
calculating chi-squared distances between different levels of the same variable generated
by (8.10) gives the sum of the $p - 1$ different estimates (one from each of the contingency
tables involving that variable as a row classifier) and is an acceptable estimate. However,
evaluating chi-squared distances between levels of different variables generated by (8.10)
is as hard to justify as is calculating chi-squared distances between rows and columns
of a two-way contingency table. With this background, as for chi-squared distance CA
(Section 7.2.4), we plot
$$Z = L^{-1/2} V \Sigma^2 J. \tag{8.11}$$
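
The link between (8.10) and (8.11) can be illustrated with a small NumPy sketch. It rests on assumptions that are not stated in this passage: the singular value decomposition is taken as $G L^{-1/2} = U \Sigma V'$, so that the normalized Burt matrix equals $V \Sigma^2 V'$ exactly (a factor $p^{-1/2}$ in the decomposition would only rescale $\Sigma^2$ by a constant), and the toy data and variable names are hypothetical.

```python
import numpy as np

# Hypothetical toy indicator matrix: n = 6 units, p = 2 variables (2 and 3 levels).
x1 = np.array([0, 0, 1, 1, 1, 0])
x2 = np.array([0, 1, 2, 2, 0, 1])
G = np.hstack([np.eye(2)[x1], np.eye(3)[x2]])

freq = G.sum(axis=0)                      # diagonal of L

# Assumed convention: G L^{-1/2} = U Sigma V', hence
# L^{-1/2} G'G L^{-1/2} = V Sigma^2 V'.
_, sigma, Vt = np.linalg.svd(G / np.sqrt(freq), full_matrices=False)
V = Vt.T

# Z = L^{-1/2} V Sigma^2 with every dimension retained (J = I).
Z_full = (V * sigma**2) / np.sqrt(freq)[:, None]

# Rows of (8.10): L^{-1} G'G L^{-1/2}.
rows_810 = (G.T @ G) / freq[:, None] / np.sqrt(freq)[None, :]

def pairwise(A):
    """Euclidean distances between all pairs of rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff**2).sum(axis=2))

# Distances among the rows of (8.10) are reproduced by the full-rank Z.
same_distances = np.allclose(pairwise(Z_full), pairwise(rows_810))

# The leading dimension is constant over categories and adds nothing to the
# distances, so a 2-D display would use the next two columns. This particular
# choice of J is an assumption of the sketch.
Z = Z_full[:, 1:3]
```

Because `rows_810` equals `Z_full @ Vt` and `Vt` is orthogonal, the inter-row distances agree exactly in full rank; restricting to the columns selected by $J$ gives the low-dimensional approximation that is plotted.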
Equation (8.11) gives p sets of CLPs but no representation of the n units, as is in accord
with the common practice of CA. We may plot the units, as above, at the centroids of
their category points:
$$Z_0 = GZ / p. \tag{8.12}$$
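
As a minimal sketch of (8.12), the NumPy lines below place each unit at the centroid of its $p$ category points. The coordinates in `Z` are hypothetical values chosen only for illustration, as are the toy data and variable names.

```python
import numpy as np

# Hypothetical toy indicator matrix: n = 6 units, p = 2 variables (2 and 3 levels).
x1 = np.array([0, 0, 1, 1, 1, 0])
x2 = np.array([0, 1, 2, 2, 0, 1])
G = np.hstack([np.eye(2)[x1], np.eye(3)[x2]])
p = 2

# Hypothetical 2-D category-level points, one row per category level:
# rows 0-1 belong to variable 1, rows 2-4 to variable 2.
Z = np.array([[-1.0,  0.2],
              [ 1.0, -0.2],
              [ 0.0,  1.0],
              [-0.8, -0.5],
              [ 0.8, -0.5]])

# (8.12): each row of G picks out the p category points of one unit,
# and dividing by p turns their sum into a centroid.
Z0 = G @ Z / p

# Unit 0 has level 0 of variable 1 (row 0 of Z) and level 0 of variable 2 (row 2 of Z).
unit0_centroid = (Z[0] + Z[2]) / 2
```

Here `Z0[0]` coincides with `unit0_centroid`, and the same holds for every other unit with its own pair of category points.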
Although the chi-squared distances based on the Burt matrix are functions of those of
the correspondence analysis of a contingency table, they differ from the row chi-squared
distances discussed above (8.10) used in the analysis of G . They differ yet again from the
column chi-squared distances for G , which, if they were used, would measure distances
between as well as within the levels of the p categorical variables (see Section 8.2).
An MCA chi-squared distance biplot based upon the Burt matrix associated with the
data of Table 8.1 is given in Figure 8.7 as a result of setting mca.variant = "Burt"
in the call to MCAbipl.
When $p = 2$, our notations for CA and MCA are related by $R = L_1$ and $C = L_2$.
Then, the normalized Burt matrix is
$$\begin{pmatrix} I & R^{-1/2} G_1' G_2 C^{-1/2} \\ C^{-1/2} G_2' G_1 R^{-1/2} & I \end{pmatrix}. \tag{8.13}$$
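
The block form (8.13) can be assembled directly from the CA quantities and compared with the normalized Burt matrix computed from $G$. This is a sketch under stated assumptions: the toy data and names such as `R_inv_sqrt` and `block_form` are hypothetical, and $X = G_1' G_2$ denotes the two-way contingency table.

```python
import numpy as np

# Hypothetical toy data: n = 6 units, p = 2 variables with 2 and 3 levels.
x1 = np.array([0, 0, 1, 1, 1, 0])
x2 = np.array([0, 1, 2, 2, 0, 1])
G1, G2 = np.eye(2)[x1], np.eye(3)[x2]
G = np.hstack([G1, G2])

R = np.diag(G1.sum(axis=0))        # R = L_1, row frequencies of X
C = np.diag(G2.sum(axis=0))        # C = L_2, column frequencies of X
X = G1.T @ G2                      # two-way contingency table

R_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(R)))
C_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(C)))
A = R_inv_sqrt @ X @ C_inv_sqrt    # R^{-1/2} X C^{-1/2}

# Block form (8.13).
block_form = np.block([[np.eye(2), A],
                       [A.T, np.eye(3)]])

# Direct computation L^{-1/2} G'G L^{-1/2}.
L_inv_sqrt = np.diag(1.0 / np.sqrt(G.sum(axis=0)))
norm_burt = L_inv_sqrt @ G.T @ G @ L_inv_sqrt

matches = np.allclose(block_form, norm_burt)
```

For these data `matches` is true, confirming that the off-diagonal blocks are the scaled contingency table of CA and its transpose.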