Multiple correspondence analysis - Understanding Biplots

gives approximations to the contingency tables scaled, as in CA itself, by the square roots

of the inverse row and column frequencies. For this reason, this type of MCA may be

deemed more acceptable than the version based on $p^{-1/2} G L^{-1/2}$. Note that the diagonal

blocks of the normalized Burt matrix are units, the off-diagonal blocks being the scaled

two-way contingency tables (and their transposes) of CA.
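
To make this block structure concrete, the following sketch builds an indicator matrix for a small made-up data set and checks both claims numerically. The data, the variable names and the helper `block` are illustrative assumptions and are not taken from the book's own code; NumPy is used here purely for illustration.

```python
import numpy as np

# Hypothetical data: n = 6 units, p = 3 categorical variables (integer-coded levels).
data = np.array([
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 2],
    [1, 1, 1],
    [2, 0, 0],
    [2, 1, 2],
])
n, p = data.shape
levels = [int(data[:, j].max()) + 1 for j in range(p)]  # number of levels per variable

# Indicator matrix G = [G_1 ... G_p]: one 0/1 column per category level.
G = np.hstack([np.eye(k)[data[:, j]] for j, k in enumerate(levels)])

burt = G.T @ G                                      # Burt matrix G'G
freq = np.diag(burt)                                # category frequencies (diagonal of L)
burt_norm = burt / np.sqrt(np.outer(freq, freq))    # L^{-1/2} G'G L^{-1/2}

# Column offsets marking where each variable's block starts and ends.
start = np.concatenate([[0], np.cumsum(levels)])

def block(M, i, j):
    """Return the (i, j) block of a matrix partitioned by variable."""
    return M[start[i]:start[i + 1], start[j]:start[j + 1]]

# Claim 1: every diagonal block of the normalized Burt matrix is a unit matrix.
diag_ok = all(
    np.allclose(block(burt_norm, i, i), np.eye(levels[i])) for i in range(p)
)

# Claim 2: an off-diagonal block is the two-way contingency table scaled by the
# inverse square roots of its row and column totals, as in CA.
X = block(burt, 0, 1)                   # contingency table of variable 0 by variable 1
row_tot, col_tot = X.sum(axis=1), X.sum(axis=0)
scaled_table = X / np.sqrt(np.outer(row_tot, col_tot))
offdiag_ok = np.allclose(block(burt_norm, 0, 1), scaled_table)

print(diag_ok, offdiag_ok)
```

The diagonal blocks reduce to unit matrices because each unit falls in exactly one level of each variable, so every $G_i' G_i$ is the diagonal matrix of that variable's level frequencies.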

The least-squares approximation to the symmetric normalized Burt matrix is given by

$V \Sigma^2 J V'$, which would indicate a plot of $V \Sigma J$. This is a monoplot (Chapter 10) in which

the pairwise inner products of the L plotted points give approximations to the normalized

Burt matrix. Alternatively, we may regard $G'G$ as a giant two-way contingency table

and use any of the approximations discussed in Chapter 7. Focusing on approximating

chi-squared distance, we note that, corresponding to the argument leading to (7.17),

$$L^{-1} G' G L^{-1/2} \tag{8.10}$$

generates all the row chi-squared distances arising from the Burt matrix. This bland

statement merits closer attention. Firstly, we note that (8.10) includes contributions from

the unit diagonal matrices (see below). Secondly, each contingency table enters twice,

first as $L_i^{-1} G_i' G_j L_j^{-1/2}$ and then in transposed form $L_j^{-1} G_j' G_i L_i^{-1/2}$, the second of which

generates row chi-squared distances that are column chi-squared distances of the first.

Thus (8.10) involves not only all the row chi-squared distances but also all the column

chi-squared distances of the two-way contingency tables. Apart from the diagonal block,

calculating chi-squared distances between different levels of the same variable generated

by (8.10) gives the sum of the $p - 1$ different estimates (one from each of the contingency

tables involving that variable as a row classifier) and is an acceptable estimate. However,

evaluating chi-squared distances between levels of different variables generated by (8.10)

is as hard to justify as is calculating chi-squared distances between rows and columns

of a two-way contingency table. With this background, as for chi-squared distance CA

(Section 7.2.4), we plot

$$L^{-1/2} V \Sigma^2 J. \tag{8.11}$$
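
A minimal NumPy sketch of the plot coordinates in (8.11) follows, under stated assumptions: the data are made up, the squared singular values are taken to be the eigenvalues of the normalized Burt matrix itself (a constant factor such as p would only rescale the display), and J is assumed to select two dimensions after skipping the trivial leading one. The sketch checks that, in the full space, distances between the plotted points reproduce the distances between the rows of (8.10).

```python
import numpy as np

# Hypothetical data: 6 units, p = 3 categorical variables with 3, 2 and 3 levels.
data = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 2],
                 [1, 1, 1], [2, 0, 0], [2, 1, 2]])
n, p = data.shape
levels = [3, 2, 3]
G = np.hstack([np.eye(k)[data[:, j]] for j, k in enumerate(levels)])

burt = G.T @ G                                      # Burt matrix G'G
freq = np.diag(burt)                                # diagonal of L
burt_norm = burt / np.sqrt(np.outer(freq, freq))    # L^{-1/2} G'G L^{-1/2}

# Assumption: Sigma^2 holds the eigenvalues of burt_norm, sorted in decreasing order.
eigvals, V = np.linalg.eigh(burt_norm)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# Matrix (8.10), L^{-1} G'G L^{-1/2}: distances between its rows are the
# chi-squared distances generated by the Burt matrix.
M = burt / np.outer(freq, np.sqrt(freq))

# Full-dimensional coordinates L^{-1/2} V Sigma^2 (before applying J).
Z_full = (V * eigvals) / np.sqrt(freq)[:, None]

def pairwise_dist(A):
    """Euclidean distances between all pairs of rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

same_distances = np.allclose(pairwise_dist(M), pairwise_dist(Z_full))

# The leading eigenvalue equals p and its coordinate is the same for every
# category, so it adds nothing to any distance. Assumption: the display uses
# the next two dimensions.
Z_plot = Z_full[:, 1:3]

print(same_distances, Z_plot.shape)
```

Because V is orthogonal, multiplying the coordinates by V' leaves all inter-point distances unchanged, which is why the rows of (8.10) and the rows of the full coordinate matrix give the same distances; truncating to two dimensions then yields the approximation shown in the plot.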

Equation (8.11) gives p sets of CLPs but no representation of the n units, as is in accord

with the common practice of CA. We may plot the units, as above, at the centroids of

their category points:

$$Z_0 = GZ/p. \tag{8.12}$$
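
The centroid rule (8.12) can be checked directly: each row of G contains exactly p ones, so GZ sums a unit's p category points and division by p averages them. In the sketch below the data and the category coordinates Z are arbitrary placeholders chosen only for illustration.

```python
import numpy as np

# Hypothetical data: 4 units, p = 2 categorical variables with 2 and 3 levels.
data = np.array([[0, 0], [0, 2], [1, 1], [1, 2]])
n, p = data.shape
levels = [2, 3]
G = np.hstack([np.eye(k)[data[:, j]] for j, k in enumerate(levels)])

# Placeholder two-dimensional coordinates for the 5 category-level points.
Z = np.arange(10, dtype=float).reshape(5, 2)

# Equation (8.12): units plotted at the centroids of their category points.
Z0 = G @ Z / p

# Direct check for every unit: average the rows of Z that belong to its levels.
start = np.concatenate([[0], np.cumsum(levels)])[:-1]   # first column of each variable
Z0_direct = np.array([Z[start + data[i]].mean(axis=0) for i in range(n)])

print(np.allclose(Z0, Z0_direct))
```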

Although the chi-squared distances based on the Burt matrix are functions of those of

the correspondence analysis of a contingency table, they differ from the row chi-squared

distances discussed above (8.10) used in the analysis of $G$. They differ yet again from the

column chi-squared distances for $G$, which, if they were used, would measure distances

between as well as within the levels of the p categorical variables (see Section 8.2).

An MCA chi-squared distance biplot based upon the Burt matrix associated with the

data of Table 8.1 is given in Figure 8.7 as a result of setting mca.variant = "Burt"

in the call to MCAbipl.

When $p = 2$, our notations for CA and MCA are related by $R = L_1$ and $C = L_2$.

Then, the normalized Burt matrix is

$$\begin{pmatrix} I & R^{-1/2} G_1' G_2 C^{-1/2} \\ C^{-1/2} G_2' G_1 R^{-1/2} & I \end{pmatrix} \tag{8.13}$$
