8 Multiple correspondence analysis
8.1 Introduction
Biplots for the correspondence analysis of a two-way contingency table were discussed
in Chapter 7. Here we shall examine biplots for more than two categorical variables.
There are two areas of confusion. The first is analogous to the difficulty we have already
met (Chapter 3) of whether, in principal component analysis (PCA) of a data matrix X, we
are concerned with the analysis of X itself or with X'X; the second is the relationship
with the correspondence analysis of a two-way contingency table. All the techniques we
shall discuss share much common mathematics, which may be regarded as a valuable
unifying feature, although Gower (2006) referred to them as 'divided by a common
language' because important statistical differences tend to become blurred.
The analogy with PCA is very close because, in place of an n × p data matrix X
we have an n × p matrix whose columns give the category levels taken by p categorical
variables for each of the n samples. Thus a categorical variable Hair Colour may have
category levels Dark, Grey, Fair and Brown. Table 8.1 is an example of a small data
matrix of categorical variables.
Such nominal data is usually coded in pseudo-numerical form, where the $k$th variable
is recorded in an $n \times L_k$ matrix $\mathbf{G}_k$ and $L_k$ is the number of category levels ($L_k = 4$ in
the case of hair colour). The $i$th row of $\mathbf{G}_k$ is zero, apart from a single unit in the column
pertaining to the actual category level taken by the $i$th sample. The column sums of $\mathbf{G}_k$
give the frequencies of each category level in the $n$ samples and will be denoted by $\mathbf{L}_k$,
which will be considered as a diagonal matrix. Thus $\mathbf{G}_k \mathbf{1} = \mathbf{1}$ and $\mathbf{1}'\mathbf{G}_k = \mathbf{1}'\mathbf{L}_k$; also
$\mathbf{1}'\mathbf{L}_k \mathbf{1} = n$, as every sample must take some category level; $\mathbf{G}_k$ is termed an indicator
matrix. The indicator matrix $\mathbf{G}$ for the complete data is obtained by combining the
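
The coding of a single categorical variable described above can be checked numerically. The following is a minimal sketch in Python with NumPy, assuming a made-up hair-colour variable observed on six samples; the values are purely illustrative and are not taken from Table 8.1, and the names `levels`, `hair`, `G_k` and `L_k` are chosen here for illustration only.

```python
import numpy as np

# Hypothetical hair-colour observations for n = 6 samples
# (illustrative values only, not the data of Table 8.1).
levels = ["Dark", "Grey", "Fair", "Brown"]   # the 4 category levels
hair = ["Dark", "Fair", "Brown", "Dark", "Grey", "Fair"]

n = len(hair)
n_levels = len(levels)

# Column index of the category level taken by each sample.
codes = np.array([levels.index(h) for h in hair])

# Indicator matrix: n rows, one column per level, a single 1 in each row.
G_k = np.zeros((n, n_levels), dtype=int)
G_k[np.arange(n), codes] = 1

# Diagonal matrix holding the level frequencies (column sums of G_k).
L_k = np.diag(G_k.sum(axis=0))

ones_n = np.ones(n, dtype=int)
ones_L = np.ones(n_levels, dtype=int)

# The three identities stated in the text.
row_sums_ok = np.array_equal(G_k @ ones_L, ones_n)        # G_k 1 = 1
col_sums_ok = np.array_equal(ones_n @ G_k, ones_L @ L_k)  # 1'G_k = 1'L_k
total_ok = bool(ones_L @ L_k @ ones_L == n)               # 1'L_k 1 = n

print(G_k)
print(np.diag(L_k), row_sums_ok, col_sums_ok, total_ok)
```

With these assumed observations the diagonal of `L_k` holds the frequencies of Dark, Grey, Fair and Brown in that order, and all three checks evaluate to true, since each sample contributes exactly one unit to its row.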