Table 8.1 A simple data matrix for information on five categorical variables for seven
individuals.

Case        Sex  Hair Colour  Region    Work          Education
1 George    M    Brown        England   Manual        School
2 Alisdair  M    Dark         Scotland  Clerical      University
3 Jane      F    Brown        Scotland  Professional  University
4 Ivor      M    Grey         Wales     Professional  University
5 Myfanwy   F    Fair         Wales     Clerical      School
6 Harriet   F    Brown        England   Manual        School
7 Jeremy    M    Grey         England   Professional  Postgrad
Table 8.2 Recoding of Table 8.1 as an indicator matrix $\mathbf{G}$. Here $\mathbf{G}_1$ has two levels
(M, F), $\mathbf{G}_2$ has four levels (B, D, F, G), $\mathbf{G}_3$ has three levels (E, S, W), $\mathbf{G}_4$ has three
levels (M, C, P) and $\mathbf{G}_5$ has three levels (S, U, P). The frequencies $\mathbf{1}'\mathbf{L}_1$, $\mathbf{1}'\mathbf{L}_2$, $\mathbf{1}'\mathbf{L}_3$, $\mathbf{1}'\mathbf{L}_4$
and $\mathbf{1}'\mathbf{L}_5$ are given in the final row.

             Sex  Hair Colour  Region  Work   Education
Case         M F  B D F G      E S W   M C P  S U P
1 George     1 0  1 0 0 0      1 0 0   1 0 0  1 0 0
2 Alisdair   1 0  0 1 0 0      0 1 0   0 1 0  0 1 0
3 Jane       0 1  1 0 0 0      0 1 0   0 0 1  0 1 0
4 Ivor       1 0  0 0 0 1      0 0 1   0 0 1  0 1 0
5 Myfanwy    0 1  0 0 1 0      0 0 1   0 1 0  1 0 0
6 Harriet    0 1  1 0 0 0      1 0 0   1 0 0  1 0 0
7 Jeremy     1 0  0 0 0 1      1 0 0   0 0 1  0 0 1
Frequencies  4 3  3 1 1 2      3 2 2   2 2 3  3 3 1
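
The recoding of Table 8.1 into the indicator matrix of Table 8.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the names `LEVELS`, `DATA` and `indicator_matrix` are hypothetical, and the order of levels within each variable is taken from the column headings of Table 8.2.

```python
# Level order for each variable, following the column headings of Table 8.2.
LEVELS = {
    "Sex": ["M", "F"],
    "Hair Colour": ["Brown", "Dark", "Fair", "Grey"],
    "Region": ["England", "Scotland", "Wales"],
    "Work": ["Manual", "Clerical", "Professional"],
    "Education": ["School", "University", "Postgrad"],
}

# Rows of Table 8.1: case name followed by one category per variable.
DATA = [
    ("George", ["M", "Brown", "England", "Manual", "School"]),
    ("Alisdair", ["M", "Dark", "Scotland", "Clerical", "University"]),
    ("Jane", ["F", "Brown", "Scotland", "Professional", "University"]),
    ("Ivor", ["M", "Grey", "Wales", "Professional", "University"]),
    ("Myfanwy", ["F", "Fair", "Wales", "Clerical", "School"]),
    ("Harriet", ["F", "Brown", "England", "Manual", "School"]),
    ("Jeremy", ["M", "Grey", "England", "Professional", "Postgrad"]),
]


def indicator_matrix(data, levels):
    """Return G as a list of 0/1 rows, with one block of columns per variable."""
    G = []
    for _, categories in data:
        row = []
        for category, variable_levels in zip(categories, levels.values()):
            # Exactly one entry in each block is 1: the observed level.
            row.extend(1 if category == level else 0 for level in variable_levels)
        G.append(row)
    return G


G = indicator_matrix(DATA, LEVELS)

# Column sums reproduce the frequency row of Table 8.2.
frequencies = [sum(column) for column in zip(*G)]

for (name, _), row in zip(DATA, G):
    print(f"{name:<9}", *row)
print(f"{'Freq.':<9}", *frequencies)
```

The dictionary relies on insertion order (guaranteed from Python 3.7) to keep the variable blocks in the same order as the table columns.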
indicator matrices for all categorical variables to give

$$\mathbf{G} = [\mathbf{G}_1, \mathbf{G}_2, \mathbf{G}_3, \ldots, \mathbf{G}_p] : n \times L,$$

where $L = L_1 + L_2 + \cdots + L_p$.
Table 8.2 shows Table 8.1 coded as an indicator matrix. Thus $\mathbf{G}$, consisting entirely
of 0s and 1s, is the categorical equivalent of the quantitative data matrix $\mathbf{X}$ of PCA.
Because every categorical variable takes exactly one level for every sample, the rows
of $\mathbf{G}$ all sum to $p$. Further, the column sums give the frequencies of all the category
levels, assumed to be held in an $L \times L$ diagonal matrix
$\mathbf{L} = \operatorname{diag}(\operatorname{diag}(\mathbf{L}_1), \operatorname{diag}(\mathbf{L}_2), \ldots, \operatorname{diag}(\mathbf{L}_p))$.
Hence, $\mathbf{G}\mathbf{1} = p\mathbf{1}$, $\mathbf{1}'\mathbf{G} = \mathbf{1}'\mathbf{L}$ and $\mathbf{1}'\mathbf{L}\mathbf{1} = np$.
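
The row-sum and column-sum identities for the indicator matrix can be checked numerically on the data of Table 8.2, where n = 7, p = 5 and L = 2 + 4 + 3 + 3 + 3 = 15. The snippet below is a minimal sketch; the variable names (`ROWS`, `row_sums`, `col_sums`) are chosen here for illustration and do not come from the text.

```python
# Indicator matrix of Table 8.2, one string of 15 binary digits per case.
ROWS = [
    "101000100100100",  # 1 George
    "100100010010010",  # 2 Alisdair
    "011000010001010",  # 3 Jane
    "100001001001010",  # 4 Ivor
    "010010001010100",  # 5 Myfanwy
    "011000100100100",  # 6 Harriet
    "100001100001001",  # 7 Jeremy
]
G = [[int(digit) for digit in row] for row in ROWS]

n = len(G)   # number of cases
p = 5        # number of categorical variables

row_sums = [sum(row) for row in G]              # G1
col_sums = [sum(column) for column in zip(*G)]  # 1'G, the level frequencies
total = sum(col_sums)                           # 1'L1

assert row_sums == [p] * n   # every row sums to p
assert total == n * p        # the grand total is np = 35

print("row sums:   ", row_sums)
print("column sums:", col_sums)
print("total:      ", total)
```

The column sums printed here match the frequency row of Table 8.2, and the total of 35 equals np.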
8.2 Multiple correspondence analysis of the indicator matrix
One way of generalizing CA is to treat the categorical data matrix $\mathbf{G}$ as if it were a
two-way contingency table. This compares with the CA of chi-squared distance, where
we saw in Chapter 7 that the two-way contingency table is sometimes treated as a data
matrix in which either the rows or the columns play the role of variables.
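
To make concrete what treating the indicator matrix as a contingency table involves, the sketch below computes the margins that such a table would have for the data of Table 8.2. It assumes the usual CA convention of dividing row and column totals by the grand total; the names `row_props` and `col_props` are hypothetical and the snippet stops short of the analysis itself.

```python
from fractions import Fraction

# Indicator matrix of Table 8.2, one string of 15 binary digits per case.
ROWS = [
    "101000100100100",  # 1 George
    "100100010010010",  # 2 Alisdair
    "011000010001010",  # 3 Jane
    "100001001001010",  # 4 Ivor
    "010010001010100",  # 5 Myfanwy
    "011000100100100",  # 6 Harriet
    "100001100001001",  # 7 Jeremy
]
G = [[int(digit) for digit in row] for row in ROWS]

# Grand total of the "contingency table": n * p = 7 * 5.
grand_total = sum(sum(row) for row in G)

# Row and column totals as exact proportions of the grand total.
row_props = [Fraction(sum(row), grand_total) for row in G]
col_props = [Fraction(sum(column), grand_total) for column in zip(*G)]

print("grand total:", grand_total)
print("row proportions:", [str(x) for x in row_props])
print("column proportions:", [str(x) for x in col_props])
```

Because every row of the indicator matrix sums to p, all row proportions are equal (1/n), whereas the column proportions are the level frequencies divided by np.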