Thus, 1 is an eigenvector corresponding to φ = 1, and it follows from the orthogonality relationships that 1′Lz = 0 for any other eigenvector z. This shows that 1′Mz = 0, so that the means are centred, the usual result concerning the 'uninteresting' unit eigenvalue in CA and MCA.
Finally, let us examine the correlation between the quantifications of the kth variable on the sth dimension and the sth dimension of M, or equivalently of GZ. For simplicity, and without loss of generality, we take s = 1 and k = 1. Thus, we are interested in the correlation ρ₁ between G₁z₁ and g = G₁z₁ + G₂z₂ + ··· + G_p z_p, where z_k (k = 1, 2, ..., p) is the first column of Z_k. Then, using the eigendecomposition, we have
g′G₁z₁ = φ₁ z₁′L₁z₁,

g′g = φ₁ ∑_{k=1}^p z_k′L_k z_k = pφ₁,
so that

ρ₁² = (g′G₁z₁)² / ((z₁′G₁′G₁z₁)(g′g)) = (φ₁ z₁′L₁z₁)² / ((z₁′L₁z₁)(pφ₁)) = φ₁ z₁′L₁z₁ / p.
Similar results follow for the correlations ρ₂, ..., ρ_p of G₂z₂, ..., G_p z_p with g. Summing gives

∑_{k=1}^p ρ_k² = (φ₁/p) ∑_{k=1}^p z_k′L_k z_k = φ₁.
Thus, the sum of squares of the correlations of the first columns of G₁z₁, ..., G_p z_p is maximized and equal to the maximum eigenvalue. Subsequent columns have sums of squares equal to the successively decreasing eigenvalues φ₂, ..., φ_r. We started by generalizing the two-variable CCA concept and have finished by showing that the generalization has a nice correlational interpretation in its own right.
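The identity ∑ρ_k² = φ₁ can be checked numerically. The sketch below is an illustration under one assumed formulation of the eigenproblem, G′Gz = φLz with L = blockdiag(G_k′G_k) and z′Lz = p; under this particular scaling the trivial eigenvector 1 has eigenvalue p rather than 1. The data, sizes, and variable names are all hypothetical, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
levels = [3, 4, 2]                       # categories per variable (illustrative)

# Indicator matrices G_k; force each category to occur at least once
Gs = []
for lk in levels:
    labels = np.concatenate([np.arange(lk), rng.integers(0, lk, n - lk)])
    Gs.append(np.eye(lk)[labels])
G = np.hstack(Gs)
counts = G.sum(axis=0)                   # diagonal of L = blockdiag(G_k'G_k)

# Generalized eigenproblem G'G z = phi L z, solved via L^(-1/2) symmetrization.
Li = np.diag(1.0 / np.sqrt(counts))
phi, W = np.linalg.eigh(Li @ G.T @ G @ Li)
order = np.argsort(phi)[::-1]
phi, Z = phi[order], (Li @ W)[:, order]  # phi[0] = p is the trivial eigenvalue here

z = Z[:, 1]                              # leading non-trivial eigenvector
z *= np.sqrt(p / (z @ np.diag(counts) @ z))   # scale so z'Lz = p
g = G @ z                                # g = G_1 z_1 + ... + G_p z_p

# Squared correlations (cross-product form) of each G_k z_k with g
rho2, start = [], 0
for lk, Gk in zip(levels, Gs):
    gk = Gk @ z[start:start + lk]
    rho2.append((g @ gk) ** 2 / ((gk @ gk) * (g @ g)))
    start += lk

print(sum(rho2), phi[1])                 # the sum matches the largest non-trivial phi
```

The agreement holds for any eigenpair under this scaling, since g′G_k z_k = φ z_k′L_k z_k term by term.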
8.8 Categorical (nonlinear) principal component analysis
Homogeneity analysis and MCA are essentially the same thing, justified by slightly
different, but equivalent, criteria. A rather different approach is given by nonlinear PCA,
increasingly and more appropriately referred to as categorical PCA. Like homogeneity
analysis, this gives scores to the category levels but now we focus on the individual
categorical variables, to give a data matrix
H = [G₁z₁, G₂z₂, ..., G_p z_p].
In homogeneity analysis we are interested only in the total and row totals of Gz and
their sums of squares. Here we are interested in the individual values of the table and ask
what values of z give an optimal PCA in some specified number, r , of dimensions. By
'optimal' here we mean the choice of z that maximizes the sum of the first r eigenvalues.
We assume that z is scaled so that the centred columns of H are normalized, that is,
z k G k ( I
1
n 11 ) G k z k
k = 1, ... , p . With this normalization, H H is a correla-
= 1,
tion matrix.
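The effect of this scaling can be made concrete: after centring, each column of H has unit sum of squares, so the cross-product matrix has a unit diagonal. A minimal numpy sketch with hypothetical random data and quantifications (all names and sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
levels = [3, 4, 2]                       # categories per variable (illustrative)

cols = []
for lk in levels:
    labels = np.concatenate([np.arange(lk), rng.integers(0, lk, n - lk)])
    Gk = np.eye(lk)[labels]              # indicator matrix G_k
    zk = rng.normal(size=lk)             # arbitrary quantification z_k
    hc = Gk @ zk - (Gk @ zk).mean()      # centred column
    zk /= np.linalg.norm(hc)             # enforce z_k'G_k'(I - (1/n)11')G_k z_k = 1
    cols.append(Gk @ zk)

H = np.column_stack(cols)
Hc = H - H.mean(axis=0)
R = Hc.T @ Hc                            # a correlation matrix under this scaling
print(np.diag(R))                        # unit diagonal
```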
Thus, the computational problem for categorical PCA is

min ‖H − Y‖².    (8.16)
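Criteria of the form (8.16) are commonly minimized by alternating least squares: with z fixed, the optimal rank-r Y is given by the truncated SVD of H; with Y fixed, each z_k is updated from the category means of the corresponding column of Y. The sketch below illustrates that alternation under assumed random data; it is one standard scheme, not necessarily the book's algorithm verbatim.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 100, 2
levels = [3, 4, 2]                        # categories per variable (illustrative)

Gs, zs = [], []
for lk in levels:
    labels = np.concatenate([np.arange(lk), rng.integers(0, lk, n - lk)])
    Gs.append(np.eye(lk)[labels])
    zs.append(rng.normal(size=lk))        # random starting quantifications

def build_H(Gs, zs):
    """Stack the centred, unit-norm columns G_k z_k."""
    cols = []
    for Gk, zk in zip(Gs, zs):
        h = Gk @ zk
        h = h - h.mean()
        cols.append(h / np.linalg.norm(h))
    return np.column_stack(cols)

losses = []
for _ in range(30):
    H = build_H(Gs, zs)
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    Y = (U[:, :r] * s[:r]) @ Vt[:r]       # best rank-r approximation of H
    losses.append(np.sum((H - Y) ** 2))
    for k, Gk in enumerate(Gs):           # update z_k: category means of Y's column k
        zs[k] = (Gk.T @ Y[:, k]) / Gk.sum(axis=0)

print(losses[0], losses[-1])              # loss is non-increasing across iterations
```

Each half-step solves its subproblem exactly (truncated SVD for Y; projection onto the centred, normalized span of G_k for z_k), so the loss cannot increase.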