Thus, 1 is an eigenvector corresponding to φ = 1, and it follows from the orthogonality relationships that 1′Lz = 0 for any other eigenvector z. This shows that 1′Mz = 0, so that the means are centred, the usual result concerning the 'uninteresting' unit eigenvalue in CA and MCA.
Finally, let us examine the correlation between the quantifications of the kth variable on the sth dimension and the sth dimension of M, or equivalently of GZ. For simplicity, and without loss of generality, we take s = 1 and k = 1. Thus, we are interested in the correlation ρ₁ between G₁z₁ and g = G₁z₁ + G₂z₂ + ··· + G_p z_p, where z_k (k = 1, 2, ..., p) is the first column of Z_k. Then, using the eigendecomposition, we have
g′G₁z₁ = φ₁ z₁′L₁z₁,

g′g = φ₁ ∑_{k=1}^p z_k′L_k z_k = pφ₁,
so that

ρ₁² = (g′G₁z₁)² / ((z₁′G₁′G₁z₁)(g′g)) = (φ₁ z₁′L₁z₁)² / ((z₁′L₁z₁)(pφ₁)) = φ₁ z₁′L₁z₁ / p.
Similar results follow for the correlations ρ₂, ..., ρ_p of G₂z₂, ..., G_p z_p with g. Summing gives

∑_{k=1}^p ρ_k² = (φ₁/p) ∑_{k=1}^p z_k′L_k z_k = φ₁.
Thus, the sum of squares of the correlations of the first columns of G₁z₁, ..., G_p z_p is maximized and equal to the maximum eigenvalue. Subsequent columns have sums of squares equal to the successively decreasing eigenvalues φ₂, ..., φ_r. We started by generalizing the two-variable CCA concept and have finished by showing that the generalization has a nice correlational interpretation in its own right.
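The identity ∑ρ_k² = φ₁ can be checked numerically. The sketch below is an illustration under one assumed formulation of the eigenproblem, G′Gz = φLz with L = blockdiag(G_k′G_k) and z′Lz = p; under this particular scaling the trivial eigenvector 1 has eigenvalue p rather than 1. The data, sizes, and variable names are all hypothetical, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
levels = [3, 4, 2]                       # categories per variable (illustrative)

# Indicator matrices G_k; force each category to occur at least once
Gs = []
for lk in levels:
    labels = np.concatenate([np.arange(lk), rng.integers(0, lk, n - lk)])
    Gs.append(np.eye(lk)[labels])
G = np.hstack(Gs)
counts = G.sum(axis=0)                   # diagonal of L = blockdiag(G_k'G_k)

# Generalized eigenproblem G'G z = phi L z, solved via L^(-1/2) symmetrization.
Li = np.diag(1.0 / np.sqrt(counts))
phi, W = np.linalg.eigh(Li @ G.T @ G @ Li)
order = np.argsort(phi)[::-1]
phi, Z = phi[order], (Li @ W)[:, order]  # phi[0] = p is the trivial eigenvalue here

z = Z[:, 1]                              # leading non-trivial eigenvector
z *= np.sqrt(p / (z @ np.diag(counts) @ z))   # scale so z'Lz = p
g = G @ z                                # g = G_1 z_1 + ... + G_p z_p

# Squared correlations (cross-product form) of each G_k z_k with g
rho2, start = [], 0
for lk, Gk in zip(levels, Gs):
    gk = Gk @ z[start:start + lk]
    rho2.append((g @ gk) ** 2 / ((gk @ gk) * (g @ g)))
    start += lk

print(sum(rho2), phi[1])                 # the sum matches the largest non-trivial phi
```

The agreement holds for any eigenpair under this scaling, since g′G_k z_k = φ z_k′L_k z_k term by term.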
8.8 Categorical (nonlinear) principal component analysis
Homogeneity analysis and MCA are essentially the same thing, justified by slightly
different, but equivalent, criteria. A rather different approach is given by nonlinear PCA,
increasingly and more appropriately referred to as categorical PCA. Like homogeneity
analysis, this gives scores to the category levels but now we focus on the individual
categorical variables, to give a data matrix
H = [G₁z₁, G₂z₂, ..., G_p z_p].
In homogeneity analysis we are interested only in the total and row totals of Gz and
their sums of squares. Here we are interested in the individual values of the table and ask
what values of z give an optimal PCA in some specified number, r , of dimensions. By
'optimal' here we mean the choice of z that maximizes the sum of the first r eigenvalues.
We assume that z is scaled so that the centred columns of H are normalized, that is,
z k G k ( I
1
n 11 ) G k z k
k = 1, ... , p . With this normalization, H H is a correla-
= 1,
tion matrix.
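The effect of this scaling can be made concrete: after centring, each column of H has unit sum of squares, so the cross-product matrix has a unit diagonal. A minimal numpy sketch with hypothetical random data and quantifications (all names and sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
levels = [3, 4, 2]                       # categories per variable (illustrative)

cols = []
for lk in levels:
    labels = np.concatenate([np.arange(lk), rng.integers(0, lk, n - lk)])
    Gk = np.eye(lk)[labels]              # indicator matrix G_k
    zk = rng.normal(size=lk)             # arbitrary quantification z_k
    hc = Gk @ zk - (Gk @ zk).mean()      # centred column
    zk /= np.linalg.norm(hc)             # enforce z_k'G_k'(I - (1/n)11')G_k z_k = 1
    cols.append(Gk @ zk)

H = np.column_stack(cols)
Hc = H - H.mean(axis=0)
R = Hc.T @ Hc                            # a correlation matrix under this scaling
print(np.diag(R))                        # unit diagonal
```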
Thus, the computational problem for categorical PCA is

min ‖H − Y‖².    (8.16)
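Criteria of the form (8.16) are commonly minimized by alternating least squares: with z fixed, the optimal rank-r Y is given by the truncated SVD of H; with Y fixed, each z_k is updated from the category means of the corresponding column of Y. The sketch below illustrates that alternation under assumed random data; it is one standard scheme, not necessarily the book's algorithm verbatim.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 100, 2
levels = [3, 4, 2]                        # categories per variable (illustrative)

Gs, zs = [], []
for lk in levels:
    labels = np.concatenate([np.arange(lk), rng.integers(0, lk, n - lk)])
    Gs.append(np.eye(lk)[labels])
    zs.append(rng.normal(size=lk))        # random starting quantifications

def build_H(Gs, zs):
    """Stack the centred, unit-norm columns G_k z_k."""
    cols = []
    for Gk, zk in zip(Gs, zs):
        h = Gk @ zk
        h = h - h.mean()
        cols.append(h / np.linalg.norm(h))
    return np.column_stack(cols)

losses = []
for _ in range(30):
    H = build_H(Gs, zs)
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    Y = (U[:, :r] * s[:r]) @ Vt[:r]       # best rank-r approximation of H
    losses.append(np.sum((H - Y) ** 2))
    for k, Gk in enumerate(Gs):           # update z_k: category means of Y's column k
        zs[k] = (Gk.T @ Y[:, k]) / Gk.sum(axis=0)

print(losses[0], losses[-1])              # loss is non-increasing across iterations
```

Each half-step solves its subproblem exactly (truncated SVD for Y; projection onto the centred, normalized span of G_k for z_k), so the loss cannot increase.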