where
$$n = \sum_{i=1}^{p}\sum_{j=1}^{q} x_{ij},$$
and, using the weighted identity on the expression in square brackets of (7.23), we have
$$\chi^2 = n\sum_{i<i'}^{p}\sum_{j=1}^{q} x_{i\cdot}\,x_{i'\cdot}\,\frac{1}{x_{\cdot j}}\left(\frac{x_{ij}}{x_{i\cdot}} - \frac{x_{i'j}}{x_{i'\cdot}}\right)^2
= n\sum_{i<i'}^{p} x_{i\cdot}\,x_{i'\cdot}\left[\sum_{j=1}^{q}\frac{1}{x_{\cdot j}}\left(\frac{x_{ij}}{x_{i\cdot}} - \frac{x_{i'j}}{x_{i'\cdot}}\right)^2\right]. \tag{7.24}$$
The expression in the square brackets on the right-hand side of (7.24) is the chi-squared distance (7.12) between the $i$th and $i'$th rows of $\mathbf{X}$. Thus we have the simple result that
$$\chi^2 = n\sum_{i<i'}^{p} x_{i\cdot}\,x_{i'\cdot}\,d_{ii'}^2 = \frac{n}{2}\,\mathbf{1}'\mathbf{R}\mathbf{D}\mathbf{R}\mathbf{1}, \tag{7.25}$$
where $\mathbf{D} = \{d_{ii'}^2\}$ is the $p \times p$ matrix of all the row chi-squared distances (7.12). Similarly,
for the column chi-squared distances, we have
$$\chi^2 = n\sum_{j<j'}^{q} x_{\cdot j}\,x_{\cdot j'}\,d_{jj'}^2 = \frac{n}{2}\,\mathbf{1}'\mathbf{C}\mathbf{D}\mathbf{C}\mathbf{1},$$
where now $\mathbf{D} = \{d_{jj'}^2\}$ is the $q \times q$ matrix of all the column chi-squared distances (7.18). These results link the chi-squared distances to the total Pearson's $\chi^2$ for $\mathbf{X}$.
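Both identities are easy to check numerically. The sketch below uses an invented $3 \times 4$ table of counts and reads the $x_{ij}$ of this section as relative frequencies of that table, with $n$ the grand total of the counts and $\mathbf{R}$, $\mathbf{C}$ the diagonal matrices of row and column margins; the table and all variable names are illustrative assumptions, not from the text.

```python
import numpy as np

# A small invented contingency table of counts; p = 3 rows, q = 4 columns.
N = np.array([[10, 5, 3, 2],
              [4, 12, 6, 1],
              [2, 3, 9, 7]], dtype=float)
n = N.sum()                    # grand total
X = N / n                      # relative frequencies x_ij
r = X.sum(axis=1)              # row margins x_i.
c = X.sum(axis=0)              # column margins x_.j

# Pearson's chi-squared computed directly from observed and expected counts
E = n * np.outer(r, c)
chi2 = ((N - E) ** 2 / E).sum()

# Squared chi-squared distances (7.12) between row profiles
prof = X / r[:, None]
D = np.array([[np.sum((prof[i] - prof[k]) ** 2 / c) for k in range(3)]
              for i in range(3)])

# (7.25): chi-squared as (n/2) 1'RDR1 with R = diag(x_i.)
R = np.diag(r)
chi2_rows = (n / 2) * np.ones(3) @ R @ D @ R @ np.ones(3)

# The analogous column identity with C = diag(x_.j)
profc = X / c                  # column profiles (each column sums to 1)
Dc = np.array([[np.sum((profc[:, j] - profc[:, k]) ** 2 / r) for k in range(4)]
               for j in range(4)])
C = np.diag(c)
chi2_cols = (n / 2) * np.ones(4) @ C @ Dc @ C @ np.ones(4)

print(chi2, chi2_rows, chi2_cols)   # all three values agree
```

The agreement follows from the weighted identity used above: for weights summing to one, the weighted sum of pairwise squared differences equals the weighted sum of squared deviations from the weighted mean.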
7.2.5 Canonical correlation approximation
Probably the oldest derivation of CA is due to Hirschfeld (1935), who asked what quantification of the categorical levels of the two variables classifying the contingency table maximized their correlation. To express this idea algebraically, we define two indicator matrices, $\mathbf{G}_1$ and $\mathbf{G}_2$, of sizes $n \times p$ and $n \times q$ respectively, identifying the row and column membership of the $n$ cases.
In terms of our previous notation, we have
$$\mathbf{X} = \mathbf{G}_1'\mathbf{G}_2, \qquad \mathbf{R} = \mathbf{G}_1'\mathbf{G}_1, \qquad \mathbf{C} = \mathbf{G}_2'\mathbf{G}_2.$$
Next, we define quantification vectors $\mathbf{z}_1: p \times 1$ and $\mathbf{z}_2: q \times 1$, to be determined, which transform the categorical variables into quantitative variables $\mathbf{G}_1\mathbf{z}_1$ and $\mathbf{G}_2\mathbf{z}_2$. These two variables have squared (uncentred) correlation $\rho^2$ given by
$$\rho^2 = \frac{(\mathbf{z}_1'\mathbf{G}_1'\mathbf{G}_2\mathbf{z}_2)^2}{(\mathbf{z}_1'\mathbf{G}_1'\mathbf{G}_1\mathbf{z}_1)(\mathbf{z}_2'\mathbf{G}_2'\mathbf{G}_2\mathbf{z}_2)} = \frac{(\mathbf{z}_1'\mathbf{X}\mathbf{z}_2)^2}{(\mathbf{z}_1'\mathbf{R}\mathbf{z}_1)(\mathbf{z}_2'\mathbf{C}\mathbf{z}_2)}. \tag{7.26}$$
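Maximizing (7.26) is a standard singular value problem: the singular values of $\mathbf{R}^{-1/2}\mathbf{X}\mathbf{C}^{-1/2}$ are the attainable uncentred correlations, the leading one being the trivial value 1 (constant scores). The sketch below illustrates this on invented categorical data for $n = 8$ cases; the data and variable names are assumptions made for the example, not from the text.

```python
import numpy as np

# Invented category memberships for n = 8 cases: variable 1 has p = 3
# levels, variable 2 has q = 3 levels.
rows = np.array([0, 0, 1, 1, 2, 2, 0, 1])
cols = np.array([0, 1, 1, 2, 2, 0, 0, 1])
p, q = 3, 3

# Indicator matrices G1 (n x p) and G2 (n x q)
G1 = np.eye(p)[rows]
G2 = np.eye(q)[cols]

X = G1.T @ G2          # contingency table
R = G1.T @ G1          # diagonal matrix of row totals
C = G2.T @ G2          # diagonal matrix of column totals

# Singular values of R^{-1/2} X C^{-1/2} give the attainable correlations;
# s[0] = 1 is the trivial solution, s[1] is the maximum of interest.
Ri = np.diag(1 / np.sqrt(np.diag(R)))
Ci = np.diag(1 / np.sqrt(np.diag(C)))
U, s, Vt = np.linalg.svd(Ri @ X @ Ci)

z1 = Ri @ U[:, 1]      # optimal quantification of the row categories
z2 = Ci @ Vt[1]        # optimal quantification of the column categories

# Evaluate (7.26) directly for these quantifications
rho = (z1 @ X @ z2) / np.sqrt((z1 @ R @ z1) * (z2 @ C @ z2))
print(s[0], abs(rho), s[1])   # s[0] = 1 (trivial); |rho| equals s[1]
```

The substitution $\mathbf{z}_1 = \mathbf{R}^{-1/2}\mathbf{u}$, $\mathbf{z}_2 = \mathbf{C}^{-1/2}\mathbf{v}$ turns (7.26) into $(\mathbf{u}'\mathbf{R}^{-1/2}\mathbf{X}\mathbf{C}^{-1/2}\mathbf{v})^2$ over unit vectors, which the SVD maximizes.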