This property of chi-squared distance, the so-called principle of distributional equivalence, is attractive because it suggests that chi-squared distance is not sensitive to small changes when row and column categorizations are amalgamated.
However, a less attractive property of chi-squared distance is that the division by $x_{\cdot j}$ gives higher weights to rarely occurring column categories, which is not always (some would say, rarely) desirable. Thus, we have to balance the principle of distributional equivalence against whether the weighting incorporated into chi-squared distance is appropriate in the first place.
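The principle of distributional equivalence is easy to check numerically. The following sketch (in Python with NumPy; the table and function names are illustrative, not from the text) computes row chi-squared distances of the form used in (7.13) and confirms that amalgamating two columns with proportional profiles leaves every inter-row distance unchanged:

```python
import numpy as np

def chi2_row_dist(X):
    """Row chi-squared distances:
    d(i,i')^2 = sum_j (x_ij/x_i. - x_i'j/x_i'.)^2 / x_.j"""
    prof = X / X.sum(axis=1, keepdims=True)   # row profiles x_ij / x_i.
    w = 1.0 / X.sum(axis=0)                   # weights 1 / x_.j
    d = prof[:, None, :] - prof[None, :, :]
    return np.sqrt((w * d**2).sum(-1))

# Illustrative table whose third column is twice the second,
# so columns 2 and 3 have identical (proportional) profiles:
X = np.array([[10., 4., 8.],
              [ 6., 5., 10.],
              [ 2., 3., 6.]])
X_merged = np.column_stack([X[:, 0], X[:, 1] + X[:, 2]])

# Amalgamating the proportional columns leaves row distances unchanged:
print(np.allclose(chi2_row_dist(X), chi2_row_dist(X_merged)))  # True
```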
The row chi-squared distances defined by (7.13) can also be regarded as weighted Euclidean distances between all rows of $\mathbf{X}$, and it is easy to see that the row chi-squared distances can be computed as ordinary Euclidean distances between pairs of rows of the matrix $\mathbf{R}^{-1}\mathbf{X}\mathbf{C}^{-1/2}$. Since translation does not affect distance, we are free to adjust by the translation term $\mathbf{1}\mathbf{1}'\mathbf{C}^{1/2}/n$, showing that the chi-squared distances are also generated by $\mathbf{R}^{-1}\mathbf{X}\mathbf{C}^{-1/2} - \mathbf{1}\mathbf{1}'\mathbf{C}^{1/2}/n$. We may write this as
$$\mathbf{R}^{-1}(\mathbf{X} - \mathbf{R}\mathbf{1}\mathbf{1}'\mathbf{C}/n)\mathbf{C}^{-1/2} = \mathbf{R}^{-1}(\mathbf{X} - \mathbf{E})\mathbf{C}^{-1/2},$$
and therefore consider the approximation $\hat{\mathbf{X}}$ that minimizes
$$\left\Vert \mathbf{R}^{1/2}\left\{\mathbf{R}^{-1}(\mathbf{X} - \mathbf{E})\mathbf{C}^{-1/2} - \hat{\mathbf{X}}\right\}\right\Vert^2, \qquad (7.16)$$
which is a weighted least-squares problem similar to (7.9), but now $\hat{\mathbf{X}}$ approximates the points that generate the row chi-squared distances; also, the criterion no longer carries column weights. In contrast to the high weights given to rarely occurring column categories in the definition of chi-squared distance, in the fitting criterion (7.16) the row weights $\mathbf{R}^{1/2}$ give lower weights to rarely occurring row categories. Writing $\mathbf{U\Sigma V}'$ for the singular value decomposition of $\mathbf{R}^{-1/2}(\mathbf{X} - \mathbf{E})\mathbf{C}^{-1/2}$, equation (7.16) may be written
$$\left\Vert \mathbf{U\Sigma V}' - \mathbf{R}^{1/2}\hat{\mathbf{X}}\right\Vert^2. \qquad (7.17)$$
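The two numerical facts underlying this construction (that ordinary Euclidean distances between rows of $\mathbf{R}^{-1}\mathbf{X}\mathbf{C}^{-1/2}$ reproduce the row chi-squared distances, and that the translation by $\mathbf{E}$ leaves them unchanged) can be verified directly. A minimal sketch in Python with NumPy, assuming an illustrative 3×3 table that is not from the text:

```python
import numpy as np

def euclid(M):
    """Pairwise Euclidean distances between the rows of M."""
    d = M[:, None, :] - M[None, :, :]
    return np.sqrt((d**2).sum(-1))

# Illustrative contingency table (not from the text):
X = np.array([[10., 4., 8.],
              [ 6., 5., 10.],
              [ 2., 3., 6.]])
r, c, n = X.sum(axis=1), X.sum(axis=0), X.sum()

# Direct row chi-squared distances: sum_j (x_ij/x_i. - x_i'j/x_i'.)^2 / x_.j
prof = X / r[:, None]
chi2 = np.sqrt((((prof[:, None, :] - prof[None, :, :])**2) / c).sum(-1))

# They equal the Euclidean distances between rows of R^{-1} X C^{-1/2},
# with or without the translation by E = R 1 1' C / n:
E = np.outer(r, c) / n
M1 = np.diag(1/r) @ X @ np.diag(c**-0.5)
M2 = np.diag(1/r) @ (X - E) @ np.diag(c**-0.5)
print(np.allclose(euclid(M1), chi2), np.allclose(euclid(M1), euclid(M2)))
```

The translation term subtracts the same vector from every row, which is why the distances survive the centring by $\mathbf{E}$.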
So the approximation $\hat{\mathbf{X}}$ is obtained from the inner product $\mathbf{R}^{-1/2}\mathbf{U\Sigma V}'$ and we may plot the first $r$ columns of $\mathbf{R}^{-1/2}\mathbf{U\Sigma}$ for the rows and $\mathbf{V}$ for the columns. Note that $\mathbf{V}$, being an orthogonal matrix, does not affect the distances given by the row coordinates. This derivation is very close indeed to PCA with weights $\mathbf{R}^{1/2}$, the distances between pairs of row points now approximating chi-squared distance rather than the Pythagorean distance between the rows of $\mathbf{X}$. It follows from the orthogonality of the vector $\mathbf{1}'\mathbf{R}^{1/2}$ to the remaining $p-1$ singular vectors that the weighted mean of the plotted points is $\mathbf{1}'\mathbf{R}(\mathbf{R}^{-1/2}\mathbf{U\Sigma}) = \mathbf{1}'\mathbf{R}^{1/2}\mathbf{U\Sigma} = \mathbf{0}'$, which is the usual centring for PCA. Furthermore, the vectors $\mathbf{V}$ are acting like axes and may be calibrated in the usual way (see Section 3.2).
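The SVD route just described can be sketched numerically. Assuming the same kind of small illustrative table (data and variable names are not from the text), the following computes the SVD of $\mathbf{R}^{-1/2}(\mathbf{X}-\mathbf{E})\mathbf{C}^{-1/2}$, forms the row coordinates $\mathbf{R}^{-1/2}\mathbf{U\Sigma}$, and checks both the zero weighted mean and (at full rank) the recovery of the row chi-squared distances:

```python
import numpy as np

X = np.array([[10., 4., 8.],
              [ 6., 5., 10.],
              [ 2., 3., 6.]])   # illustrative table, not from the text
r, c, n = X.sum(axis=1), X.sum(axis=0), X.sum()
E = np.outer(r, c) / n          # expected values E = R 1 1' C / n

# SVD of R^{-1/2}(X - E)C^{-1/2}:
S = np.diag(r**-0.5) @ (X - E) @ np.diag(c**-0.5)
U, sig, Vt = np.linalg.svd(S, full_matrices=False)

F = np.diag(r**-0.5) @ U * sig  # row coordinates R^{-1/2} U Sigma

def euclid(M):
    d = M[:, None, :] - M[None, :, :]
    return np.sqrt((d**2).sum(-1))

# Weighted mean of the plotted row points is zero (the usual centring) ...
print(np.allclose(r @ F, 0))    # True
# ... and full-rank inter-row distances are the row chi-squared distances:
chi2 = euclid(np.diag(1/r) @ X @ np.diag(c**-0.5))
print(np.allclose(euclid(F), chi2))  # True
```

Truncating $F$ to its first $r$ columns gives the plotted approximation; $\mathbf{V}$ being orthogonal, it drops out of the distance calculation exactly as the text notes.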
Corresponding results apply to column chi-squared distance,
$$d_{jj'}^2 = \sum_{i=1}^{p} \frac{1}{x_{i\cdot}}\left(\frac{x_{ij}}{x_{\cdot j}} - \frac{x_{ij'}}{x_{\cdot j'}}\right)^2, \qquad (7.18)$$
generated by the columns of $\mathbf{R}^{-1/2}\mathbf{X}\mathbf{C}^{-1}$, leading to plotting the first $r$ columns of $\mathbf{C}^{-1/2}\mathbf{V\Sigma}$ for the columns and $\mathbf{U}$ for the rows.
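The column analogue can be sketched in the same way. Under the same illustrative data (not from the text), the column coordinates $\mathbf{C}^{-1/2}\mathbf{V\Sigma}$ reproduce the column chi-squared distances generated by the columns of $\mathbf{R}^{-1/2}\mathbf{X}\mathbf{C}^{-1}$:

```python
import numpy as np

X = np.array([[10., 4., 8.],
              [ 6., 5., 10.],
              [ 2., 3., 6.]])   # illustrative table, not from the text
r, c, n = X.sum(axis=1), X.sum(axis=0), X.sum()
E = np.outer(r, c) / n

S = np.diag(r**-0.5) @ (X - E) @ np.diag(c**-0.5)
U, sig, Vt = np.linalg.svd(S, full_matrices=False)
G = np.diag(c**-0.5) @ Vt.T * sig   # column coordinates C^{-1/2} V Sigma

def euclid(M):
    d = M[:, None, :] - M[None, :, :]
    return np.sqrt((d**2).sum(-1))

# Column chi-squared distances = distances between the *columns* of
# R^{-1/2} X C^{-1}, i.e. between the rows of its transpose:
chi2_col = euclid((np.diag(r**-0.5) @ X @ np.diag(1/c)).T)
print(np.allclose(euclid(G), chi2_col))  # True
```

Here the orthogonal matrix $\mathbf{U}$ plays the role that $\mathbf{V}$ played for the rows, dropping out of the column distances.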
Very commonly the two chi-squared distance plots discussed above are amalgamated by plotting the columns of $\mathbf{R}^{-1/2}\mathbf{U\Sigma}$ and $\mathbf{C}^{-1/2}\mathbf{V\Sigma}$ simultaneously as two sets of