This property of chi-squared distance, the so-called principle of distributional equivalence, is attractive because it suggests that chi-squared distance is not sensitive to small changes when row and column categorizations are amalgamated. However, a less attractive property of chi-squared distance is that the division by $x_{.j}$ gives higher weights to rarely occurring column categories, which is not always (some would say, rarely) desirable. Thus, we have to balance the principle of distributional equivalence with whether the weighting incorporated into chi-squared distance is appropriate in the first place.
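To make this weighting concrete, here is a minimal NumPy sketch; the table is hypothetical, chosen only so that the third column category is rare:

```python
import numpy as np

# Hypothetical contingency table; the third column category is rare.
X = np.array([[20., 10., 1.],
              [15., 25., 2.],
              [ 5., 15., 3.]])
r = X.sum(axis=1)               # row totals x_i.
c = X.sum(axis=0)               # column totals x_.j

P = X / r[:, None]              # row profiles x_ij / x_i.
w = 1.0 / c                     # per-column weights 1/x_.j in chi-squared distance
print(w)                        # the rare third column gets by far the largest weight

# squared chi-squared distance between the first two row profiles
d2 = np.sum(w * (P[0] - P[1]) ** 2)
```

With column totals 40, 50 and 6, the weights are 0.025, 0.02 and about 0.167, so the rare column enters the distance with roughly six to eight times the weight of the common ones.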
The row chi-squared distances defined by (7.13) can also be regarded as weighted Euclidean distances between all rows of $\mathbf{X}$, and it is easy to see that the row chi-squared distances can be computed as ordinary Euclidean distances between pairs of rows of the matrix $\mathbf{R}^{-1}\mathbf{X}\mathbf{C}^{-1/2}$. Since translation does not affect distance, we are free to adjust by the translation term $\mathbf{1}\mathbf{1}'\mathbf{C}^{1/2}/n$, showing that the chi-squared distances are also generated by $\mathbf{R}^{-1}\mathbf{X}\mathbf{C}^{-1/2} - \mathbf{1}\mathbf{1}'\mathbf{C}^{1/2}/n$. We may write this as
$$\mathbf{R}^{-1}(\mathbf{X} - \mathbf{R}\mathbf{1}\mathbf{1}'\mathbf{C}/n)\mathbf{C}^{-1/2} = \mathbf{R}^{-1}(\mathbf{X} - \mathbf{E})\mathbf{C}^{-1/2},$$
and therefore consider the approximation
$$\left\| \mathbf{R}^{1/2}\left\{ \mathbf{R}^{-1}(\mathbf{X} - \mathbf{E})\mathbf{C}^{-1/2} - \hat{\mathbf{X}} \right\} \right\|^2, \qquad (7.16)$$
which is a similar weighted least-squares problem to (7.9) but now $\hat{\mathbf{X}}$ approximates the points that generate the row chi-squared distances; also, the criterion no longer carries column weights. In contrast to the high weights given to rarely occurring column categories in the definition of chi-squared distance, in the fitting criterion (7.16) the row weights $\mathbf{R}^{1/2}$ give lower weights to rarely occurring row categories. Equation (7.16) may be written
$$\mathbf{R}^{1/2}\hat{\mathbf{X}} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}'. \qquad (7.17)$$
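These identities can be checked numerically. The sketch below uses a small hypothetical table, with $\mathbf{R}$ and $\mathbf{C}$ the diagonal matrices of row and column totals and $n$ the grand total; it verifies that subtracting the translation term leaves the inter-row distances unchanged, and that the singular value decomposition reproduces $\mathbf{R}^{-1}(\mathbf{X}-\mathbf{E})\mathbf{C}^{-1/2}$ at full rank:

```python
import numpy as np

# Minimal numerical check of the identities above (hypothetical 3x3 table).
X = np.array([[20., 10., 1.],
              [15., 25., 2.],
              [ 5., 15., 3.]])
r = X.sum(axis=1)              # diagonal of R (row totals)
c = X.sum(axis=0)              # diagonal of C (column totals)
n = X.sum()                    # grand total
E = np.outer(r, c) / n         # E = R 1 1' C / n

Y = X / r[:, None] / np.sqrt(c)        # R^{-1} X C^{-1/2}
Z = (X - E) / r[:, None] / np.sqrt(c)  # R^{-1}(X - E) C^{-1/2}

# Translation does not affect inter-row Euclidean distances:
DY = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)
DZ = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
print(np.allclose(DY, DZ))             # True

# SVD underlying (7.16)/(7.17): R^{1/2} Z = U Sigma V'
U, s, Vt = np.linalg.svd(np.sqrt(r)[:, None] * Z, full_matrices=False)
Xhat = ((U * s) @ Vt) / np.sqrt(r)[:, None]   # R^{-1/2} U Sigma V'
print(np.allclose(Xhat, Z))            # True at full rank
```

Truncating `U`, `s` and `Vt` to the first $r$ components gives the rank-$r$ least-squares approximation of (7.16).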
So the approximation $\hat{\mathbf{X}}$ is obtained from the inner product $\mathbf{R}^{-1/2}\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}'$ and we may plot the first $r$ columns of $\mathbf{R}^{-1/2}\mathbf{U}\boldsymbol{\Sigma}$ for the rows and $\mathbf{V}$ for the columns. Note that $\mathbf{V}$, being an orthogonal matrix, does not affect the distances given by the row coordinates.
This derivation is very close indeed to PCA with weights $\mathbf{R}^{1/2}$, the distances between pairs of row points now approximating chi-squared distance rather than the Pythagorean distance between the rows of $\mathbf{X}$. It follows from the orthogonality of the vector $\mathbf{1}'\mathbf{R}^{1/2}$ to the remaining $p-1$ singular vectors that the weighted mean of the plotted points is $\mathbf{1}'\mathbf{R}(\mathbf{R}^{-1/2}\mathbf{U}) = \mathbf{1}'\mathbf{R}^{1/2}\mathbf{U} = \mathbf{0}'$, which is the usual centring for PCA. Furthermore, the vectors $\mathbf{V}$ are acting like axes and may be calibrated in the usual way (see Section 3.2).
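The centring property can also be checked numerically. Continuing the hypothetical example, $\mathbf{R}^{1/2}\mathbf{1}$ annihilates $\mathbf{R}^{-1/2}(\mathbf{X}-\mathbf{E})\mathbf{C}^{-1/2}$ on the left, so it is orthogonal to every left singular vector with a nonzero singular value:

```python
import numpy as np

# Numerical check of the centring property (same hypothetical table).
X = np.array([[20., 10., 1.],
              [15., 25., 2.],
              [ 5., 15., 3.]])
r = X.sum(axis=1); c = X.sum(axis=0); n = X.sum()
E = np.outer(r, c) / n

M = (X - E) / np.sqrt(r)[:, None] / np.sqrt(c)   # R^{-1/2}(X - E)C^{-1/2}
U, s, Vt = np.linalg.svd(M)

# 1'R^{1/2} M = 1'(X - E)C^{-1/2} = 0', so R^{1/2}1 is orthogonal to the
# left singular vectors with nonzero singular values:
nontrivial = U[:, s > 1e-10]
print(np.sqrt(r) @ nontrivial)    # ~ 0: the weighted mean of the row points vanishes
```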
Corresponding results apply to column chi-squared distance,
$$d_{jj'}^2 = \sum_{i=1}^{p} \frac{1}{x_{i.}}\left(\frac{x_{ij}}{x_{.j}} - \frac{x_{ij'}}{x_{.j'}}\right)^2, \qquad (7.18)$$
generated by the columns of $\mathbf{R}^{-1/2}\mathbf{X}\mathbf{C}^{-1}$, leading to plotting the first $r$ columns of $\mathbf{C}^{-1/2}\mathbf{V}\boldsymbol{\Sigma}$ for the columns and $\mathbf{U}$ for the rows.
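As with the rows, the column version is easy to verify; on the same hypothetical table, the squared distance computed from (7.18) should match the ordinary Euclidean distance between the corresponding columns of $\mathbf{R}^{-1/2}\mathbf{X}\mathbf{C}^{-1}$:

```python
import numpy as np

# Column chi-squared distance check (same hypothetical table).
X = np.array([[20., 10., 1.],
              [15., 25., 2.],
              [ 5., 15., 3.]])
r = X.sum(axis=1); c = X.sum(axis=0)

Q = X / c                              # column profiles x_ij / x_.j
W = X / np.sqrt(r)[:, None] / c        # R^{-1/2} X C^{-1}

# squared distance between the first two columns, straight from (7.18):
d2_formula = np.sum((Q[:, 0] - Q[:, 1]) ** 2 / r)
d2_euclid = np.sum((W[:, 0] - W[:, 1]) ** 2)
print(np.isclose(d2_formula, d2_euclid))   # True
```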
Very commonly the two chi-squared distance plots discussed above are amalgamated by plotting the columns of $\mathbf{R}^{-1/2}\mathbf{U}\boldsymbol{\Sigma}$ and $\mathbf{C}^{-1/2}\mathbf{V}\boldsymbol{\Sigma}$ simultaneously as two sets of