The column labelled 'model' in Table 7.1 relates to the function being minimized
by the weighted least-squares criterion, and the next column gives the row and/or col-
umn weights used. As can be seen, these weights appear in inverted form in the inner
product approximation X to the model, given in the fourth column of the table. In
the final column, we give typical matrices usually used to construct biplots for the
rows and columns. We use the word 'typical' because the inner products may be split
into two components in an infinite number of ways. For example, we have seen that
when approximating the contingency ratio, and one wishes to use the centroid property
discussed at the end of Section 7.2.2, then we have to plot $R^{-1/2}U\Sigma$, $C^{-1/2}V$ (or
$R^{-1/2}U$, $C^{-1/2}V\Sigma$) rather than the typical, symmetric values given in Table 7.1.
Writing the SVD of $R^{-1/2}(X - E)C^{-1/2} = (U\Sigma^{\alpha})(V\Sigma^{\beta})'$, the commonest
choices of scales are $\alpha = \beta = 1/2$, $\alpha = \beta = 1$, $\alpha = 1$ and $\beta = 0$, or
$\alpha = 0$ and $\beta = 1$. When $\alpha = \beta$, the plots are said to be symmetric, otherwise
asymmetric. When $\alpha + \beta = 1$, the inner product is preserved and calibrated axes may be
used (see Section 2.3). The inner product
is not preserved when α = β = 1, as happens (i) when simultaneously plotting points
that approximate both row and column chi-squared distances and (ii) for the correla-
tional approach, so care has to be taken not to use inner product interpretations for such
diagrams. Having said this, Gabriel (2002) and Gower (2004) showed that whichever
scaling is used has little effect on the biplot visualization, at least in the sense that they
are all very highly correlated. This implies that they are all displaying similar informa-
tion, whichever of the criteria discussed above is being used. One thing they all have in
common is a concern with different aspects of departure from an assumption of indepen-
dence. It seems to us that when the rows and columns of a contingency table are of equal
status (see Section 7.1) there is little justification for using asymmetric biplots. However,
sometimes, as with chi-squared distance, rows and columns do not have equal status and
then it is not unreasonable to treat them differently. It is more important to use
weightings of R and C that are consistent with the choice of criterion.
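
The effect of the scales α and β on the inner product can be checked numerically. What follows is only a minimal sketch under stated assumptions: it uses NumPy, a small made-up contingency table, and the usual CA definitions in which R and C are diagonal matrices of row and column totals and E holds the counts expected under independence. The function name `ca_coordinates` is hypothetical and does not come from the text.

```python
import numpy as np

def ca_coordinates(X, alpha, beta):
    """Return (A, B, S) for a contingency table X.

    S is the weighted residual matrix R^{-1/2} (X - E) C^{-1/2},
    A = U Sigma^alpha are row coordinates and
    B = V Sigma^beta are column coordinates from the SVD of S.
    """
    X = np.asarray(X, dtype=float)
    n = X.sum()
    r = X.sum(axis=1)                      # row totals (diagonal of R, assumed)
    c = X.sum(axis=0)                      # column totals (diagonal of C, assumed)
    E = np.outer(r, c) / n                 # expected counts under independence
    S = (X - E) / np.sqrt(np.outer(r, c))  # R^{-1/2} (X - E) C^{-1/2}
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    A = U * sigma**alpha                   # scales column k of U by sigma_k^alpha
    B = Vt.T * sigma**beta                 # scales column k of V by sigma_k^beta
    return A, B, S

# Made-up table, for illustration only.
table = [[20, 10, 5],
         [10, 15, 10],
         [5, 10, 25]]

for alpha, beta in [(0.5, 0.5), (1, 0), (0, 1), (1, 1)]:
    A, B, S = ca_coordinates(table, alpha, beta)
    preserved = np.allclose(A @ B.T, S)
    print(f"alpha={alpha}, beta={beta}: inner product preserved = {preserved}")
```

In this sketch the product of the row and column coordinates reproduces the weighted residuals for every choice with α + β = 1, and fails to do so for α = β = 1, which is the situation in which inner product interpretations should be avoided.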
Quite independently of the choice of α and β or, indeed, of any way of partitioning
the inner product, we have an additional decision as to how to represent the biplot. There
are three possibilities:
(i) use points to represent both the row and column elements;
(ii) use lines (calibrated axes) to represent both the row and column elements;
(iii) use points for one of the classifications and axes for the other.
Possibility (i) is by far the most common usage for CA. Moreover, it is essential when
the centroid property (Section 7.2.3) is central to interpretation. However, in this topic
we have emphasized the advantages of using calibrated axes as in (iii). This may appear
to be introducing an asymmetric element into CA as we have to decide whether the
rows or the columns are represented by axes. Fortunately, the choice is entirely arbitrary
since it makes no difference to the predicted values. This is not so for the chi-squared
distance variant of CA, where the row and column distances are essentially asymmetric.
Although (ii) is a symmetric option, it is rarely, if ever, used. Both points and axes may
be combined, so providing calibrations on the axes while also highlighting the positions of
row and/or column elements.
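
To make possibility (iii) concrete, here is a minimal sketch of how a calibrated axis can stand in for an inner product. It assumes two-dimensional coordinates whose inner products carry the values of interest, and it uses one standard construction in which the marker for a value mu on the axis through a point b sits at mu * b / (b'b). The coordinates and the helper names `dot`, `marker` and `read_off` are invented for illustration.

```python
def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def marker(b, mu):
    """Position of the calibration marker for value mu on the axis through b."""
    scale = mu / dot(b, b)
    return [scale * bi for bi in b]

def read_off(a, b):
    """Project point a onto the axis through b and return the calibrated value."""
    # Orthogonal projection of a onto the direction of b.
    t = dot(a, b) / dot(b, b)
    projection = [t * bi for bi in b]
    # The marker for mu sits at mu * b / (b'b), so the value read at the
    # projection is the mu whose marker coincides with it: mu = projection'b.
    return dot(projection, b)

a = [0.3, -0.2]   # hypothetical row point
b = [0.5, 0.4]    # hypothetical column point defining an axis

print(read_off(a, b))   # value read from the axis through b
print(read_off(b, a))   # value read when the roles of a and b are swapped
print(dot(a, b))        # the inner product itself
```

All three printed values agree up to rounding, which illustrates why the choice of which classification is drawn as axes does not affect the predicted values when the inner product is preserved.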
A fourth option, not discussed further in this topic, rests on the observation that
$ab\cos(\theta) = ab\sin(\theta + \pi/2)$. The left-hand side represents a simple inner product and the