Information Technology Reference
In-Depth Information
approximate the observed values d ii presented in a symmetric proximity matrix D .
Crucially, there is no requirement that the entries of D be derived from a data matrix X ;
they may be directly observed in some experimental situation. Then, a monoplot of Z is
the only possibility. We denote the distances generated by the rows of Z by .
Different methods of multidimensional scaling depend on the criterion specified for
measuring the discrepancy between observed, D , and fitted distances, , and the numeri-
cal algorithms used for optimizing the chosen criterion. These criteria may be divided into
two classes - metric multidimensional scaling and nonmetric multidimensional scaling.
The simplest metric multidimensional scaling method is classical scaling/principal coor-
dinates analysis (PCO), discussed in detail in Chapter 5. PCO requires that there exists
amatrix Y in (at most) n
1 dimensions whose rows generate the observed distances
exactly (the Euclidean embeddable property); in PCA, Y coincides with X . Then, Z of
rank r is obtained by minimizing || Y Z ||
2 which, as we have seen, is found from the
Eckart-Young theorem on the least-squares properties of the SVD of Y . As explained in
Chapter 5, the two steps of finding Y and then Z may be subsumed into a single simple
eigenvalue decomposition. In this method, plays no overt part.
When D is directly observed, the distances are unlikely to be generated by a real
matrix Y . Then, strictly speaking, PCO is unavailable, though it may give acceptable
results when there are only a few unimportant negative eigenvalues in the eigendecom-
position. Other forms of metric scaling do not require the existence of Y but do depend
overtly on . The two most popular are least-squares scaling which minimizes a criterion
termed stress ,definedby
n
2
i ( d ii δ ii )
;
i
<
and least-squares squared scaling which minimizes a criterion sstress ,definedby
n
i ( d ii δ
2
ii )
2
.
i
<
These criteria have to be minimized by available iterative methods, such as ALSCAL,
KYST,orSMACOF,
(see Cox and Cox, 2001; Borg and Groenen, 2005).
Figure 10.1 gives a geometric interpretation of the stress criterion. We plot the points
( d ii ,
...
for all ( ii )
δ ii )
pairs. With a perfect fit, these points would lie on the line d
= δ
.
1
2
In other cases, the projection of ( d ii , δ ii ) onto the line d = δ is
( d ii + δ ii , d ii + δ ii ) ,
giving a residual sum of squares equal to half the stress criterion. Note that stress =
1
2
2 . A similar representation where all distances are replaced by squared dis-
tances gives the sstress criterion. The representation of Figure 10.1 is not particularly
interesting in itself, but it contributes to a better understanding of the corresponding
nonmetric criterion, to be described next.
Nonmetric multidimensional scaling depends only on the ordinal information in D .
Then, it is required only that the fitted values
|| D ||
approximate the same ordering as those
in D . How this is done is indicated in Figure 10.2, which is similar to Figure 10.1 except
that the linear 'regression' is replaced by a monotone regression function, as shown.
This monotone regression function is obtained by partitioning the points into blocks that
are above and to the right of all preceding blocks. The fitted values ˆ
δ ii
are determined
Search WWH ::




Custom Search