Graphics Reference
In-Depth Information
he plot produces a good representation of the map of the UK. he vertical axis
represents West-East, while the horizontal axis runs South-North. It would appear
that Newcastle upon Tyne has relocated to Scotland!
Had the exact distances between the rail stations been used in the above (and as-
suming the UK is in a -D Euclidean space!), coordinates would have been found
for the stations that would have exactly reproduced the pairwise distances between
them. All eigenvalues of
would have been zero except for the first two. In general,
ratherthan usingdistancesorpseudo-distancesbetween points,classical scalinguses
dissimilaritiescalculatedbetweenpairsofobjectsinplaceofthesedistances.hecon-
figuration of points obtained in a -D space will not usually reproduce the pairwise
dissimilarities exactly, but will only approximate them. his implies that nearly all of
the eigenvalues of
B
are likely tobe nonzero, and some might be negative, whichwill
occur if the dissimilarity measure is not a metric. In practice the largest two (pos-
itive) eigenvalues and their associated eigenvectors are used for the coordinates of
the points. If a -D representation is required, then the three largest eigenvalues are
used,and so on. Ameasure of howwell the obtained configuration represents the set
of pairwise dissimilarities is given by
B
p
i = λ i
p
i = λ i
or
.
( . )
n
i =
(
positive eigenvalues
)
λ i
Incidentally, if the dissimilarities are calculated as Euclidean distances, then classical
scaling can be shown to be equivalent to principal component analysis.
henextexample consists of viruseswithrod-shapedparticlesaffecting various
crops (tobacco, tomato, cucumber and others) recently employed by Ripley ( )
andoriginallydescribedbyFauquetetal.( )andanalysedbyEslava-Gomez( ).
here are measurements on each virus, the number of amino acid residues per
moleculeofcoatprotein.hewholedatasetconsistsoffourgroupsofviruses,Horde-
viruses ( ), Tobraviruses ( ), Tobamoviruses ( ) and Furoviruses ( ). For brevity
the initial four letters of their names will denote the four virus groups. Figure .
shows a classical scaling of the data.
While Tobr and Hord form clear clusters, Furo splits into three clear groups, one
of which is similar to Tobr. Similarly Toba forms two groups, one of which is similar
to Tobr. he first two eigenvalues have values and . he sum of all sig-
nificant eigenvalues is out of a potential of values. he first two dimensions
correspond to % and hence provide a reasonable description of the data.
Another metric scaling approach is to minimize a loss function. For a Sammon
map (Sammon ), a particular configuration of points with pairwise distances,
d rs
, representing the dissimilarities
δ rs
,haslossfunction
δ
rs
S
=
r < s
(
d rs
δ rs
)
r < s
δ rs .
( . )
A configuration is found that has minimum loss using an appropriate optimization
method such as a steepest descent method. Other loss functions have also been sug-
gested and used. Figure . shows a Sammon map for the virus data.
Search WWH ::




Custom Search