make clustering more an art than a science. Each choice leads to dif-
ferent error rates and it cannot be determined that one approach is
globally the best under all circumstances. Consequently, we advocate
cluster analysis as a method of data exploration only. Arguments for
or against particular clustering techniques are unproductive.
6.6 Clustering analysis example
In the following example, we use the same dataset discussed in the
classification section but we pretend that the classes of different indi-
viduals are not known. Thus, we mix all four datasets together and
then apply clustering procedures to 'find' the groups in the aggregated
dataset. If we find groups that correspond nicely with what we know,
then we can say that the clustering procedure is effective in separat-
ing the different groups. On the other hand, if the clustering procedure
fails to separate the groups, we say that it is not very effective.
Unfortunately, real-life situations do not allow the luxury of this kind
of verification.
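The check described above amounts to cross-tabulating the known group of
each individual against the cluster it was assigned to. The following is
a minimal sketch of that comparison only. The function name
cross_tabulate and the six example labels are hypothetical and are not
taken from the dataset used in this section.

```python
from collections import Counter


def cross_tabulate(known_groups, cluster_labels):
    """Count how many individuals of each known group fall in each cluster."""
    if len(known_groups) != len(cluster_labels):
        raise ValueError("both label sequences must have the same length")
    return Counter(zip(known_groups, cluster_labels))


# Hypothetical labels for six individuals (illustrative only).
known = ["A", "A", "A", "B", "B", "B"]
found = [1, 1, 2, 2, 2, 2]

table = cross_tabulate(known, found)
for (group, cluster), count in sorted(table.items()):
    print(f"group {group} -> cluster {cluster}: {count}")
```

A table in which each known group falls almost entirely into a single
cluster suggests that the procedure separates the groups. A table with
counts spread across clusters suggests that it does not.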
As before, we describe the example analysis in terms of the
Unweighted Procrustes distance,
$$
D_P(X_i^C, X_j^C) = \operatorname{tr}\left((X_i^C)^T X_i^C\right) + \operatorname{tr}\left((X_j^C)^T X_j^C\right) - 2\,\operatorname{tr}\left(X_i^C (X_j^C)^T X_j^C (X_i^C)^T\right)^{1/2}.
$$
The procedure is repeated for all other dissimilarity measures
considered.
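The distance D_P can be evaluated numerically. The sketch below shows one
possible reading of it and rests on several assumptions that this section
does not confirm:

- each landmark matrix has one row per landmark and one column per
  coordinate;
- the superscript C denotes centering at the centroid;
- the exponent 1/2 is a matrix square root of the small
  (coordinate-by-coordinate) cross-product, so the last trace equals the
  sum of its singular values.

Under these assumptions the result is the usual rotation-and-reflection
Procrustes distance without scaling. The names center and
procrustes_distance are illustrative.

```python
import numpy as np


def center(X):
    """Subtract the centroid so the landmark configuration has mean zero."""
    X = np.asarray(X, dtype=float)
    return X - X.mean(axis=0)


def procrustes_distance(Xi, Xj):
    """Assumed reading of D_P for two (landmarks x coordinates) matrices."""
    Xi_c, Xj_c = center(Xi), center(Xj)
    # tr(A^T A) is the sum of squared entries of A.
    size_terms = np.sum(Xi_c ** 2) + np.sum(Xj_c ** 2)
    # Assumption: the square-root trace term is the sum of the singular
    # values of the cross-product matrix Xj_c^T Xi_c.
    singular_values = np.linalg.svd(Xj_c.T @ Xi_c, compute_uv=False)
    d = size_terms - 2.0 * singular_values.sum()
    return max(float(d), 0.0)  # clip tiny negative values from round-off


# Made-up configuration: 5 landmarks in 2 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))

# A rotated and shifted copy of X should be at distance (numerically) zero.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
Y = X @ R + np.array([3.0, -1.0])
d_same_shape = procrustes_distance(X, Y)
```

Because no scaling step is included, two configurations that differ only
in size are still at a positive distance under this reading.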
Let X_1, X_2, ..., X_n denote the landmark coordinate matrices
corresponding to the n individuals in the combined sample.
STEP 1: Compute the dissimilarity measure between all pairs
of individuals. That is, compute
$$
D_P(X_i^C, X_j^C)
$$
for i = 1, 2, ..., n and j = 1, 2, ..., n.
STEP 2: Put these dissimilarity measures into matrix form
and call that matrix the dissimilarity matrix:
$$
D = \left[ D_P(X_i^C, X_j^C) \right]_{i = 1, 2, \ldots, n;\; j = 1, 2, \ldots, n}
$$
This is a square, symmetric matrix with diagonal elements
equal to zero, since the dissimilarity between an individual
and itself is zero.
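Steps 1 and 2 can be sketched as a double loop over the individuals. The
code below is an illustration under stated assumptions, not the
procedure's reference implementation:

- procrustes_distance is an assumed reading of D_P, with rows as
  landmarks, centering at the centroid, and no scaling;
- dissimilarity_matrix is a hypothetical helper name;
- the four configurations are randomly generated and are not the dataset
  of this example.

Any other dissimilarity measure can be passed in through the dist
argument.

```python
import numpy as np


def procrustes_distance(Xi, Xj):
    """Assumed reading of D_P; rows are landmarks, columns are coordinates."""
    Xi = Xi - Xi.mean(axis=0)
    Xj = Xj - Xj.mean(axis=0)
    s = np.linalg.svd(Xj.T @ Xi, compute_uv=False)
    d = np.sum(Xi ** 2) + np.sum(Xj ** 2) - 2.0 * s.sum()
    return max(float(d), 0.0)  # clip round-off below zero


def dissimilarity_matrix(configs, dist=procrustes_distance):
    """Build the n x n matrix of pairwise dissimilarities."""
    n = len(configs)
    D = np.zeros((n, n))  # diagonal stays zero: an individual vs. itself
    for i in range(n):
        for j in range(i + 1, n):
            # The measure is symmetric, so each pair is computed once
            # and mirrored across the diagonal.
            D[i, j] = D[j, i] = dist(configs[i], configs[j])
    return D


# Made-up data: 4 individuals, 5 landmarks each, 2 dimensions.
rng = np.random.default_rng(0)
configs = [rng.normal(size=(5, 2)) for _ in range(4)]
D = dissimilarity_matrix(configs)
print(np.round(D, 3))
```

Filling only the upper triangle and mirroring it relies on the symmetry
and zero diagonal noted above, and halves the number of distance
evaluations.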