Figure 7.5 Three more steps of the hierarchical clustering
We can proceed to combine clusters further. We shall discuss alternative stopping rules
next.
There are several approaches we might use to stop the clustering process.
(1) We could be told, or have a belief, about how many clusters there are in the data.
For example, if we are told that the data about dogs is taken from Chihuahuas,
Dachshunds, and Beagles, then we know to stop when there are three clusters left.
(2) We could stop combining when at some point the best combination of existing clusters
produces a cluster that is inadequate. We shall discuss various tests for the adequacy of
a cluster in Section 7.2.3, but as an example, we could insist that any cluster have an
average distance between the centroid and its points no greater than some limit. This
approach is only sensible if we have a reason to believe that no cluster extends over
too much of the space.
(3) We could continue clustering until there is only one cluster. However, it is meaningless
to return a single cluster consisting of all the points. Rather, we return the tree
representing the way in which all the points were combined. This form of answer makes
good sense in some applications, such as one in which the points are genomes of
different species, and the distance measure reflects the difference in the genome.² Then,
the tree represents the evolution of these species, that is, the likely order in which two
species branched from a common ancestor.
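The three stopping rules can be sketched in one agglomerative loop. This is a minimal illustration, not the book's code. It assumes points in a two-dimensional Euclidean space, assumes each cluster is represented by its centroid, and assumes that at each step the two clusters with the closest centroids are merged. The names `hierarchical_cluster`, `k`, and `max_avg_dist` are hypothetical.

- Passing `k` applies rule (1).
- Passing `max_avg_dist` applies rule (2). The adequacy test here is the average point-to-centroid distance mentioned above.
- Passing neither applies rule (3), so the single remaining cluster carries the full merge tree.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def centroid(points: List[Point]) -> Point:
    """Mean of the points in a cluster (assumes 2-D Euclidean points)."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def avg_dist_to_centroid(points: List[Point]) -> float:
    """Average distance from each point to the cluster's centroid."""
    c = centroid(points)
    return sum(math.dist(p, c) for p in points) / len(points)


def hierarchical_cluster(points: List[Point],
                         k: Optional[int] = None,
                         max_avg_dist: Optional[float] = None):
    """Hypothetical sketch: returns a list of (cluster_points, merge_tree).

    A merge tree is either a single point (leaf) or a pair of subtrees.
    """
    clusters = [([p], p) for p in points]
    while len(clusters) > 1:
        # Rule (1): stop once the known number of clusters is reached.
        if k is not None and len(clusters) <= k:
            break
        # Find the pair of clusters whose centroids are closest.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i][0]), centroid(clusters[j][0]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged_points = clusters[i][0] + clusters[j][0]
        # Rule (2): stop if even the best merge yields an inadequate cluster.
        if max_avg_dist is not None and avg_dist_to_centroid(merged_points) > max_avg_dist:
            break
        merged = (merged_points, (clusters[i][1], clusters[j][1]))
        del clusters[j]  # j > i, so delete j first to keep index i valid
        del clusters[i]
        clusters.append(merged)
    # Rule (3): with no limits, one cluster remains and its tree is the answer.
    return clusters


# Illustrative (made-up) points, not the data of Fig. 7.2.
pts = [(0, 0), (1, 0), (10, 0), (11, 0), (0, 10)]
for pts_in_cluster, _ in hierarchical_cluster(pts, k=3):
    print(sorted(pts_in_cluster))
print(hierarchical_cluster(pts)[0][1])  # full merge tree (rule 3)
```

The pairwise search recomputes every centroid on each pass. That keeps the sketch short, but it makes each step cost O(n²) centroid comparisons, so the loop is only suitable for small inputs.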
EXAMPLE 7.3 If we complete the clustering of the data of Fig. 7.2, the tree describing how
clusters were grouped is the tree shown in Fig. 7.6.
Figure 7.6 Tree showing the complete grouping of the points of Fig. 7.2