Biology Reference
In-Depth Information
number such as 0.2, i.e., λ t =0 . 2. Let the step size of the scale parameter
to be a small number such as 0.1, i.e., ∆ λ =0 . 1. Assume the number of
clusters K t to be larger than the reasonable maximum number of clusters in
the dataset. For instance, if we think that the number of clusters in the dataset
can not exceeds 8, we let K t =9;
(1) Cluster the dataset into K t clusters. Users can use any clustering algorithm
which utilizes the number of clusters as an input parameter, such as K -means
and hierarchical clustering algorithms. Calculate cluster centers;
(2) If any two cluster centers are closer than the scale parameter λ t , combine these
two clusters into one cluster.
(3) Increase λ t by one step size, i.e., λ t +1 = λ t +∆ λ .Let t = t +1. Update K t
with the number of remaining clusters.
(4) If K t =1, stop; otherwise go to (1).
After we run the above algorithm, we can plot K t against λ t . Figure 5.11 is an
example of this plot. We choose the K t (not including K 0 ) surviving in the longest
range of λ t as the number of clusters K . In Fig. 5.11, we choose K =3.
Fig. 5.11.
K t vs. λ t plot of scale-based method.
One advantage of the scale-based method over the model-based method is that
it is capable of giving the correct number of clusters when non-convex clusters
exist. For example, the scale-based method can correctly identify 3 clusters in the
dataset shown in Fig. 5.9.
The disadvantage of the scale-based method is that the algorithm stops when
one cluster is reached, as shown in step (4). Thus, it always concludes on a number
Search WWH ::




Custom Search