Biology Reference
In-Depth Information
number such as 0.2, i.e.,
λ
t
=0
.
2. Let the step size of the scale parameter
to be a small number such as 0.1, i.e., ∆
λ
=0
.
1. Assume the number of
clusters
K
t
to be larger than the reasonable maximum number of clusters in
the dataset. For instance, if we think that the number of clusters in the dataset
can not exceeds 8, we let
K
t
=9;
(1) Cluster the dataset into
K
t
clusters. Users can use any clustering algorithm
which utilizes the number of clusters as an input parameter, such as
K
-means
and hierarchical clustering algorithms. Calculate cluster centers;
(2) If any two cluster centers are closer than the scale parameter
λ
t
, combine these
two clusters into one cluster.
(3) Increase
λ
t
by one step size, i.e.,
λ
t
+1
=
λ
t
+∆
λ
.Let
t
=
t
+1. Update
K
t
with the number of remaining clusters.
(4) If
K
t
=1, stop; otherwise go to (1).
After we run the above algorithm, we can plot
K
t
against
λ
t
. Figure 5.11 is an
example of this plot. We choose the
K
t
(not including
K
0
) surviving in the longest
range of
λ
t
as the number of clusters
K
∗
. In Fig. 5.11, we choose
K
∗
=3.
Fig. 5.11.
K
t
vs.
λ
t
plot of scale-based method.
One advantage of the scale-based method over the model-based method is that
it is capable of giving the correct number of clusters when non-convex clusters
exist. For example, the scale-based method can correctly identify 3 clusters in the
dataset shown in Fig. 5.9.
The disadvantage of the scale-based method is that the algorithm stops when
one cluster is reached, as shown in step (4). Thus, it always concludes on a number
Search WWH ::
Custom Search