Statistical Clustering Analysis: An Introduction - Clustering Challenges in Biological Network

Biology Reference

In-Depth Information

number such as 0.2, i.e., λ t =0 . 2. Let the step size of the scale parameter

to be a small number such as 0.1, i.e., ∆ λ =0 . 1. Assume the number of

clusters K t to be larger than the reasonable maximum number of clusters in

the dataset. For instance, if we think that the number of clusters in the dataset

can not exceeds 8, we let K t =9;

(1) Cluster the dataset into K t clusters. Users can use any clustering algorithm

which utilizes the number of clusters as an input parameter, such as K -means

and hierarchical clustering algorithms. Calculate cluster centers;

(2) If any two cluster centers are closer than the scale parameter λ t , combine these

two clusters into one cluster.

(3) Increase λ t by one step size, i.e., λ t +1 = λ t +∆ λ .Let t = t +1. Update K t

with the number of remaining clusters.

(4) If K t =1, stop; otherwise go to (1).

After we run the above algorithm, we can plot K t against λ t . Figure 5.11 is an

example of this plot. We choose the K t (not including K 0 ) surviving in the longest

range of λ t as the number of clusters K ∗ . In Fig. 5.11, we choose K ∗ =3.

Fig. 5.11.

K t vs. λ t plot of scale-based method.

One advantage of the scale-based method over the model-based method is that

it is capable of giving the correct number of clusters when non-convex clusters

exist. For example, the scale-based method can correctly identify 3 clusters in the

dataset shown in Fig. 5.9.

The disadvantage of the scale-based method is that the algorithm stops when

one cluster is reached, as shown in step (4). Thus, it always concludes on a number

Search WWH ::

Custom Search

Home