Statistical Clustering Analysis: An Introduction - Clustering Challenges in Biological Network


greater than 1, which means it cannot give the correct answer when there is only one cluster in the dataset.

This problem is solved by adding a dummy dimension to the original standardized dataset and cloning the original dataset in the space with the dummy dimension [33], so that the augmented dataset has at least two clusters. The augmented dataset is denoted as $\tilde{X}_{\mathrm{Std}}$, which is:

$$
\tilde{X}_{\mathrm{Std}} = \begin{bmatrix} X_{\mathrm{Std}} & \mathbf{0} \\ X_{\mathrm{Std}} & \mathbf{d} \end{bmatrix} \tag{5.28}
$$

where $\mathbf{0}$ is an $N \times 1$ zero column vector, and $\mathbf{d}$ is another $N \times 1$ vector with all elements $d$. They call this the scale-based with dummy dimension (SBDD) method.

We can apply the scale-based method to this augmented dataset $\tilde{X}_{\mathrm{Std}}$. Because the augmented dataset has at least two clusters, we can compare whether two clusters or some other number of clusters survives over the longest range of the scale parameter. The number of clusters in the original dataset $X_{\mathrm{Std}}$ is just the number of clusters identified in $\tilde{X}_{\mathrm{Std}}$ divided by 2.
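The construction in Eq. 5.28 can be illustrated with a minimal sketch. It assumes the standardized dataset is held as a list of rows (N rows, one per sample); the function names `augment_with_dummy` and `original_cluster_count` are hypothetical and chosen only for this illustration, and plain Python lists are used to keep the example dependency-free.

```python
from typing import List, Sequence


def augment_with_dummy(X_std: Sequence[Sequence[float]], d: float) -> List[List[float]]:
    """Build the augmented dataset of Eq. 5.28.

    The original rows get a dummy coordinate of 0 and the cloned rows get a
    dummy coordinate of d, so an N x D dataset becomes 2N x (D + 1).
    """
    base = [[float(v) for v in row] for row in X_std]
    original = [row + [0.0] for row in base]    # block [X_Std, 0]
    clone = [row + [float(d)] for row in base]  # block [X_Std, d]
    return original + clone


def original_cluster_count(k_augmented: int) -> int:
    """Halve the cluster count found in the augmented dataset."""
    # Assumption: the clone doubles every cluster, so the count must be even.
    if k_augmented < 2 or k_augmented % 2 != 0:
        raise ValueError("expected an even cluster count of at least 2")
    return k_augmented // 2


X_std = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
X_aug = augment_with_dummy(X_std, d=2.0)
print(len(X_aug), len(X_aug[0]))  # 6 3
print(original_cluster_count(4))  # 2
```

Even a dataset with a single cluster yields two clusters after augmentation (the original and its clone, separated by d along the dummy axis), which is why the count is halved at the end.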

Fig. 5.12. One cluster encircled by another one.

There is one user-specified parameter $d$ in the SBDD method. It is suggested to start with a small value, such as $d = 2$. For each value of $d$, the augmented dataset $\tilde{X}_{\mathrm{Std}}$ is constructed by Eq. 5.28, and the scale-based method is applied to it. If the scale-based method identifies clusters whose centers have the following pattern:

$$
\begin{aligned}
&(x_1, 0),\ (x_2, 0),\ \ldots,\ (x_K, 0) \\
&(x_1, d),\ (x_2, d),\ \ldots,\ (x_K, d)
\end{aligned} \tag{5.29}
$$

the SBDD algorithm stops. Otherwise, increase $d$ by a step size such as $\Delta d = 0.5$.
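The search over d described above might be sketched as follows. The scale-based method itself is not reproduced here: it is passed in as a caller-supplied function `cluster_centers` that takes the augmented rows and returns a list of cluster centers. That callable, the tolerance `tol`, and the upper bound `d_max` are assumptions introduced for this illustration; the text specifies only the starting value d = 2 and the step size 0.5.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Rows = List[List[float]]


def augment(X_std: Sequence[Sequence[float]], d: float) -> Rows:
    """Eq. 5.28: original rows with dummy coordinate 0, cloned rows with d."""
    return ([list(r) + [0.0] for r in X_std]
            + [list(r) + [float(d)] for r in X_std])


def centers_match_pattern(centers: Sequence[Sequence[float]], d: float,
                          tol: float = 1e-6) -> bool:
    """Check Eq. 5.29: every center (x_k, 0) has a partner (x_k, d)."""
    low = [list(c[:-1]) for c in centers if abs(c[-1]) <= tol]
    high = [list(c[:-1]) for c in centers if abs(c[-1] - d) <= tol]
    if not low or len(low) != len(high) or len(low) + len(high) != len(centers):
        return False
    unused = list(high)
    for p in low:
        # Greedily pair each lower center with an unused upper center.
        match = next((q for q in unused
                      if all(abs(a - b) <= tol for a, b in zip(p, q))), None)
        if match is None:
            return False
        unused.remove(match)
    return True


def sbdd(X_std: Sequence[Sequence[float]],
         cluster_centers: Callable[[Rows], Rows],
         d_start: float = 2.0, d_step: float = 0.5,
         d_max: float = 10.0, tol: float = 1e-6) -> Optional[Tuple[float, int]]:
    """Increase d until the centers show the paired pattern of Eq. 5.29.

    Returns (d, number of clusters in the original data), or None if no
    value of d up to d_max (a hypothetical safeguard) gives the pattern.
    """
    d = d_start
    while d <= d_max:
        centers = cluster_centers(augment(X_std, d))
        if centers_match_pattern(centers, d, tol):
            return d, len(centers) // 2
        d += d_step
    return None


paired = [[1.0, 2.0, 0.0], [3.0, 4.0, 0.0], [1.0, 2.0, 2.0], [3.0, 4.0, 2.0]]
print(centers_match_pattern(paired, d=2.0))  # True
```

In practice the centers returned by a real clustering run will not match exactly, so `tol` would need to be set relative to the scale of the standardized data.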
