comes to clustering, the distributions of points along different dimensions also matter. For example, suppose the distance between points x_1 and x_2 is 20 in dimension i and 200 in dimension j; that does not necessarily mean that x_1 and x_2 are closer in dimension i than in dimension j, since the data may have different distributions along dimensions i and j. Therefore, there is a need to normalize the distance along each dimension. We want to find a normalization factor n_i for each dimension, and the modified Manhattan segmental distance between x_1 and x_2 relative to dimension set D can be defined as (∑_{i∈D} |p_{1,i} − p_{2,i}| / n_i) / |D|. In our algorithm, we use the standard deviation of all points in the dataset along dimension i as the normalization factor n_i.
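As an illustration, a minimal Python sketch of this normalized Manhattan segmental distance might look as follows; the function name, the argument layout, and the use of NumPy are our own assumptions rather than part of the original algorithm:

import numpy as np

def manhattan_segmental_distance(x1, x2, dims, norm=None):
    """Normalized Manhattan segmental distance between two points,
    restricted to the dimension set `dims`.

    norm[i] plays the role of the normalization factor n_i for dimension i,
    e.g. the standard deviation of the whole dataset along that dimension.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    if norm is None:
        norm = np.ones_like(x1)                 # no normalization
    norm = np.asarray(norm, dtype=float)
    diffs = np.abs(x1[dims] - x2[dims]) / norm[dims]
    return diffs.sum() / len(dims)

# Usage (hypothetical data): per-dimension standard deviations as n_i
# data = np.random.rand(100, 5)
# n = data.std(axis=0)
# d = manhattan_segmental_distance(data[0], data[1], dims=[0, 2], norm=n)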
9.3.2. Initialization Phase
In the initialization phase, a random sample S of size A × k is first drawn from the data points, where A is a constant. Then a greedy algorithm is applied to S to obtain an even smaller set of points M of size B × k, where B is a small constant. The greedy algorithm is based on the idea that we want to avoid choosing medoids from the same cluster; therefore, we choose a set of points that are as far apart as possible. The greedy algorithm was proposed in [8] and is illustrated in Algorithm 9.2.
Algorithm 9.2. Greedy(Set of points: S, Number of medoids: k)
{d(.,.) is the distance function}
begin
   M = {m_1}  {m_1 is a random point of S}
   {compute distance between each point and medoid m_1}
   for each x ∈ S \ M do
      dist(x) = d(x, m_1)
   end for
   for i = 2 to k do
      {choose medoid m_i to be far from previous medoids}
      let m_i ∈ S \ M be s.t. dist(m_i) = max{dist(x) | x ∈ S \ M}
      M = M ∪ {m_i}
      {compute distance of each point to its closest medoid}
      for each x ∈ S \ M do
         dist(x) = min{dist(x), d(x, m_i)}
      end for
   end for
   return M
end
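For reference, a small Python sketch of this greedy selection, together with the sampling step of the initialization phase, might look as follows; the function name `greedy`, the NumPy-based implementation, and the commented sampling lines are our own illustrative choices, not code taken from [8]:

import numpy as np

def greedy(points, k, dist, rng=None):
    """Greedy selection of k well-separated medoid candidates
    (in the spirit of Algorithm 9.2). Returns indices into `points`."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(points)
    medoids = [int(rng.integers(n))]              # m_1: a random point of S
    # dist(x): distance of every point to its closest chosen medoid so far
    d_min = np.array([dist(points[i], points[medoids[0]]) for i in range(n)])
    for _ in range(1, k):
        d_min[medoids] = -np.inf                  # never re-pick a chosen medoid
        m_i = int(np.argmax(d_min))               # farthest from all chosen medoids
        medoids.append(m_i)
        d_new = np.array([dist(points[i], points[m_i]) for i in range(n)])
        d_min = np.minimum(d_min, d_new)          # update closest-medoid distances
    return medoids

# Initialization phase (A, B are the constants from the text; values are hypothetical):
# rng = np.random.default_rng()
# S = points[rng.choice(len(points), size=A * k, replace=False)]
# M = greedy(S, B * k, dist=lambda a, b: np.abs(a - b).sum())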