A Projected Clustering Algorithm and Its Biomedical Application - Clustering Challenges in Biological Network


                (C_1, ..., C_k) = (C_1, ..., C_k)
                isGood = true
                for each cluster C_i do
                    update bestEvaluateValue
                end for
            else
                remove dimension j from D_i
                isGood = false
            end if
        until isGood = false
    end for
end

Outliers are also handled during the last pass over the data. For each medoid $m_i$ with the dimensions $D_i$, we find the smallest Manhattan segmental distance $\Delta_i$ to any of the other $(k-1)$ medoids with respect to the set of dimensions $D_i$:

$$\Delta_i = \min_{j \neq i} d_{D_i}(m_i, m_j)$$

$\Delta_i$ is the sphere of influence of the medoid $m_i$. A data point is an outlier if its Manhattan segmental distance to each medoid $m_i$, relative to the set of dimensions $D_i$, exceeds $\Delta_i$.
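The outlier rule above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, and the Manhattan segmental distance is assumed to be the average absolute coordinate difference over the selected dimensions, as in the original PROCLUS formulation, since this section does not restate it.

```python
from typing import List, Sequence


def segmental_distance(x: Sequence[float], y: Sequence[float],
                       dims: Sequence[int]) -> float:
    """Manhattan segmental distance (assumed definition): the mean
    absolute difference between x and y over the dimensions in dims."""
    return sum(abs(x[d] - y[d]) for d in dims) / len(dims)


def spheres_of_influence(medoids: Sequence[Sequence[float]],
                         dim_sets: Sequence[Sequence[int]]) -> List[float]:
    """For each medoid m_i, the smallest segmental distance to any other
    medoid, measured on m_i's own dimension set D_i. Requires k >= 2."""
    k = len(medoids)
    return [
        min(segmental_distance(medoids[i], medoids[j], dim_sets[i])
            for j in range(k) if j != i)
        for i in range(k)
    ]


def find_outliers(points: Sequence[Sequence[float]],
                  medoids: Sequence[Sequence[float]],
                  dim_sets: Sequence[Sequence[int]]) -> List[int]:
    """Indices of points whose distance to every medoid m_i, measured
    on D_i, exceeds that medoid's sphere of influence."""
    deltas = spheres_of_influence(medoids, dim_sets)
    return [
        idx for idx, p in enumerate(points)
        if all(segmental_distance(p, medoids[i], dim_sets[i]) > deltas[i]
               for i in range(len(medoids)))
    ]


# Hypothetical toy data: two medoids, each using dimensions 0 and 1.
toy_medoids = [[0.0, 0.0, 0.0], [10.0, 10.0, 0.0]]
toy_dims = [[0, 1], [0, 1]]
toy_points = [[1.0, 1.0, 5.0], [50.0, 50.0, 0.0], [5.0, 5.0, 0.0]]
toy_outliers = find_outliers(toy_points, toy_medoids, toy_dims)
```

In this toy setup both spheres of influence equal 10, so only the second point, which lies farther than 10 from both medoids, is flagged.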

9.4. Empirical Results

The experimental evaluation was performed on a Dell Dimension 4600 with a 2.4 GHz Intel Pentium IV processor and 1.00 GB of memory, running Windows XP Professional with Service Pack 2. The data was stored on an 80 GB, 7200 RPM hard drive with an 8 MB cache. The flow chart of the experimental evaluation for a dataset is illustrated in Fig. 9.3.

We test the performance of IPROCLUS and PROCLUS on synthetic data and real biomedical data. Unless otherwise specified, all results are obtained by running the algorithms on the datasets multiple times and taking the average. For each run, a random seed is chosen and both algorithms are given this same seed for their random number generators to ensure a fair comparison. We discuss the generation of the synthetic datasets in Subsection 9.4.1. We then compare our empirical results of running IPROCLUS and PROCLUS on synthetic datasets in Subsection 9.4.2. The performance of IPROCLUS and PROCLUS on a real biomedical dataset, the colon tumor dataset, is analyzed in Subsection 9.4.3.
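The shared-seed protocol described above can be sketched as a small harness. This is only a sketch under stated assumptions: the text does not say how many runs were used or how seeds were drawn, so `n_runs` and `master_seed` are hypothetical parameters, and the callables passed in are stand-ins for IPROCLUS and PROCLUS rather than real implementations.

```python
import random
from statistics import mean
from typing import Any, Callable, Dict


def compare_with_shared_seeds(
    algorithms: Dict[str, Callable[[Any, random.Random], float]],
    data: Any,
    n_runs: int = 10,        # assumption: run count is not given in the text
    master_seed: int = 0,    # assumption: used only to make the sketch repeatable
) -> Dict[str, float]:
    """Run every algorithm n_runs times and return its average score.

    In each run a fresh seed is drawn, and every algorithm receives its
    own generator initialised with that same seed, so all of them see
    an identical random stream within a run.
    """
    seed_source = random.Random(master_seed)
    scores: Dict[str, list] = {name: [] for name in algorithms}
    for _ in range(n_runs):
        seed = seed_source.randrange(2**32)
        for name, run in algorithms.items():
            scores[name].append(run(data, random.Random(seed)))
    return {name: mean(values) for name, values in scores.items()}


# Hypothetical stand-ins: each "algorithm" just returns a random score.
def dummy_a(data: Any, rng: random.Random) -> float:
    return rng.random()


def dummy_b(data: Any, rng: random.Random) -> float:
    return rng.random()


averages = compare_with_shared_seeds({"A": dummy_a, "B": dummy_b}, data=None)
```

Because the two stand-ins consume the same random stream, their averages coincide here; with real algorithms, any remaining difference reflects the methods rather than the luck of initialisation.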
