(C_1, ..., C_k) = (C_1, ..., C_k)
isGood = true
for each cluster C_i do
update bestEvaluateValue
end for
else
remove dimension j from D_i
isGood = false
end if
until isGood = false
end for
end for
end
Outliers are also handled during the last pass over the data. For each medoid
$m_i$ with the dimensions $D_i$, we find the smallest Manhattan segmental distance
$\Delta_i$ to any of the other $(k - 1)$ medoids with respect to the set of
dimensions $D_i$:
$$\Delta_i = \min_{j \neq i} d_{D_i}(m_i, m_j)$$
$\Delta_i$ is the sphere of influence of the medoid $m_i$. A data point is an
outlier if its Manhattan segmental distance to each medoid $m_i$, relative to the
set of dimensions $D_i$, exceeds $\Delta_i$.
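The outlier rule above can be illustrated with a minimal Python sketch. It assumes the usual PROCLUS definition of the Manhattan segmental distance (the mean absolute coordinate difference over the chosen dimensions), which this section does not restate. The function names are hypothetical and chosen only for illustration.

```python
def segmental_distance(x, y, dims):
    """Manhattan segmental distance between x and y over the dimensions in dims.

    Assumption: the standard PROCLUS definition, i.e. the Manhattan distance
    restricted to dims, divided by the number of dimensions in dims.
    """
    return sum(abs(x[d] - y[d]) for d in dims) / len(dims)


def spheres_of_influence(medoids, dim_sets):
    """For each medoid m_i, return Delta_i = min over j != i of d_{D_i}(m_i, m_j).

    Requires at least two medoids, otherwise the minimum is taken over nothing.
    """
    deltas = []
    for i, (m_i, dims_i) in enumerate(zip(medoids, dim_sets)):
        deltas.append(min(
            segmental_distance(m_i, m_j, dims_i)
            for j, m_j in enumerate(medoids)
            if j != i
        ))
    return deltas


def is_outlier(point, medoids, dim_sets, deltas):
    """A point is an outlier if it lies outside every medoid's sphere of influence."""
    return all(
        segmental_distance(point, m, dims) > delta
        for m, dims, delta in zip(medoids, dim_sets, deltas)
    )


# Toy data: two medoids in three dimensions, each with its own dimension set.
medoids = [(0, 0, 0), (10, 10, 10)]
dim_sets = [[0, 1], [1, 2]]
deltas = spheres_of_influence(medoids, dim_sets)
```

With this toy data, a point such as `(30, 30, 30)` lies outside both spheres and is flagged, whereas `(1, 1, 50)` is close to the first medoid in dimensions 0 and 1 and is therefore kept, even though it is far away in dimension 2.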
9.4. Empirical Results
The experimental evaluation was performed on a Dell Dimension 4600 with an Intel
Pentium IV 2.4 GHz processor and 1.00 GB of memory, running Windows XP
Professional with Service Pack 2. The data was stored on an 80 GB, 7200 RPM hard
drive with an 8 MB cache. The flow chart of the experimental evaluation for a
dataset is illustrated in Fig. 9.3.
We test the performance of IPROCLUS and PROCLUS on synthetic data and on
real biomedical data. Unless otherwise specified, all results are obtained by
running the algorithms on each dataset multiple times and taking the average. For
each run a random seed is chosen, and both algorithms are given this same seed for
their random generator to guarantee a fair comparison. We discuss the generation of
the synthetic datasets in Subsection 9.4.1. We then compare our empirical results
of running IPROCLUS and PROCLUS on synthetic datasets in Subsection 9.4.2.
The performance of IPROCLUS and PROCLUS on a real biomedical dataset, the
colon tumor dataset, is analyzed in Subsection 9.4.3.
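The shared-seed protocol described above could be set up along the following lines. This is only a sketch under stated assumptions: `algo_a` and `algo_b` are hypothetical callables standing in for the two algorithms, each taking the data and a random generator and returning one numeric score, and the number of runs and the master seed are placeholders, since the text says only that the algorithms are run "multiple times".

```python
import random
from statistics import mean


def compare_with_shared_seeds(algo_a, algo_b, data, runs=10, master_seed=0):
    """Run two algorithms `runs` times, seeding both identically on each run.

    algo_a and algo_b are hypothetical callables of the form
    algo(data, rng) -> numeric score. Returns the two average scores.
    """
    seed_source = random.Random(master_seed)  # draws one fresh seed per run
    scores_a, scores_b = [], []
    for _ in range(runs):
        seed = seed_source.randrange(2**32)
        # Each algorithm gets its own generator built from the same seed,
        # so both see an identical random stream on this run.
        scores_a.append(algo_a(data, random.Random(seed)))
        scores_b.append(algo_b(data, random.Random(seed)))
    return mean(scores_a), mean(scores_b)
```

Giving each algorithm a separate generator built from the same seed, rather than sharing one generator, keeps the first algorithm's random draws from shifting the stream seen by the second.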