Biology Reference
In-Depth Information
the two algorithms have compatible accuracy. This can be explained by the use
of modified Manhattan segmental distance in IPROCLUS. Modified Manhattan
segmental distance can effectively deal with scaled data while for unscaled data,
it is quite close to Manhattan segmental distance which is used in PROCLUS.
When comparing the three cases together, it can be seen that the no-common-
dimensions case has the lowest accuracy rate and the all-same-dimensions case
has the highest accuracy rate. It can be explained by the difference in the average
number of dimensions in a cluster. When the average number of dimensions is low
(in no-common-dimensions, l =4), there is a higher probability that data points are
assigned to the wrong cluster since points are just correlated on 4 dimensions.
However, in the all-same-dimensions case, where l =10, it is easier to correctly
cluster points since the correlation between data points is stronger.
Second, we present the result of testing the dependence on l . The dependence
is evaluated by the least square error of the number of dimensions.
The same
datasets in the three cases as in the accuracy test are used.
Figure 9.4 shows the results we get for the unscaled dataset in the random case.
We get similar results for the extreme cases. We can see that IPROCLUS has less
dependence on l than PROCLUS in terms of the number of dimensions. For the
scaled data in the three cases, we have got similar trend and we don'tgivethe
figures here since PROCLUS has much higher error rate for the scaled datasets.
Fig. 9.4.
Dependence of the Number of Dimensions on l
In summary, IPROCLUS greatly reduces the dependence on parameter l ,since
the dimension tuning process checks for additional dimensions for each cluster in
order to add any dimension that can enable better clustering, while PROCLUS
decide the average number of dimensions solely based on l .
For the running time test, we apply IPROCLUS and PROCLUS on different
number of points. The datasets are generated in the same way as the datasets
used in the previous two tests. The result we get is that the execution time of
these two algorithms is comparable for all the three different cases.
Since the
Search WWH ::




Custom Search