The results presented in Table 3 give the run time (in seconds) of the examined
algorithms on a one-off basis. Each value is the average time over 50 runs, calculated
separately for clustering the original data and for clustering hyperboxes. The last
column of the table contains the quotient of the two values. Processing granulated data
is considerably faster than processing the original point-type objects: up to about 40
times faster for SOSIG and up to about 14 times faster for the remaining algorithms.
The acceleration is greatest when the number of objects in the data is large and
considerably exceeds the number of attributes.
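The measurement described above (mean time over repeated runs, then the quotient of the two means) can be sketched as follows. This is only an illustrative harness, not the code used in the experiments: the names `average_runtime` and `speedup` are hypothetical, and `cluster_fn` stands for any clustering routine that takes a data set as its single argument.

```python
import time
from statistics import mean


def average_runtime(cluster_fn, data, runs=50):
    """Mean wall-clock time (seconds) of cluster_fn(data) over `runs` calls.

    Assumption: cluster_fn is any callable that clusters `data`;
    it is a placeholder, not an API taken from the paper.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        cluster_fn(data)
        timings.append(time.perf_counter() - start)
    return mean(timings)


def speedup(t_pd, t_gd):
    """Quotient t_pd / t_gd: time on point-type data over time on granulated data."""
    return t_pd / t_gd
```

For instance, `speedup(0.360, 0.040)` reproduces the quotient of 9 listed for SOSIG on the norm2D2gr set.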
Table 3. Average time (in seconds) of clustering hyperboxes and point-type data

data set    algorithm   point-type data   granulated data   t_pd / t_gd
                        t_pd              t_gd
norm2D2gr   SOSIG       0.360             0.040             9
            k-means     0.062             0.047             1.32
            hcl         0.110             0.032             3.44
            hsl         0.125             0.031             4.03
sph2D6gr    SOSIG       0.930             0.080             11.63
            k-means     0.187             0.094             2.0
            hcl         0.266             0.047             5.66
            hsl         0.250             0.032             7.81
irises      SOSIG       0.870, 0.800      0.790             1.01
            k-means     0.141             0.125             1.13
            hcl         0.078             0.046             1.70
            hsl         0.094             0.047             2.0
sph10D4gr   SOSIG       0.270             0.010             38.57
            k-means     0.156             0.047             3.32
            hcl         0.141             0.016             8.81
            hsl         0.219             0.015             14.6

Comparing the clustering algorithms, one can notice that the greatest speed-up is
obtained for the hierarchical algorithms and SOSIG. As mentioned earlier, hierarchical
algorithms attract researchers' interest because of their better clustering ability
compared with less complex partitioning methods. However, their time complexity is
higher. The same applies to SOSIG. Granulating the data in advance can be a way of
enabling these algorithms to cluster large databases in reasonable time.
Obviously, the total clustering time is influenced by the time of data preprocessing,
particularly when the data preparation algorithm is complex. However, in the
experiments described in this paper this time is not taken into consideration, for two
reasons. First, the number of objects in the set being prepared decreases by one in
every iteration, which in practice reduces the time complexity of the preprocessing
procedure. Second, for algorithms that take the number of groups as an input parameter,
the data should be clustered at least several times to estimate the number of clusters
present in the data. In this case a single preparation of the data matters considerably
less than the repeated clustering of the data.
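The second argument can be illustrated with a small calculation: one preprocessing pass is paid once, while the per-run saving accumulates over repeated clusterings. The sketch below is only an illustration under stated assumptions. The function names are hypothetical, and the preprocessing time `t_prep` is an assumed input, since the paper does not report it.

```python
import math


def total_time_granulated(t_prep, t_gd, runs):
    """One preprocessing pass plus `runs` clusterings of granulated data."""
    return t_prep + runs * t_gd


def total_time_point(t_pd, runs):
    """`runs` clusterings of the original point-type data, no preprocessing."""
    return runs * t_pd


def break_even_runs(t_prep, t_pd, t_gd):
    """Smallest number of runs for which granulation is strictly cheaper overall.

    Solves t_prep + k * t_gd < k * t_pd for integer k.
    Returns None when granulated clustering is not faster per run.
    """
    if t_pd <= t_gd:
        return None
    return math.floor(t_prep / (t_pd - t_gd)) + 1
```

With the SOSIG times for norm2D2gr from Table 3 (0.360 s and 0.040 s) and a purely hypothetical preprocessing cost of 1.0 s, `break_even_runs(1.0, 0.360, 0.040)` gives 4: from the fourth clustering onward the granulated variant is cheaper in total.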
To compare the clustering results with respect to the most compact and separable
partitioning, two internal indices have been chosen: DB and Dunn's. In addition, external