All experiments are run on normalized features: each feature is standardized to zero mean and unit variance, N(0, 1). All results are reported as means over 10 independent runs of the algorithm. The final performance of the clustering algorithms is evaluated by re-labeling the obtained clusters against the ground-truth labels and then counting the percentage of correctly classified samples. Table 2 compares the performance of the proposed method with the most common base and ensemble methods.
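The re-labeling step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it brute-forces the best one-to-one mapping between predicted cluster labels and ground-truth labels, which is adequate for the small cluster counts (3-10) of the data sets in Table 2. The function name `clustering_accuracy` is our own.

```python
from itertools import permutations

def clustering_accuracy(labels_true, labels_pred):
    """Accuracy (%) after the best one-to-one re-labeling of predicted
    clusters onto the ground-truth classes.

    Brute-force over label permutations; assumes the number of predicted
    clusters does not exceed the number of true classes. Fine for small k,
    not for large cluster counts (use the Hungarian algorithm there).
    """
    true_ids = sorted(set(labels_true))
    pred_ids = sorted(set(labels_pred))
    best = 0
    for perm in permutations(true_ids, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))  # predicted label -> true label
        hits = sum(1 for t, p in zip(labels_true, labels_pred)
                   if mapping[p] == t)
        best = max(best, hits)
    return 100.0 * best / len(labels_true)
```

For example, a partition that swaps the names of two otherwise perfect clusters still scores 100%, since the re-labeling absorbs the permutation.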
Table 2. Experimental results (clustering accuracy, %).

          | Simple Methods                       | Ensemble Methods
Dataset   | Single   Average   Complete  K-means| K-means   Full      Cluster Sel.  Cluster Sel.
          | Linkage  Linkage   Linkage          | Ensemble  Ensemble  by NMI        by max
----------|-------------------------------------|----------------------------------------------
Wine      | 37.64    38.76     83.71     96.63  | 96.63     97.08     97.75         98.31
Breast-C  | 65.15    70.13     94.73     95.37  | 95.46     95.10     95.75         98.33
Yeast     | 34.38    35.11     38.91     40.20  | 45.46     47.17     47.17         47.17
Glass     | 36.45    37.85     40.65     45.28  | 47.01     47.83     48.13         50.47
Bupa      | 57.68    57.10     55.94     54.64  | 54.49     55.83     58.09         58.40
The first four columns of Tab. 2 give the results of the base clustering algorithms. They show that although each algorithm can achieve a good result on a specific data set, it does not perform well across the others. For example, according to Tab. 2 the K-means algorithm clusters the Wine data set well compared with the linkage methods, but performs worse than the linkage methods on the Bupa data set. Likewise, complete linkage performs well on the Breast-Cancer data set compared with the other base methods, but this does not hold on all data sets. The last four columns show the performance of some ensemble methods alongside the proposed one. Comparing the last four columns with the first four shows that the ensemble methods outperform the simple base algorithms in both accuracy and robustness across data sets. The first ensemble column gives the results of an ensemble of 100 K-means runs fused by the EAC method. To create diversity in the primary results, each run uses a 90% sub-sample of the data set, drawn without replacement; random initialization of the K-means seed points adds further diversity. The single-linkage algorithm is applied as the consensus function to derive the final clusters from the co-association matrix. The second ensemble column is the full ensemble, which uses several clustering algorithms to generate the primary results: here, 70 K-means runs with the above-mentioned parameters plus 30 linkage methods. Since different runs of a given linkage method always yield the same result, their use as base clustering algorithms is limited; diversity is instead created by forcing a range of cluster counts, K±2, where K is the true number of clusters. The linkage algorithms used and their distance criteria are shown in Tab. 3; detailed information about each of them can be found in [8].
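The accumulation of the EAC co-association matrix from sub-sampled partitions can be sketched as below. This is a minimal illustration under our own assumptions (the function name `co_association` and the use of label -1 for samples absent from a sub-sample are ours, not from the paper); the final single-linkage consensus step on 1 minus this matrix is not shown.

```python
def co_association(partitions, n):
    """Accumulate the EAC co-association matrix (upper triangle only).

    partitions: list of label lists of length n, one per base clustering;
    label -1 marks a sample absent from that 90% sub-sample. Entry [i][j]
    (i < j) is the fraction of partitions, among those in which both i and
    j appeared, that placed them in the same cluster.
    """
    together = [[0] * n for _ in range(n)]  # co-occurrence in same cluster
    seen = [[0] * n for _ in range(n)]      # joint appearances in a partition
    for labels in partitions:
        for i in range(n):
            if labels[i] == -1:
                continue
            for j in range(i + 1, n):
                if labels[j] == -1:
                    continue
                seen[i][j] += 1
                if labels[i] == labels[j]:
                    together[i][j] += 1
    return [[together[i][j] / seen[i][j] if seen[i][j] else 0.0
             for j in range(n)] for i in range(n)]
```

A consensus function such as single linkage would then cluster the samples using 1 minus this matrix as the pairwise distance.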