The last two columns are the results of ensemble methods that use only a subset
of the primary clusters. The first shows the results of an ensemble that uses the
traditional NMI based stability for cluster validation. The second is the proposed
clustering ensemble method, which uses the APMM criterion for cluster validation.
In both methods, the selected clusters are accumulated in the co-association matrix
using the proposed EEAC method. Finally, the single linkage algorithm is applied
to the co-association matrix to extract the final clusters. The primary clustering results are
produced in the same way as for the full ensemble. In these methods, the threshold used for
cluster selection is determined adaptively. In other words, it is first set to 95%. If
more than 10% of the samples are absent from the selected clusters, the threshold is reduced
to 90%. This procedure (reducing the threshold value by 5%) continues until the
selected clusters include more than 90% of the samples. The results of the last two
columns show that although these approaches use only a subset of the primary clusters,
they usually outperform the full ensemble. Also, comparing the last two columns
shows the advantage of APMM based stability over NMI based stability.
Experiments with 10 independent runs over different data sets robustly show
the quality of the APMM criterion relative to NMI.
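
The adaptive threshold procedure described above can be sketched in Python as
follows. This is a minimal sketch under stated assumptions, not the authors'
implementation: the function name select_clusters_adaptively, the representation
of clusters as sets of sample indices, and stability scores given as floats in
[0, 1] (from either NMI based or APMM based stability) are hypothetical choices
made for illustration. The stop at a threshold of 0% is an added safeguard that
the text does not mention.

```python
def select_clusters_adaptively(clusters, stabilities, n_samples,
                               start_pct=95, step_pct=5, min_coverage=0.90):
    """Pick clusters whose stability meets a threshold, lowering it as needed.

    clusters     -- list of sets of sample indices (assumed representation)
    stabilities  -- one score in [0, 1] per cluster (assumed scale)
    n_samples    -- total number of samples in the data set
    Returns (selected_clusters, final_threshold_pct, coverage).
    """
    threshold_pct = start_pct  # the text starts at 95%
    while True:
        cutoff = threshold_pct / 100
        selected = [c for c, s in zip(clusters, stabilities) if s >= cutoff]
        covered = set().union(*selected)  # samples present in any selected cluster
        coverage = len(covered) / n_samples
        # Stop once more than 90% of the samples are covered.
        # The threshold_pct <= 0 check is a safeguard added for this sketch.
        if coverage > min_coverage or threshold_pct <= 0:
            return selected, threshold_pct, coverage
        threshold_pct -= step_pct  # reduce the threshold by 5 points and retry


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_clusters = [{0, 1, 2, 3, 4}, {5, 6, 7, 8}, {9}, {0, 1}]
    toy_stabilities = [0.97, 0.91, 0.82, 0.99]
    chosen, final_pct, cov = select_clusters_adaptively(
        toy_clusters, toy_stabilities, n_samples=10)
    print(len(chosen), final_pct, cov)
```

Integer percentages are used for the threshold so that repeated subtraction of
the 5% step does not accumulate floating-point error.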
5 Conclusions
In this paper, a new clustering ensemble method is proposed that is based on a
subset of the total set of primary clusters, which may include spurious ones. Since the
quality of the primary clusters is not equal, and the presence of some of them can even
lead to lower performance, a method for selecting a subset of the more effective
clusters is proposed. A common cluster
validity criterion used to derive this subset is based on normalized mutual
information (NMI). In this paper, some drawbacks of this criterion are discussed and an
alternative criterion is suggested, named Alizadeh-Parvin-Moshki-Minaei (APMM).
The experiments show that the APMM criterion generally does slightly better than
the NMI criterion; however, it significantly outperforms the NMI criterion on
synthetic data sets. Because of the symmetry concealed in the NMI
criterion, and hence in NMI based stability, NMI yields lower performance whenever
symmetry also appears in the data set. Another innovation of this paper is a
method for constructing the co-association matrix when some clusters, and
consequently some samples, are absent from the partitions. This new method is called
Extended Evidence Accumulation Clustering (EEAC). The empirical studies over
several data sets robustly show that the quality of the proposed method is usually
better than that of the other methods.
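
To make the idea of a co-association matrix built from an incomplete set of
clusters concrete, the following Python sketch accumulates selected clusters and
then extracts final clusters with a single linkage cut. It is an illustration
under assumptions and should not be read as the EEAC definition, which this
section does not state. The normalization used here (shared cluster count
divided by the larger of the two samples' appearance counts) is an assumed
choice, and the names coassociation and single_linkage_cut are hypothetical. The
single linkage step relies on the fact that cutting a single linkage hierarchy
at a given similarity level yields the connected components of the graph whose
edges join pairs at or above that level.

```python
from itertools import combinations


def coassociation(selected_clusters, n_samples):
    """Build a co-association matrix from clusters that may not cover all samples.

    ASSUMPTION: entry (i, j) = (#selected clusters containing both i and j)
    divided by max(#clusters containing i, #clusters containing j).
    """
    counts = [0] * n_samples                        # appearances of each sample
    shared = [[0] * n_samples for _ in range(n_samples)]
    for cluster in selected_clusters:
        for i in cluster:
            counts[i] += 1
        for i, j in combinations(sorted(cluster), 2):
            shared[i][j] += 1
            shared[j][i] += 1
    matrix = [[0.0] * n_samples for _ in range(n_samples)]
    for i in range(n_samples):
        matrix[i][i] = 1.0
        for j in range(i + 1, n_samples):
            denom = max(counts[i], counts[j])
            # A sample absent from every selected cluster gets similarity 0.
            value = shared[i][j] / denom if denom else 0.0
            matrix[i][j] = matrix[j][i] = value
    return matrix


def single_linkage_cut(matrix, min_similarity):
    """Label samples by connected components of pairs with similarity >= cut."""
    n = len(matrix)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if matrix[i][j] >= min_similarity:
            parent[find(i)] = find(j)
    labels = {}
    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


if __name__ == "__main__":
    # Toy selected clusters, invented for illustration only.
    toy = [{0, 1, 2}, {0, 1}, {3, 4}, {3, 4, 5}]
    m = coassociation(toy, n_samples=6)
    print(single_linkage_cut(m, min_similarity=0.5))
```

The cut level min_similarity is a free parameter of this sketch; the section
does not say how the final number of clusters is chosen.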