An Asymmetric Criterion for Cluster Validation
Hosein Alizadeh, Behrouz Minaei, Hamid Parvin, Mohsen Moshki
School of Computer Engineering, Iran University of Science and Technology (IUST),
Tehran, Iran
{halizadeh,b_minaei,parvin,moshki}@iust.ac.ir
Abstract. Many stability measures, such as Normalized Mutual Information (NMI), have been
proposed to validate a cluster. This paper discusses a drawback of the common approach and
then proposes a new asymmetric criterion, called the Alizadeh-Parvin-Moshki-Minaei (APMM)
criterion, to assess the association between a cluster and a partition. The APMM criterion
compensates for the drawback of the common NMI measure. A clustering ensemble method is
also proposed that aggregates a subset of the primary clusters. This method uses the Average
APMM as a fitness measure for cluster selection: only the clusters that satisfy a predefined
threshold on this measure participate in the clustering ensemble. To combine the chosen
clusters, a co-association-based consensus function is employed. Since the Evidence
Accumulation Clustering (EAC) method cannot derive the co-association matrix from a subset of
clusters, a new EAC-based method, called Extended EAC (EEAC), is employed to construct the
co-association matrix from the chosen subset of clusters. Empirical studies show that the
proposed method outperforms the alternative approaches.
Keywords: Clustering Ensemble, Stability Measure, Improved Stability,
Evidence Accumulation, Extended EAC, Co-association Matrix, Cluster
Evaluation.
1 Introduction
Data clustering, or unsupervised learning, is an important and very challenging
problem. The objective of clustering is to partition a set of unlabeled objects into
homogeneous groups or clusters [10]. Many applications use clustering techniques to
discover structures in data, such as data mining [10], information retrieval, image
segmentation, and machine learning [12]. In real-world problems, clusters can appear
with different shapes, sizes, degrees of data sparseness, and degrees of separation.
Clustering techniques require the definition of a similarity measure between patterns.
Because no prior knowledge about the cluster shapes is available, choosing a
specialized clustering method is not easy [24]. Studies in recent years have therefore
turned to combinational methods. Cluster ensemble methods attempt to find a better and
more robust clustering solution by fusing information from several primary data
partitionings [1].
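As an illustrative sketch (not the authors' implementation), the following Python code shows one common way such information can be fused: several primary k-means partitionings are accumulated into a co-association matrix, which is then cut with average-linkage hierarchical clustering in the spirit of EAC. The function name, parameter choices, and the use of k-means and average linkage are assumptions made here for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def coassociation_ensemble(X, n_partitions=10, k_range=(2, 8), final_k=3, seed=0):
    """Illustrative EAC-style ensemble (hypothetical helper): fuse several
    primary partitionings via a co-association matrix and cut it."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    coassoc = np.zeros((n, n))

    for _ in range(n_partitions):
        # Each primary clustering uses a randomly chosen number of clusters.
        k = int(rng.integers(k_range[0], k_range[1] + 1))
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=int(rng.integers(0, 10**6))).fit_predict(X)
        # Accumulate evidence: 1 whenever two objects fall in the same cluster.
        coassoc += (labels[:, None] == labels[None, :])

    coassoc /= n_partitions              # fraction of partitionings that co-cluster each pair
    dist = 1.0 - coassoc                 # convert similarity to a distance
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=final_k, criterion="maxclust")
```

In this sketch every primary cluster contributes to the co-association matrix; the method proposed in the paper differs in that only the clusters passing the Average APMM threshold would be accumulated, which is what the EEAC construction described later is for.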
 