An Asymmetric Criterion for Cluster Validation
Hosein Alizadeh, Behrouz Minaei, Hamid Parvin, Mohsen Moshki
School of Computer Engineering, Iran University of Science and Technology (IUST),
Tehran, Iran
{halizadeh,b_minaei,parvin,moshki}@iust.ac.ir
Abstract. Many stability measures, such as Normalized Mutual Information (NMI), have been
proposed to validate a cluster. This paper discusses a drawback of the common approach and
then proposes a new asymmetric criterion, called the Alizadeh-Parvin-Moshki-Minaei (APMM)
criterion, to assess the association between a cluster and a partition. The APMM criterion
compensates for the drawback of the common NMI measure. A clustering ensemble method is
also proposed that aggregates a subset of the primary clusters. This method uses the Average
APMM as a fitness measure for cluster selection: only the clusters that satisfy a predefined
threshold on this measure participate in the clustering ensemble. To combine the chosen
clusters, a co-association-based consensus function is employed. Since the Evidence
Accumulation Clustering (EAC) method cannot derive the co-association matrix from a subset of
clusters, a new EAC-based method, called Extended EAC (EEAC), is employed to construct the
co-association matrix from the chosen subset of clusters. Empirical studies show that the
proposed method outperforms the alternative approaches.
Keywords: Clustering Ensemble, Stability Measure, Improved Stability,
Evidence Accumulation, Extended EAC, Co-association Matrix, Cluster
Evaluation.
1 Introduction
Data clustering, or unsupervised learning, is an important and very challenging
problem. The objective of clustering is to partition a set of unlabeled objects into
homogeneous groups or clusters [10]. Many applications use clustering techniques to
discover structures in data, such as data mining [10], information retrieval, image
segmentation, and machine learning [12]. In real-world problems, clusters can appear
with different shapes, sizes, degrees of data sparseness, and degrees of separation.
Clustering techniques require the definition of a similarity measure between patterns.
Because no prior knowledge about the cluster shapes is available, choosing a
specialized clustering method is not easy [24]. Studies in recent years have therefore
turned to combinational methods. Cluster ensemble methods attempt to find a better and
more robust clustering solution by fusing information from several primary data
partitionings [1].
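As an illustrative sketch (not the authors' implementation), the following Python code shows one common way such information can be fused: several primary k-means partitionings are accumulated into a co-association matrix, which is then cut with average-linkage hierarchical clustering in the spirit of EAC. The function name, parameter choices, and the use of k-means and average linkage are assumptions made here for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def coassociation_ensemble(X, n_partitions=10, k_range=(2, 8), final_k=3, seed=0):
    """Illustrative EAC-style ensemble (hypothetical helper): fuse several
    primary partitionings via a co-association matrix and cut it."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    coassoc = np.zeros((n, n))

    for _ in range(n_partitions):
        # Each primary clustering uses a randomly chosen number of clusters.
        k = int(rng.integers(k_range[0], k_range[1] + 1))
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=int(rng.integers(0, 10**6))).fit_predict(X)
        # Accumulate evidence: 1 whenever two objects fall in the same cluster.
        coassoc += (labels[:, None] == labels[None, :])

    coassoc /= n_partitions              # fraction of partitionings that co-cluster each pair
    dist = 1.0 - coassoc                 # convert similarity to a distance
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=final_k, criterion="maxclust")
```

In this sketch every primary cluster contributes to the co-association matrix; the method proposed in the paper differs in that only the clusters passing the Average APMM threshold would be accumulated, which is what the EEAC construction described later is for.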
 