6.4.2 Overview of Clustering Algorithms
We present a brief overview of some clustering algorithms, especially those
used in the comparative experiments presented later. Further details on
algorithms, techniques and applications can be found in [247, 112, 113, 114,
25, 179, 225].
Among the huge variety of clustering algorithms, the hierarchical agglomerative
ones are probably the most widely used even today. Hierarchical agglomerative
algorithms create, by definition, a hierarchy of clusters from the dataset. They
start by assigning each instance to its own singleton cluster and then, based on
dissimilarity measures, merge small clusters into larger ones in a stepwise
manner. The process ends when all instances in the dataset are members of
a single cluster. The resulting hierarchical tree defines the clustering levels.
Examples of hierarchical clustering algorithms are CURE [85], ROCK [86],
AGNES [128], BIRCH [253, 254] and Chameleon [127].
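As an illustration of the stepwise-merging idea, the sketch below runs
average-linkage agglomerative clustering on a toy two-dimensional dataset using
SciPy; the data values and the choice of linkage are assumptions made purely
for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 2-D dataset (hypothetical values, chosen only for illustration).
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])

# Each point starts in its own cluster; 'average' linkage repeatedly
# merges the two clusters with the smallest mean pairwise distance.
Z = linkage(X, method='average', metric='euclidean')

# Z encodes the full merge hierarchy; cutting it at a chosen level
# recovers a flat partition, here into two clusters.
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)  # e.g., [1 1 1 2 2 2]
```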
Other clustering algorithms, such as those based on graphs and graph theory,
were proposed in the early 1970s. They exploit the high connectivity of
similarity graphs to perform clustering [154, 155]. These algorithms are
usually divisive: they first consider a highly connected graph containing all
elements of the dataset (corresponding to a single cluster) and then perform
successive cuts in the graph to obtain smaller clusters. A cut in a graph is
the removal of a set of edges that disconnects the graph. A minimum cut
(min-cut) is a cut that removes the smallest number of edges (or, in weighted
graphs, the smallest total edge weight). Each cut splits one cluster into at
least two clusters. Examples of graph-based clustering algorithms can be found
in [120, 92, 93, 244]. Chameleon [127], mentioned earlier, also uses a
graph-theoretic approach.
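As a rough sketch of the divisive idea, the example below builds a small
weighted similarity graph and uses the Stoer-Wagner minimum cut (via the
networkx library) to split it into two clusters; the graph, weights, and node
names are all hypothetical.

```python
import networkx as nx

# Hypothetical similarity graph: nodes are data points and edge
# weights are similarities (higher means more alike).
G = nx.Graph()
G.add_weighted_edges_from([
    ('a', 'b', 0.9), ('b', 'c', 0.8), ('a', 'c', 0.85),  # dense group 1
    ('d', 'e', 0.9), ('e', 'f', 0.95), ('d', 'f', 0.8),  # dense group 2
    ('c', 'd', 0.1),                                     # weak bridge
])

# The minimum cut removes the cheapest set of edges that disconnects
# the graph, splitting the single cluster into two.
cut_value, (part1, part2) = nx.stoer_wagner(G)
print(cut_value)     # 0.1 -- only the weak bridge is cut
print(part1, part2)  # the two dense groups
```

A divisive algorithm would then recurse, cutting each resulting subgraph until
some stopping criterion is met.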
Graph cutting is also used in spectral clustering, commonly applied in
image segmentation and, more recently, in web page and document clustering
and also in bioinformatics. The rationale of spectral clustering is to use the
special properties of the eigenvectors of a Laplacian matrix as the basis to
perform clustering. Fiedler [73] was one of the first to show the application
of eigenvectors to graph partitioning. The Laplacian matrix is based on an
affinity matrix built with a similarity measure. The most common similarity
measure used in spectral clustering is $\exp(-d_{ij}/2\sigma^2)$, where
$d_{ij}$ is the Euclidean distance between data vectors $x_i$ and $x_j$ and
$\sigma$ is a scaling parameter.
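A minimal sketch of these ingredients follows: it builds the Gaussian affinity
matrix from pairwise Euclidean distances, forms the unnormalized Laplacian, and
computes its eigenvectors. The data values and $\sigma = 1$ are assumptions for
illustration (some variants use the squared distance in the exponent).

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

# Toy data (hypothetical values).
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
sigma = 1.0

# Affinity matrix A_ij = exp(-d_ij / 2*sigma^2), with d_ij the
# Euclidean distance between x_i and x_j.
D = cdist(X, X, metric='euclidean')
A = np.exp(-D / (2 * sigma**2))
np.fill_diagonal(A, 0.0)      # no self-affinity

# Unnormalized Laplacian L = Deg - A, with Deg the diagonal degree matrix.
Deg = np.diag(A.sum(axis=1))
L = Deg - A

# eigh returns eigenpairs in ascending order of eigenvalue; these
# eigenvectors are the basis for the clustering step.
eigvals, eigvecs = eigh(L)
```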
Spectral clustering algorithms differ mainly in the way they use the
eigenvectors to perform clustering. Some researchers use the eigenvectors of
the "normalized" Laplacian matrix [42] (or a similar one) to perform the
cutting, usually using the eigenvector associated with the second smallest
eigenvalue [209, 124, 55]. Others use the largest eigenvectors as input to
another clustering algorithm [166, 157]. A comparison of several spectral
clustering algorithms can be found in [229].
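Continuing the sketch above, the cut-based family can be illustrated by
splitting the data on the sign of the eigenvector for the second smallest
eigenvalue (the Fiedler vector); again the data and $\sigma$ are made up for
the example.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
D = cdist(X, X, metric='euclidean')
A = np.exp(-D / 2.0)           # Gaussian affinity, sigma = 1 (assumed)
np.fill_diagonal(A, 0.0)
L = np.diag(A.sum(axis=1)) - A

# The eigenvector for the second smallest eigenvalue of L (the Fiedler
# vector) induces a bipartition: split points by the sign of its entries.
_, eigvecs = eigh(L)
labels = (eigvecs[:, 1] > 0).astype(int)
print(labels)                  # e.g., [0 0 1 1] (or the labels flipped)
```

The other family instead stacks several leading eigenvectors as columns and
feeds the resulting rows to a standard algorithm such as k-means.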
Other clustering algorithms exploit regions of different density in the data to
perform clustering. Examples of these density-based clustering algorithms are