Cluster Analysis: Basic Concepts and Methods - Data Mining: Concepts and Techniques

Databases Reference

In-Depth Information

insensitivity to input order, the capability of clustering high-dimensionality data,

constraint-based clustering, as well as interpretability and usability.

Many clustering algorithms have been developed. These can be categorized from

several orthogonal aspects such as those regarding partitioning criteria, separation

of clusters, similarity measures used, and clustering space. This chapter discusses

major fundamental clustering methods of the following categories: partitioning

methods, hierarchical methods, density-based methods , and grid-based methods . Some

algorithms may belong to more than one category.

A partitioning method first creates an initial set of k partitions, where parame-

ter k is the number of partitions to construct. It then uses an iterative relocation

technique that attempts to improve the partitioning by moving objects from one

group to another. Typical partitioning methods include k -means, k -medoids, and

CLARANS.

A hierarchical method creates a hierarchical decomposition of the given set of data

objects. The method can be classified as being either agglomerative ( bottom-up ) or

divisive ( top-down ), based on how the hierarchical decomposition is formed. To

compensate for the rigidity of merge or split , the quality of hierarchical agglome-

ration can be improved by analyzing object linkages at each hierarchical partitioning

(e.g., in Chameleon), or by first performing microclustering (that is, grouping objects

into “microclusters”) and then operating on the microclusters with other clustering

techniques such as iterative relocation (as in BIRCH).

A density-based method clusters objects based on the notion of density. It grows

clusters either according to the density of neighborhood objects (e.g., in DBSCAN)

or according to a density function (e.g., in DENCLUE). OPTICS is a density-based

method that generates an augmented ordering of the data's clustering structure.

A grid-based method first quantizes the object space into a finite number of cells that

form a grid structure, and then performs clustering on the grid structure. STING is

a typical example of a grid-based method based on statistical information stored in

grid cells. CLIQUE is a grid-based and subspace clustering algorithm.

Clustering evaluation assesses the feasibility of clustering analysis on a data set and

the quality of the results generated by a clustering method. The tasks include assessing

clustering tendency, determining the number of clusters, and measuring clustering

quality.

10.8 Exercises

10.1 Briefly describe and give examples of each of the following approaches to cluster-

ing: partitioning methods, hierarchical methods, density-based methods, and grid-based

methods.

Search WWH ::

Custom Search

Home