Databases Reference
In-Depth Information
insensitivity to input order, the capability of clustering high-dimensionality data,
constraint-based clustering, as well as interpretability and usability.
Many clustering algorithms have been developed. These can be categorized from
several orthogonal aspects such as those regarding partitioning criteria, separation
of clusters, similarity measures used, and clustering space. This chapter discusses
major fundamental clustering methods of the following categories: partitioning
methods, hierarchical methods, density-based methods , and grid-based methods . Some
algorithms may belong to more than one category.
A partitioning method first creates an initial set of k partitions, where parame-
ter k is the number of partitions to construct. It then uses an iterative relocation
technique that attempts to improve the partitioning by moving objects from one
group to another. Typical partitioning methods include k -means, k -medoids, and
CLARANS.
A hierarchical method creates a hierarchical decomposition of the given set of data
objects. The method can be classified as being either agglomerative ( bottom-up ) or
divisive ( top-down ), based on how the hierarchical decomposition is formed. To
compensate for the rigidity of merge or split , the quality of hierarchical agglome-
ration can be improved by analyzing object linkages at each hierarchical partitioning
(e.g., in Chameleon), or by first performing microclustering (that is, grouping objects
into “microclusters”) and then operating on the microclusters with other clustering
techniques such as iterative relocation (as in BIRCH).
A density-based method clusters objects based on the notion of density. It grows
clusters either according to the density of neighborhood objects (e.g., in DBSCAN)
or according to a density function (e.g., in DENCLUE). OPTICS is a density-based
method that generates an augmented ordering of the data's clustering structure.
A grid-based method first quantizes the object space into a finite number of cells that
form a grid structure, and then performs clustering on the grid structure. STING is
a typical example of a grid-based method based on statistical information stored in
grid cells. CLIQUE is a grid-based and subspace clustering algorithm.
Clustering evaluation assesses the feasibility of clustering analysis on a data set and
the quality of the results generated by a clustering method. The tasks include assessing
clustering tendency, determining the number of clusters, and measuring clustering
quality.
10.8 Exercises
10.1 Briefly describe and give examples of each of the following approaches to cluster-
ing: partitioning methods, hierarchical methods, density-based methods, and grid-based
methods.
 
Search WWH ::




Custom Search