Databases Reference
In-Depth Information
insensitivity to input order, the capability of clustering high-dimensionality data,
constraint-based clustering, as well as interpretability and usability.
Many clustering algorithms have been developed. These can be categorized from
several
orthogonal aspects
such as those regarding partitioning criteria, separation
of clusters, similarity measures used, and clustering space. This chapter discusses
major fundamental clustering methods of the following categories:
partitioning
methods, hierarchical methods, density-based methods
, and
grid-based methods
. Some
algorithms may belong to more than one category.
A
partitioning method
first creates an initial set of
k
partitions, where parame-
ter
k
is the number of partitions to construct. It then uses an
iterative relocation
technique
that attempts to improve the partitioning by moving objects from one
group to another. Typical partitioning methods include
k
-means,
k
-medoids, and
CLARANS.
A
hierarchical method
creates a hierarchical decomposition of the given set of data
objects. The method can be classified as being either
agglomerative
(
bottom-up
) or
divisive
(
top-down
), based on how the hierarchical decomposition is formed. To
compensate for the rigidity of
merge
or
split
, the quality of hierarchical agglome-
ration can be improved by analyzing object linkages at each hierarchical partitioning
(e.g., in Chameleon), or by first performing
microclustering
(that is, grouping objects
into “microclusters”) and then operating on the microclusters with other clustering
techniques such as iterative relocation (as in BIRCH).
A
density-based method
clusters objects based on the notion of density. It grows
clusters either according to the density of neighborhood objects (e.g., in DBSCAN)
or according to a density function (e.g., in DENCLUE). OPTICS is a density-based
method that generates an augmented ordering of the data's clustering structure.
A
grid-based method
first quantizes the object space into a finite number of cells that
form a grid structure, and then performs clustering on the grid structure. STING is
a typical example of a grid-based method based on statistical information stored in
grid cells. CLIQUE is a grid-based and subspace clustering algorithm.
Clustering evaluation
assesses the feasibility of clustering analysis on a data set and
the quality of the results generated by a clustering method. The tasks include assessing
clustering tendency, determining the number of clusters, and measuring clustering
quality.
10.8
Exercises
10.1
Briefly describe and give examples of each of the following approaches to cluster-
ing:
partitioning
methods,
hierarchical
methods,
density-based
methods, and
grid-based
methods.