Databases Reference
In-Depth Information
a road network, a vector space, or any other space. In other methods, the similarity
may be defined by connectivity based on density or contiguity, and may not rely on
the absolute distance between two objects. Similarity measures play a fundamental
role in the design of clustering methods. While distance-based methods can often
take advantage of optimization techniques, density- and continuity-based methods
can often find clusters of arbitrary shape.
Clustering space : Many clustering methods search for clusters within the entire given
data space. These methods are useful for low-dimensionality data sets. With high-
dimensional data, however, there can be many irrelevant attributes, which can make
similarity measurements unreliable. Consequently, clusters found in the full space
are often meaningless. It's often better to instead search for clusters within different
subspaces of the same data set. Subspace clustering discovers clusters and subspaces
(often of low dimensionality) that manifest object similarity.
To conclude, clustering algorithms have several requirements. These factors include
scalability and the ability to deal with different types of attributes, noisy data, incremen-
tal updates, clusters of arbitrary shape, and constraints. Interpretability and usability are
also important. In addition, clustering methods can differ with respect to the partition-
ing level, whether or not clusters are mutually exclusive, the similarity measures used,
and whether or not subspace clustering is performed.
10.1.3 OverviewofBasicClusteringMethods
There are many clustering algorithms in the literature. It is difficult to provide a crisp
categorization of clustering methods because these categories may overlap so that a
method may have features from several categories. Nevertheless, it is useful to present
a relatively organized picture of clustering methods. In general, the major fundamental
clustering methods can be classified into the following categories, which are discussed
in the rest of this chapter.
Partitioning methods: Given a set of n objects, a partitioning method constructs k
partitions of the data, where each partition represents a cluster and k n . That is, it
divides the data into k groups such that each group must contain at least one object.
In other words, partitioning methods conduct one-level partitioning on data sets.
The basic partitioning methods typically adopt exclusive cluster separation . That is,
each object must belong to exactly one group. This requirement may be relaxed, for
example, in fuzzy partitioning techniques. References to such techniques are given in
the bibliographic notes (Section 10.9).
Most partitioning methods are distance-based. Given k , the number of partitions
to construct, a partitioning method creates an initial partitioning. It then uses an
iterative relocation technique that attempts to improve the partitioning by moving
objects from one group to another. The general criterion of a good partitioning is
that objects in the same cluster are “close” or related to each other, whereas objects
in different clusters are “far apart” or very different. There are various kinds of other
 
Search WWH ::




Custom Search