Cluster Analysis: Basic Concepts and Methods - Data Mining: Concepts and Techniques

Databases Reference

In-Depth Information

a road network, a vector space, or any other space. In other methods, the similarity

may be defined by connectivity based on density or contiguity, and may not rely on

the absolute distance between two objects. Similarity measures play a fundamental

role in the design of clustering methods. While distance-based methods can often

take advantage of optimization techniques, density- and continuity-based methods

can often find clusters of arbitrary shape.

Clustering space : Many clustering methods search for clusters within the entire given

data space. These methods are useful for low-dimensionality data sets. With high-

dimensional data, however, there can be many irrelevant attributes, which can make

similarity measurements unreliable. Consequently, clusters found in the full space

are often meaningless. It's often better to instead search for clusters within different

subspaces of the same data set. Subspace clustering discovers clusters and subspaces

(often of low dimensionality) that manifest object similarity.

To conclude, clustering algorithms have several requirements. These factors include

scalability and the ability to deal with different types of attributes, noisy data, incremen-

tal updates, clusters of arbitrary shape, and constraints. Interpretability and usability are

also important. In addition, clustering methods can differ with respect to the partition-

ing level, whether or not clusters are mutually exclusive, the similarity measures used,

and whether or not subspace clustering is performed.

10.1.3 OverviewofBasicClusteringMethods

There are many clustering algorithms in the literature. It is difficult to provide a crisp

categorization of clustering methods because these categories may overlap so that a

method may have features from several categories. Nevertheless, it is useful to present

a relatively organized picture of clustering methods. In general, the major fundamental

clustering methods can be classified into the following categories, which are discussed

in the rest of this chapter.

Partitioning methods: Given a set of n objects, a partitioning method constructs k

partitions of the data, where each partition represents a cluster and k n . That is, it

divides the data into k groups such that each group must contain at least one object.

In other words, partitioning methods conduct one-level partitioning on data sets.

The basic partitioning methods typically adopt exclusive cluster separation . That is,

each object must belong to exactly one group. This requirement may be relaxed, for

example, in fuzzy partitioning techniques. References to such techniques are given in

the bibliographic notes (Section 10.9).

Most partitioning methods are distance-based. Given k , the number of partitions

to construct, a partitioning method creates an initial partitioning. It then uses an

iterative relocation technique that attempts to improve the partitioning by moving

objects from one group to another. The general criterion of a good partitioning is

that objects in the same cluster are “close” or related to each other, whereas objects

in different clusters are “far apart” or very different. There are various kinds of other

Search WWH ::

Custom Search

Home