criteria for judging the quality of partitions. Traditional partitioning methods can
be extended for subspace clustering, rather than searching the full data space. This is
useful when there are many attributes and the data are sparse.
Achieving global optimality in partitioning-based clustering is often computation-
ally prohibitive, potentially requiring an exhaustive enumeration of all the possible
partitions. Instead, most applications adopt popular heuristic methods, such as
greedy approaches like the k-means and the k-medoids algorithms, which progres-
sively improve the clustering quality and approach a local optimum. These heuristic
clustering methods work well for finding spherical-shaped clusters in small- to
medium-size databases. To find clusters with complex shapes and for very large data
sets, partitioning-based methods need to be extended. Partitioning-based clustering
methods are studied in depth in Section 10.2.
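As a concrete illustration of the greedy, progressively improving strategy described above, the assign-then-update loop of k-means can be sketched as follows. This is a minimal two-dimensional version; the function signature, random seeding, and iteration cap are illustrative choices, not prescribed by the text:

```python
import random

def kmeans(points, k, iters=20):
    """Greedy partitioning sketch: alternate assignment and update steps,
    progressively improving cluster quality toward a local optimum.
    points: a list of (x, y) tuples; initial centers are sampled at
    random (one of several possible seeding strategies)."""
    centers = random.sample(points, k)
    for _ in range(iters):
        # Assignment step: each object joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                + (p[1] - centers[c][1]) ** 2)
            clusters[i].append(p)
        # Update step: move each center to the mean of its assigned objects.
        new_centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # no change: a local optimum is reached
            break
        centers = new_centers
    return centers, clusters
```

Because each iteration only reassigns objects to their nearest center and recomputes means, the partitioning cost never increases; the loop therefore approaches a local optimum rather than guaranteeing the globally best partition.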
Hierarchical methods: A hierarchical method creates a hierarchical decomposition of
the given set of data objects. A hierarchical method can be classified as being either
agglomerative or divisive, based on how the hierarchical decomposition is formed.
The agglomerative approach, also called the bottom-up approach, starts with each
object forming a separate group. It successively merges the objects or groups close
to one another, until all the groups are merged into one (the topmost level of the
hierarchy) or a termination condition holds. The divisive approach, also called the
top-down approach, starts with all the objects in the same cluster. In each successive
iteration, a cluster is split into smaller clusters, until eventually each object forms its
own cluster, or a termination condition holds.
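The bottom-up merging process can be sketched as follows. This minimal version uses single linkage (closest pair of points between clusters) and stops at a target cluster count; both are illustrative choices, since the text leaves the linkage criterion and termination condition open:

```python
def agglomerative(points, target=1):
    """Bottom-up hierarchical clustering sketch: start with each object as
    its own group, then repeatedly merge the two closest groups until
    `target` clusters remain (target=1 builds the full hierarchy)."""
    clusters = [[p] for p in points]
    merges = []  # record of merge steps: the hierarchical decomposition
    while len(clusters) > target:
        best = None
        # Find the closest pair of clusters under single linkage
        # (squared distances preserve the ordering of merges).
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merges.append((clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]  # merge is permanent:
        del clusters[j]                          # it is never undone
    return clusters, merges
```

Note that each merge is final, which reflects the rigidity discussed below: the method never revisits an earlier merge, which keeps the computation cheap but cannot repair a poor early decision.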
Hierarchical clustering methods can be distance-based or density- and continuity-
based. Various extensions of hierarchical methods consider clustering in subspaces
as well.
Hierarchical methods suffer from the fact that once a step (merge or split) is done,
it can never be undone. This rigidity is useful in that it leads to smaller computation
costs, because a combinatorial number of different choices need not be examined.
Such techniques cannot correct erroneous decisions; however, methods for improving
the quality of hierarchical clustering have been proposed. Hierarchical clustering
methods are studied in Section 10.3.
Density-based methods: Most partitioning methods cluster objects based on the dis-
tance between objects. Such methods can find only spherical-shaped clusters and
encounter difficulty in discovering clusters of arbitrary shapes. Other clustering
methods have been developed based on the notion of density. Their general idea
is to continue growing a given cluster as long as the density (number of objects or
data points) in the “neighborhood” exceeds some threshold. For example, for each
data point within a given cluster, the neighborhood of a given radius has to contain
at least a minimum number of points. Such a method can be used to filter out noise
or outliers and discover clusters of arbitrary shape.
Density-based methods can divide a set of objects into multiple exclusive clus-
ters, or a hierarchy of clusters. Typically, density-based methods consider exclusive
clusters only, and do not consider fuzzy clusters. Moreover, density-based methods
can be extended from full space to subspace clustering.
Density-based clustering
methods are studied in Section 10.4.
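The neighborhood-density idea above can be sketched with a simplified DBSCAN-style procedure. The parameter names `eps` (neighborhood radius) and `min_pts` (minimum number of points) follow common usage; the details here are an illustrative sketch, not the only density-based formulation:

```python
def density_cluster(points, eps, min_pts):
    """Density-based clustering sketch: grow a cluster while each point's
    eps-neighborhood contains at least min_pts points; points in sparse
    regions are labeled as noise (-1). Clusters may take arbitrary shapes
    because growth follows density, not distance to a single center."""
    def neighbors(p):
        return [q for q in points
                if (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= eps * eps]

    labels = {}  # point -> cluster id, or -1 for noise/outliers
    cid = 0
    for p in points:
        if p in labels:
            continue
        nbrs = neighbors(p)
        if len(nbrs) < min_pts:   # neighborhood too sparse: mark as noise
            labels[p] = -1
            continue
        cid += 1                  # p is dense: start a new cluster
        labels[p] = cid
        frontier = list(nbrs)
        while frontier:           # keep growing while density holds
            q = frontier.pop()
            if labels.get(q) == -1:
                labels[q] = cid   # noise reachable from a dense point
            if q in labels:       # joins the cluster as a border point
                continue
            labels[q] = cid
            q_nbrs = neighbors(q)
            if len(q_nbrs) >= min_pts:   # q is dense too: expand further
                frontier.extend(q_nbrs)
    return labels
```

Growth stops wherever the density threshold fails, so isolated points end up labeled as noise rather than being forced into the nearest cluster, which is how such methods filter out outliers.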
 