Cluster Analysis: Basic Concepts and Methods - Data Mining: Concepts and Techniques

Databases Reference

In-Depth Information

you may organize your employees into major groups such as executives, managers, and

staff. You can further partition these groups into smaller subgroups. For instance, the

general group of staff can be further divided into subgroups of senior officers, officers,

and trainees. All these groups form a hierarchy. We can easily summarize or characterize

the data that are organized into a hierarchy, which can be used to find, say, the average

salary of managers and of officers.

Consider handwritten character recognition as another example. A set of handwrit-

ing samples may be first partitioned into general groups where each group corresponds

to a unique character. Some groups can be further partitioned into subgroups since

a character may be written in multiple substantially different ways. If necessary, the

hierarchical partitioning can be continued recursively until a desired granularity is

reached.

In the previous examples, although we partitioned the data hierarchically, we did not

assume that the data have a hierarchical structure (e.g., managers are at the same level

in our AllElectronics hierarchy as staff). Our use of a hierarchy here is just to summarize

and represent the underlying data in a compressed way. Such a hierarchy is particularly

useful for data visualization.

Alternatively, in some applications we may believe that the data bear an underly-

ing hierarchical structure that we want to discover. For example, hierarchical clustering

may uncover a hierarchy for AllElectronics employees structured on, say, salary. In the

study of evolution, hierarchical clustering may group animals according to their bio-

logical features to uncover evolutionary paths, which are a hierarchy of species. As

another example, grouping configurations of a strategic game (e.g., chess or checkers) in

a hierarchical way may help to develop game strategies that can be used to train players.

In this section, you will study hierarchical clustering methods. Section 10.3.1 begins

with a discussion of agglomerative versus divisive hierarchical clustering, which organize

objects into a hierarchy using a bottom-up or top-down strategy, respectively. Agglo-

merative methods start with individual objects as clusters, which are iteratively merged

to form larger clusters. Conversely, divisive methods initially let all the given objects

form one cluster, which they iteratively split into smaller clusters.

Hierarchical clustering methods can encounter difficulties regarding the selection

of merge or split points. Such a decision is critical, because once a group of objects is

merged or split, the process at the next step will operate on the newly generated clusters.

It will neither undo what was done previously, nor perform object swapping between

clusters. Thus, merge or split decisions, if not well chosen, may lead to low-quality

clusters. Moreover, the methods do not scale well because each decision of merge or

split needs to examine and evaluate many objects or clusters.

A promising direction for improving the clustering quality of hierarchical meth-

ods is to integrate hierarchical clustering with other clustering techniques, resulting in

multiple-phase (or multiphase ) clustering . We introduce two such methods, namely

BIRCH and Chameleon. BIRCH (Section 10.3.3) begins by partitioning objects hierar-

chically using tree structures, where the leaf or low-level nonleaf nodes can be

viewed as “microclusters” depending on the resolution scale. It then applies other

Data Mining: Concepts and Techniques

Search WWH ::

Custom Search

Home