Databases Reference
In-Depth Information
you may organize your employees into major groups such as executives, managers, and
staff. You can further partition these groups into smaller subgroups. For instance, the
general group of staff can be further divided into subgroups of senior officers, officers,
and trainees. All these groups form a hierarchy. We can easily summarize or characterize
the data that are organized into a hierarchy, which can be used to find, say, the average
salary of managers and of officers.
Consider handwritten character recognition as another example. A set of handwrit-
ing samples may be first partitioned into general groups where each group corresponds
to a unique character. Some groups can be further partitioned into subgroups since
a character may be written in multiple substantially different ways. If necessary, the
hierarchical partitioning can be continued recursively until a desired granularity is
reached.
In the previous examples, although we partitioned the data hierarchically, we did not
assume that the data have a hierarchical structure (e.g., managers are at the same level
in our AllElectronics hierarchy as staff). Our use of a hierarchy here is just to summarize
and represent the underlying data in a compressed way. Such a hierarchy is particularly
useful for data visualization.
Alternatively, in some applications we may believe that the data bear an underly-
ing hierarchical structure that we want to discover. For example, hierarchical clustering
may uncover a hierarchy for AllElectronics employees structured on, say, salary. In the
study of evolution, hierarchical clustering may group animals according to their bio-
logical features to uncover evolutionary paths, which are a hierarchy of species. As
another example, grouping configurations of a strategic game (e.g., chess or checkers) in
a hierarchical way may help to develop game strategies that can be used to train players.
In this section, you will study hierarchical clustering methods. Section 10.3.1 begins
with a discussion of agglomerative versus divisive hierarchical clustering, which organize
objects into a hierarchy using a bottom-up or top-down strategy, respectively. Agglo-
merative methods start with individual objects as clusters, which are iteratively merged
to form larger clusters. Conversely, divisive methods initially let all the given objects
form one cluster, which they iteratively split into smaller clusters.
Hierarchical clustering methods can encounter difficulties regarding the selection
of merge or split points. Such a decision is critical, because once a group of objects is
merged or split, the process at the next step will operate on the newly generated clusters.
It will neither undo what was done previously, nor perform object swapping between
clusters. Thus, merge or split decisions, if not well chosen, may lead to low-quality
clusters. Moreover, the methods do not scale well because each decision of merge or
split needs to examine and evaluate many objects or clusters.
A promising direction for improving the clustering quality of hierarchical meth-
ods is to integrate hierarchical clustering with other clustering techniques, resulting in
multiple-phase (or multiphase ) clustering . We introduce two such methods, namely
BIRCH and Chameleon. BIRCH (Section 10.3.3) begins by partitioning objects hierar-
chically using tree structures, where the leaf or low-level nonleaf nodes can be
viewed as “microclusters” depending on the resolution scale. It then applies other
 
Search WWH ::




Custom Search