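[Figure: AGNES (agglomerative) runs from Step 0 to Step 4, merging the singletons a, b, c, d, e through the intermediate clusters ab, de, and cde into abcde. DIANA (divisive) runs the same steps in reverse, from Step 0 at abcde to Step 4 at the singletons.]
Figure 10.6 Agglomerative and divisive hierarchical clustering on data objects {a, b, c, d, e}.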
[Figure: dendrogram over objects a, b, c, d, e, with levels l = 0 through l = 4 on one axis and a scale running from 1.0 down to 0.0 on the other.]
Figure 10.7 Dendrogram representation for hierarchical clustering of data objects {a, b, c, d, e}.
different clusters. This is a single-linkage approach in that each cluster is represented
by all the objects in the cluster, and the similarity between two clusters is measured
by the similarity of the closest pair of data points belonging to different clusters. The
cluster-merging process repeats until all the objects are eventually merged to form one
cluster.
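The single-linkage merging described above can be sketched in a few lines of Python. This is a minimal illustration, not the AGNES implementation itself: the function names and the one-dimensional coordinates assigned to a through e are hypothetical, chosen only so that the merges pass through the clusters shown in Figure 10.6.

```python
import math
from itertools import combinations

def single_link_distance(c1, c2, points):
    """Distance between the closest pair of points taken from two different clusters."""
    return min(math.dist(points[i], points[j]) for i in c1 for j in c2)

def agnes_single_linkage(points):
    """Repeatedly merge the two closest clusters until one cluster remains.

    Returns the merge history as (merged_cluster, merge_distance) pairs.
    """
    clusters = [frozenset([name]) for name in points]  # start with singletons
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest single-link distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: single_link_distance(pair[0], pair[1], points),
        )
        d = single_link_distance(a, b, points)
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((a | b, d))
    return merges

# Hypothetical coordinates (assumption: not taken from the text).
points = {"a": (0.0,), "b": (1.0,), "c": (5.0,), "d": (8.0,), "e": (9.5,)}

for merged, dist in agnes_single_linkage(points):
    print("".join(sorted(merged)), dist)
# Merge order for these coordinates: ab, de, cde, abcde
```

The pairwise search is quadratic in the number of clusters at every step, which is fine for five objects but would need a distance matrix or a priority queue for larger data sets.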
DIANA, the divisive method, proceeds in the opposite direction. All the objects are used
to form one initial cluster. The cluster is then split according to some principle, such as the
maximum Euclidean distance between the closest neighboring objects in the cluster. The
cluster-splitting process repeats until, eventually, each new cluster contains only a single
object.
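One way to read the splitting principle above is to cut a cluster at its widest gap between closest neighbors. The sketch below follows that reading and should be taken as an assumption rather than the definitive DIANA procedure: the function names, the rule of always splitting the cluster with the widest gap first, and the one-dimensional coordinates are all hypothetical.

```python
import math
from itertools import combinations

def closest_pair_gap(g1, g2, points):
    """Euclidean distance between the closest pair of objects across two groups."""
    return min(math.dist(points[i], points[j]) for i in g1 for j in g2)

def split_at_widest_gap(cluster, points):
    """Split one cluster into two parts at its widest nearest-neighbor gap.

    Groups are merged nearest-first until two remain, so the gap left between
    them is the largest distance separating closest neighbors in the cluster.
    """
    groups = [frozenset([name]) for name in cluster]
    while len(groups) > 2:
        g1, g2 = min(
            combinations(groups, 2),
            key=lambda pair: closest_pair_gap(pair[0], pair[1], points),
        )
        groups = [g for g in groups if g not in (g1, g2)] + [g1 | g2]
    left, right = groups
    return left, right, closest_pair_gap(left, right, points)

def diana_sketch(points):
    """Start from one all-inclusive cluster and split until only singletons remain."""
    clusters = [frozenset(points)]
    splits = []
    while any(len(c) > 1 for c in clusters):
        candidates = [split_at_widest_gap(c, points) for c in clusters if len(c) > 1]
        # Assumption: split the cluster whose internal gap is widest first.
        left, right, gap = max(candidates, key=lambda s: s[2])
        clusters = [c for c in clusters if c != left | right] + [left, right]
        splits.append((left, right, gap))
    return splits

# Hypothetical coordinates (assumption: not taken from the text).
points = {"a": (0.0,), "b": (1.0,), "c": (5.0,), "d": (8.0,), "e": (9.5,)}

for left, right, gap in diana_sketch(points):
    print(sorted(left), sorted(right), gap)
```

With these coordinates the sketch performs four splits and ends with five singleton clusters, mirroring the reverse reading of Figure 10.6.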
A tree structure called a dendrogram is commonly used to represent the process of
hierarchical clustering. It shows how objects are grouped together (in an agglomerative
method) or partitioned (in a divisive method) step-by-step. Figure 10.7 shows a dendrogram
for the five objects presented in Figure 10.6, where l = 0 shows the five objects
as singleton clusters at level 0. At l = 1, objects a and b are grouped together to form the
 