Figure 10.8 Hierarchical clustering using single and complete linkages: (a) data set of points A, B, C, D, E, F, G, H, and J; (b) clustering using single linkage; (c) clustering using complete linkage.
the clustering method achieve good speed and scalability in large or even streaming
databases, and also make it effective for incremental and dynamic clustering of incoming
objects.
Consider a cluster of n d-dimensional data objects or points. The clustering feature
(CF) of the cluster is a 3-D vector summarizing information about clusters of objects. It
is defined as
$$CF = \langle n, LS, SS \rangle, \tag{10.7}$$
where LS is the linear sum of the n points (i.e., $\sum_{i=1}^{n} x_i$), and SS is the square sum of the
data points (i.e., $\sum_{i=1}^{n} x_i^2$).
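As a rough illustration of Eq. (10.7), the sketch below computes a clustering feature from a list of points. The function name `clustering_feature` and the sample points are hypothetical, and the square sum is kept per dimension here, which is one possible convention; summing its entries gives a single scalar square sum if that form is preferred.

```python
from typing import List, Sequence, Tuple

# Hypothetical type alias: (n, LS, SS) with LS and SS stored per dimension.
CF = Tuple[int, List[float], List[float]]


def clustering_feature(points: Sequence[Sequence[float]]) -> CF:
    """Summarize a non-empty set of d-dimensional points as (n, LS, SS)."""
    if not points:
        raise ValueError("need at least one point")
    d = len(points[0])
    ls = [0.0] * d  # linear sum of the points, one entry per dimension
    ss = [0.0] * d  # square sum of the points, one entry per dimension
    for p in points:
        if len(p) != d:
            raise ValueError("all points must have the same dimension")
        for k, value in enumerate(p):
            ls[k] += value
            ss[k] += value * value
    return len(points), ls, ss


# Made-up sample cluster of three 2-D points.
n, ls, ss = clustering_feature([(2, 5), (3, 2), (4, 3)])
print(n, ls, ss)
```

Only the three summary values need to be stored for the cluster; the individual points are not required once the feature has been computed.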
A clustering feature is essentially a summary of the statistics for the given cluster.
Using a clustering feature, we can easily derive many useful statistics of a cluster. For
example, the cluster's centroid, $x_0$, radius, $R$, and diameter, $D$, are
$$x_0 = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{LS}{n}, \tag{10.8}$$
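Eq. (10.8) can be checked numerically with a short sketch such as the one below, which recovers the centroid from n and LS alone and compares it with the mean computed directly from the points. The helper name `centroid_from_cf` and the sample points are assumptions made for illustration.

```python
from typing import List, Sequence


def centroid_from_cf(n: int, ls: Sequence[float]) -> List[float]:
    """Centroid x_0 = LS / n, using only the stored summary values."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [value / n for value in ls]


# Made-up sample cluster of three 2-D points.
points = [(2.0, 5.0), (3.0, 2.0), (4.0, 3.0)]
n = len(points)
ls = [sum(p[k] for p in points) for k in range(2)]  # linear sum per dimension

from_cf = centroid_from_cf(n, ls)
direct = [sum(p[k] for p in points) / n for k in range(2)]  # plain mean

# Both routes should agree up to floating-point rounding.
print(from_cf, direct)
```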
 