Information Technology Reference
In-Depth Information
Fig. 16.2.
An illustration of a bi-dimensional dataset
Note that calculating
VI
for all the possible mergings between
k
clusters will
lead to a high complexity of
O
(
k
2
)at
each level
of the process. We overcome
this by considering, at a given level, only the
M
closest pairs of clusters
5
,since
they form the most potential candidates to improve
VI
.
Context Space Composition.
For each new cluster candidate
C
p
,aContext
Risk
CR
expresses how risky can be assessing
C
p
for the overall clustering quality
in the expected upcoming mergings given the context of
C
p
. Consider the two
new clusters candidates
C
p
and
C
q
depicted by their centroids respectively in
Figures 16.3 and 16.4 with five context clusters each (
C
1
...C
5
). We assume that
C
p
,withits
K
-NN (
K
= 5) neither too close nor too distant from its centroid,
is more risky than
C
q
,withits
K
-NN either too close or too distant from its
centroid. Therefore, we define a
context space
as the space including the
K
-NN
of a new cluster candidate
C
p
. Then, as shown in Figure 16.3, we decompose the
context space of
C
p
into three layers that we define below.
Intra layer.
Clusters within this layer reduce
CR
as they should not lead to a
quick drop in
VI
.Forthis,theyhavetobe
close
enough to
C
p
, therefore, likely to
be merged with
C
p
in next iterations without causing a significant degradation
(comparing to the previous mergings) in the global intra-cluster compactness. As
a matter of fact, the clusters are getting larger over mergings, and thus the intra-
cluster is continuously deteriorating. At a level
k
where
FD
did not occurred
yet, we suppose that all the previous mergings that caused degradations in the
intra-cluster are acceptable. This layer is delimited by the thresholds
t
0=0
and
t
1=
δ
(
C
p
). We define
δ
(
C
p
) as the radius of the new cluster candidate
C
p
augmented by the standard deviation of radius values obtained following the
previous mergings. A radius is the maximum distance between the centroid of
C
p
and an element within
C
p
.
δ
(
C
p
)=
radius
(
C
p
)+
Std
(
radius
(
C
n
...C
k−
1
))
5
We set
M
=10inourexperiments.