Information Technology Reference
In-Depth Information
Fig. 16.2. An illustration of a bi-dimensional dataset
Note that calculating VI for all the possible mergings between k clusters will
lead to a high complexity of O ( k 2 )at each level of the process. We overcome
this by considering, at a given level, only the M closest pairs of clusters 5 ,since
they form the most potential candidates to improve VI .
Context Space Composition. For each new cluster candidate C p ,aContext
Risk CR expresses how risky can be assessing C p for the overall clustering quality
in the expected upcoming mergings given the context of C p . Consider the two
new clusters candidates C p and C q depicted by their centroids respectively in
Figures 16.3 and 16.4 with five context clusters each ( C 1 ...C 5 ). We assume that
C p ,withits K -NN ( K = 5) neither too close nor too distant from its centroid,
is more risky than C q ,withits K -NN either too close or too distant from its
centroid. Therefore, we define a context space as the space including the K -NN
of a new cluster candidate C p . Then, as shown in Figure 16.3, we decompose the
context space of C p into three layers that we define below.
Intra layer. Clusters within this layer reduce CR as they should not lead to a
quick drop in VI .Forthis,theyhavetobe close enough to C p , therefore, likely to
be merged with C p in next iterations without causing a significant degradation
(comparing to the previous mergings) in the global intra-cluster compactness. As
a matter of fact, the clusters are getting larger over mergings, and thus the intra-
cluster is continuously deteriorating. At a level k where FD did not occurred
yet, we suppose that all the previous mergings that caused degradations in the
intra-cluster are acceptable. This layer is delimited by the thresholds t 0=0
and t 1= δ ( C p ). We define δ ( C p ) as the radius of the new cluster candidate C p
augmented by the standard deviation of radius values obtained following the
previous mergings. A radius is the maximum distance between the centroid of
C p and an element within C p .
δ ( C p )= radius ( C p )+ Std ( radius ( C n ...C k− 1 ))
5 We set M =10inourexperiments.
 
Search WWH ::




Custom Search