16.3.2 Context-Aware Method
Context-Aware Clustering. The notion of context was introduced to cluster analysis in [17, 22], where it typically served to make the similarity calculation between objects more reliable by taking their relative nearest neighbor objects into account. Our goal in involving context in clustering is entirely different: we aim to use context to strengthen the ability of validity indices to act as stopping criteria, so that a first drop (FD) in the quality of a clustering solution more reliably indicates that the optimal solution has been reached. Reconsidering Figure 16.1, the goal is to enhance the classic index curve with context-awareness, so as to obtain a more temperate curve (the index + context curve) whose FD (at d) approaches the optimal solution (at c) as closely as possible.
To achieve this goal, the idea is to give clustering algorithms a wider view of the dataset partition, enabling them to take decisions while keeping in "mind" an "idea" of what could happen next if a specific action is undertaken. As we target the hierarchical agglomerative algorithm, the method applies the following heuristic at each level j of the process:

1. consider the M closest pairs of clusters;
2. estimate VI after tentatively merging each of the M pairs;
3. among the mergings that improve VI (over the previous VI at level j − 1), select the merging with the lowest Context Risk (CR);
4. if no merging improves VI, merge the pair that optimizes (maximizes/minimizes) VI.
Consider the bi-dimensional dataset provided in Figure 16.2, where each point represents a cluster centroid³. Suppose that at this stage we have two merging options: (1) merging C_i and C_j into a new cluster C_p, and (2) merging C_m and C_n into a new cluster C_q. Suppose that C_p optimizes a validity index VI, while C_q merely improves it. Before taking any definitive decision, the method examines the context of C_p and C_q in terms of their K nearest neighbors, K-NN (i.e., the surrounding clusters)⁴. If the context of C_p indicates that merging C_i and C_j could lead to a global quality degradation in terms of VI in the next iterations, the method chooses instead to create C_q, improving VI at a minimal context risk.
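The text does not reproduce the formal definition of the Context Risk here, but the context itself is simply the K nearest neighboring centroids of a candidate cluster. A minimal sketch of that retrieval step, with hypothetical names:

```python
import math

def knn_context(candidate, centroids, k):
    """The context of a candidate cluster: its k nearest neighboring
    cluster centroids (the 'surrounding clusters'). `candidate` is the
    centroid of the tentatively merged cluster; `centroids` are the
    centroids of the other clusters in the current partition."""
    return sorted(centroids, key=lambda s: math.dist(candidate, s))[:k]
```

The risk measure CR would then be computed over this neighbor set, e.g. by estimating how the candidate merge affects VI with respect to the surrounding clusters.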
Applying this method certainly implies a more temperate and slower improvement of VI (as shown by the index + context curve in Figure 16.1), but it has the advantage of continuously postponing, as far as possible, risky merging actions that entail possible future degradations. We argue that taking the "safest" action at each level causes an expected degradation to occur as late as possible during the process. Thus, a first drop (FD) is likely to occur closer to the optimal solution, which allows the algorithm to more reliably consider the FD at (d + 1) as a stopping criterion, and the solution provided at d as the optimal clustering solution.
³ S_i is the centroid of cluster C_i.
⁴ We set K = 10 in our experiments.