16.3.2 Context-Aware Method
Context-Aware Clustering. The notion of context was introduced to cluster analysis in [17, 22], where it was typically used to obtain a more reliable similarity calculation between objects by taking their relative nearest neighbor objects into account. Our goal in involving context in clustering is entirely different: we aim to use context to enhance the ability of validity indices to serve as stopping criteria, so that a first drop (FD) in the quality of a clustering solution more reliably indicates that the optimal solution has been reached. Reconsidering Figure 16.1, the goal is to enhance the classic index curve with context-awareness in order to obtain a more temperate curve (the index + context curve) whose FD (at d) approaches the optimal solution (at c) as closely as possible.
To achieve this goal, the idea is to give clustering algorithms a wider view of the dataset partition, enabling them to take decisions while having in "mind" an "idea" of what could happen next if a specific action is undertaken. As we focus on the hierarchical agglomerative algorithm, the method applies the following heuristic at each level j of the process:
1. Consider the M closest pairs of clusters;
2. estimate VI after tentatively merging each of the M pairs;
3. among the mergings that improve VI (over the previous VI at level j − 1), select the merging with the lowest Context Risk (CR);
4. if no merging improves VI, merge the pair that optimizes (maximizes/minimizes) VI.
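One level of this heuristic can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the chapter does not define VI or CR numerically, so `validity_index` (a toy separation/compactness ratio, higher is better) and `context_risk` (closeness of the merged cluster to its k nearest neighboring centroids) are hypothetical stand-ins.

```python
import math
from itertools import combinations

def centroid(cluster):
    """Mean point of a cluster given as a list of 2-D points."""
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def validity_index(clusters):
    """Toy VI (higher is better): mean inter-centroid distance
    over (1 + mean intra-cluster spread). Hypothetical stand-in."""
    cents = [centroid(c) for c in clusters]
    if len(cents) < 2:
        return 0.0
    inter = [dist(a, b) for a, b in combinations(cents, 2)]
    intra = [dist(p, centroid(c)) for c in clusters for p in c]
    return (sum(inter) / len(inter)) / (1.0 + sum(intra) / len(intra))

def context_risk(clusters, merged_idx, k=2):
    """Toy CR: the closer the merged cluster sits to its k nearest
    neighboring centroids, the higher the risk. Hypothetical stand-in."""
    cents = [centroid(c) for c in clusters]
    target = cents[merged_idx]
    others = sorted(dist(target, c) for i, c in enumerate(cents)
                    if i != merged_idx)
    return sum(1.0 / (d + 1e-9) for d in others[:k])

def merge_level(clusters, prev_vi, m=3, k=2):
    """One agglomerative level following the four-step heuristic."""
    cents = [centroid(c) for c in clusters]
    # Step 1: the M closest pairs of clusters.
    pairs = sorted(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(cents[ij[0]], cents[ij[1]]))[:m]
    candidates = []
    for i, j in pairs:
        # Step 2: estimate VI after tentatively merging the pair.
        merged = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        merged.append(clusters[i] + clusters[j])
        vi = validity_index(merged)
        cr = context_risk(merged, len(merged) - 1, k)
        candidates.append((vi, cr, merged))
    improving = [c for c in candidates if c[0] > prev_vi]
    if improving:
        # Step 3: among improving merges, take the lowest context risk.
        vi, cr, merged = min(improving, key=lambda c: c[1])
    else:
        # Step 4: otherwise, take the merge that optimizes VI.
        vi, cr, merged = max(candidates, key=lambda c: c[0])
    return merged, vi
```

Repeatedly calling `merge_level` with the previous level's VI drives the agglomeration while recording the VI sequence from which the first drop can later be read off.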
Consider the bi-dimensional dataset in Figure 16.2, where each point represents a cluster centroid³. Suppose that at this stage we have two merging options: (1) merging C_i and C_j into a new cluster C_p, and (2) merging C_m and C_n into a new cluster C_q. Suppose that C_p optimizes a validity index VI, while C_q merely improves it. Before taking any definitive decision, the method examines the context of C_p and C_q in terms of their K nearest neighbors (K-NN, i.e., the surrounding clusters)⁴. If the context of C_p indicates that merging C_i and C_j could lead to a global quality degradation in terms of VI in the next iterations, the method chooses instead to create C_q, improving VI at a minimal context risk.
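The decision between the two options can be reduced to a simple rule. The sketch below is a toy illustration: the `choose_merge` helper, the dictionaries, and the concrete `vi` and `cr` scores are all hypothetical, since the chapter does not define CR numerically.

```python
def choose_merge(candidates, prev_vi):
    """Among candidate merges (each with an estimated validity index 'vi'
    and context risk 'cr'), prefer the lowest-risk merge among those that
    improve VI; if none improves VI, fall back to the best VI."""
    improving = [c for c in candidates if c["vi"] > prev_vi]
    if improving:
        return min(improving, key=lambda c: c["cr"])
    return max(candidates, key=lambda c: c["vi"])

# C_p optimizes VI but sits in a risky context; C_q merely improves VI
# at a lower risk, so C_q is chosen.
cp = {"name": "C_p", "vi": 0.90, "cr": 0.70}
cq = {"name": "C_q", "vi": 0.75, "cr": 0.20}
print(choose_merge([cp, cq], prev_vi=0.60)["name"])  # -> C_q
```

If neither candidate improved VI (e.g., prev_vi = 0.95 here), the rule would fall back to C_p, the VI optimizer, matching step 4 of the heuristic.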
Applying this method admittedly yields a more temperate and slower improvement in VI (as shown by the index + context curve in Figure 16.1), but it has the advantage of continuously postponing, as much as possible, risky merging actions that entail possible future degradations. We argue that taking the "safest" action at each level causes an expected degradation to occur as late as possible in the process. Thus, a first drop (FD) is likely to occur closer to the optimal solution, which gives the algorithm firmer grounds to use FD at (d + 1) as a stopping criterion and to return the solution at d as the optimal clustering solution.
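The FD stopping criterion itself amounts to scanning the recorded VI values for the first decrease. A minimal sketch (the function name and the convention that higher VI is better are assumptions):

```python
def first_drop_index(vi_sequence):
    """Return the level d preceding the first drop (FD) in a sequence of
    validity-index values (higher is better): the last level before VI
    first decreases, or the final level if VI never drops."""
    for level in range(1, len(vi_sequence)):
        if vi_sequence[level] < vi_sequence[level - 1]:
            return level - 1  # FD occurs at (d + 1); solution kept at d
    return len(vi_sequence) - 1
```

On a sequence such as [0.2, 0.5, 0.7, 0.6, 0.8], the first drop occurs at level 3, so the solution at level 2 would be returned as the clustering result.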
³ S_i is the centroid of cluster C_i.
⁴ We set K = 10 in our experiments.