16.5 Evaluating Our Context-Aware Method
In this section, we present an experimental study that attempts to answer the
following question: “How reliable are relative indices as stopping criteria in
agglomerative clustering?”. Along those lines, we explore the added value of
enhancing a clustering process with context-awareness in order to enable the use
of validity indices as stopping criteria. We evaluate the proposed method on
the four benchmarks described in Sections 16.4.2 and 16.4.3, for document and
word clustering respectively. We excluded from the following experiments some
indices that proved inappropriate for the context-aware method because their
curves are too unstable to be stabilized (e.g., Dunn, m-Dunn). In the end,
our experiments are carried out on 5 indices, namely DB [7], C1, C2, C4
[27], and H3.
The experiments therefore cover 10 algorithms, obtained by running the
agglomerative algorithm twice for each of the 5 relative indices (with and
without context-awareness). The solution produced at each level of the
clustering process is evaluated by means of the target relative index
(predicted quality) and the FScore (real quality).
As stressed earlier in this chapter, the goal is to bring the solution provided
before FD as close as possible to the optimal solution. The optimal solution
is defined as the solution at the k where a specific VI reaches its maximum
or minimum, depending on whether VI is to be maximized or minimized.
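The definition of the optimal solution can be illustrated with a minimal Python sketch. The function name optimal_level, the dictionary layout, and the sample values are hypothetical and chosen only for illustration; they are not taken from the experiments.

```python
def optimal_level(vi_by_k, maximize):
    """Return the number of clusters k at which the validity index is best.

    vi_by_k  : dict mapping a number of clusters k to the VI value at that level
               (hypothetical layout, assumed for this sketch).
    maximize : True if the index is to be maximized, False if minimized.
    """
    pick = max if maximize else min
    # Compare levels by their VI value; on ties the first k in the dict wins.
    return pick(vi_by_k, key=vi_by_k.get)


# Made-up VI values for levels k = 6 down to k = 2.
sample = {6: 0.9, 5: 0.7, 4: 0.4, 3: 0.6, 2: 1.2}
print(optimal_level(sample, maximize=False))
print(optimal_level(sample, maximize=True))
```

The same function covers both kinds of index; only the maximize flag changes.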
16.5.1 On Approaching the Optimal Clustering Solution
We first study to what extent the context-aware method allows FD to approach
the optimal clustering solution reached under a specific number of clusters k.
To this end, Figures 16.9 and 16.10 show the complete agglomerative
clustering process (from k = n down to k = 1) divided into three parts:
- P1: This part goes from the initial set (k = n) to the last point before FD.
  Thus, using a VI as a stopping criterion will lead the process to the last
  point of P1.
- P2: This part goes from FD to the optimal clustering solution. It represents
  the part that must be processed but will not if VI is used as a stopping
  criterion.
- P3: This part goes from the optimal solution until the root cluster (k = 1),
  which forms the unnecessary part that will be performed in vain if VI is not
  used as a stopping criterion.
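The three parts can be sketched in Python as follows. This is only an illustration under stated assumptions: FD is taken here to be the first level at which the index gets worse than at the previous level, which is an assumption of this sketch since FD is defined elsewhere in the chapter. The function name split_process and the sample values are hypothetical.

```python
def split_process(vi_values, maximize):
    """Split an agglomerative run into the parts P1, P2 and P3.

    vi_values : VI values ordered from k = n (index 0) down to k = 1 (last index).
    maximize  : True if the index is to be maximized, False if minimized.
    Returns three lists of positions in vi_values.
    """
    n = len(vi_values)
    if maximize:
        worse = lambda prev, cur: cur < prev
        opt = max(range(n), key=lambda i: vi_values[i])
    else:
        worse = lambda prev, cur: cur > prev
        opt = min(range(n), key=lambda i: vi_values[i])

    # ASSUMPTION: FD is the first level where the index degrades.
    # If the index never degrades, FD is placed past the end of the run.
    fd = next((i for i in range(1, n) if worse(vi_values[i - 1], vi_values[i])), n)

    p1 = list(range(0, fd))                 # initial set up to the last point before FD
    p2 = list(range(fd, opt + 1))           # from FD to the optimal solution (may be empty)
    p3 = list(range(max(opt + 1, fd), n))   # after the optimum, down to the root cluster
    return p1, p2, p3


# Made-up values of an index to be minimized, for k = 7 down to k = 1.
p1, p2, p3 = split_process([1.0, 0.8, 0.9, 0.7, 0.5, 0.6, 0.9], maximize=False)
print(p1, p2, p3)
```

In this sketch a shorter P2 means that stopping just before FD lands closer to the optimum, which is the effect the context-aware method aims for.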
By observing Figures 16.9 and 16.10, we can quickly notice the added value of
the context-aware method for both applications, word and document clustering.
On the one hand, it spares the clustering algorithm from processing the P3
parts, which would be a considerable waste of time. On the other hand, it helps
reduce P2, since in most cases FD occurs remarkably closer to the optimal
solution. This allows us to consider, with more justification, a solution before FD as the