each VI, assess that with the context-aware method we can still obtain comparable,
and sometimes better, clustering quality than with the standard method, which
involves no context-awareness. On average, using the method improved the FScore at
the optimal value of VI by 1.27% and 0.16% in DS2 and DS3, respectively, and
reduced it by 0.21% and 0.14% in DS1 and DS4, respectively. This can be explained
by the “safe” decisions taken at each step of the process. Although they do not
improve VI at the highest speed, these context-aware decisions anticipate the
upcoming merges; they therefore provide better clustering possibilities in future
iterations and, in the end, partitions of competitive quality.
16.5.3 On the Quality of the Final Solutions
More informative than the quality of the optimal solutions is the quality of the
final solutions obtained when the process is stopped before FD. These solutions,
obtained with and without context-awareness, are therefore also evaluated in terms
of FScore. On average, using the context-aware method improved the FScore by
63.14%, 30.16%, 10.04%, and 19.53% in DS1, DS2, DS3, and DS4, respectively. The
largest improvements are observed on the document clustering datasets. This is not
surprising given the relatively poor representation of word patterns compared to
documents.
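FScore, used throughout this section to compare partitions, is commonly defined as follows: for each true class, take the best F-measure achieved by any cluster, then average these values over classes, weighted by class size. The exact formula is not restated in this excerpt, so the sketch below assumes this common definition; the function name `fscore` and its list-of-labels inputs are illustrative only.

```python
from collections import Counter


def fscore(labels_true, labels_pred):
    """Class-size-weighted FScore (assumed common definition, not the chapter's code).

    For each true class, take the best F-measure over all clusters,
    then weight that value by the class's share of the data.
    """
    n = len(labels_true)
    class_sizes = Counter(labels_true)
    cluster_sizes = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))  # items shared by (class, cluster)
    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = joint.get((c, k), 0)
            if n_ck == 0:
                continue  # F-measure is 0 when class and cluster do not overlap
            precision = n_ck / n_k
            recall = n_ck / n_c
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_c / n) * best
    return total


if __name__ == "__main__":
    print(fscore(["a", "a", "b", "b"], [1, 1, 2, 2]))  # 1.0 (perfect match)
    print(fscore(["a", "a", "b", "b"], [1, 1, 1, 1]))  # approximately 0.667
```

A single all-inclusive cluster still scores about 0.667 here, because every class is fully recalled; this is why FScore is usually read alongside the number of clusters.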
16.6 Conclusion and Future Trends
On the one hand, we presented an experimental study showing that indices generally
perform “well” at evaluating solutions, especially when dealing with words.
However, although they are supposed to be completely insensitive to the number of
clusters k, they showed some sensitivity to k, leading to erroneous top-ranked
solutions. In addition, we saw that these indices, when used as criterion
functions, yield slightly better results than when they are simply used as
external indicators.
On the other hand, we studied the feasibility of using relative indices as stopping
criteria in agglomerative clustering algorithms. Experiments performed in two
applications, document and word clustering, showed that indices used alone are not
effective for this purpose. We therefore presented a method that aims to smooth the
indices' plots by taking the “safest” decision at each level of the clustering
process. We demonstrated that this method can considerably enhance the usefulness
of relative indices as stopping criteria.
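The plain use of a relative index as a stopping criterion, which the experiments found unreliable on its own, can be sketched as: build the hierarchy bottom-up, evaluate the index at every level, and keep the partition where it peaks. This is a minimal sketch under stated assumptions: points are Euclidean vectors, merging uses average linkage, and the silhouette width stands in for the relative index (the excerpt does not name the chapter's indices). The function names are hypothetical, and the context-aware “safest decision” smoothing is not reproduced here.

```python
import math
from itertools import combinations


def silhouette(points, clusters):
    """Mean silhouette width (stand-in relative index); clusters hold point indices."""
    if len(clusters) < 2:
        return float("-inf")  # silhouette is undefined for a single cluster
    label = {i: ci for ci, members in enumerate(clusters) for i in members}
    widths = []
    for i, p in enumerate(points):
        own = clusters[label[i]]
        if len(own) == 1:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(math.dist(p, points[j]) for j in other) / len(other)
            for ci, other in enumerate(clusters)
            if ci != label[i]
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / len(widths)


def average_linkage(points, c1, c2):
    """Mean pairwise Euclidean distance between two clusters."""
    return sum(math.dist(points[i], points[j]) for i in c1 for j in c2) / (len(c1) * len(c2))


def agglomerate_with_index(points, min_k=2):
    """Merge bottom-up, score every level, and keep the best-scoring partition."""
    clusters = [[i] for i in range(len(points))]
    best = [list(c) for c in clusters]
    best_score = silhouette(points, clusters)
    while len(clusters) > min_k:
        # Greedy step: merge the closest pair of clusters under average linkage.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(points, clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[x] + clusters[y]
        clusters = [c for i, c in enumerate(clusters) if i not in (x, y)] + [merged]
        score = silhouette(points, clusters)
        if score > best_score:  # the chosen "stopping level" is where the index peaks
            best, best_score = [list(c) for c in clusters], score
    return best, best_score


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    partition, score = agglomerate_with_index(pts)
    print(sorted(sorted(c) for c in partition))  # [[0, 1, 2], [3, 4, 5]]
```

The loop scans the whole hierarchy and keeps the global peak instead of halting at the first local maximum; with a jagged index plot, the first local maximum can lie far from the best level.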
An important drawback of most relative indices is their high computational cost.
Yet their use seems crucial for parameter-free clustering. An important trend is
therefore to develop methods that accurately approximate their values on reduced,
representative subsets of the data. Among the few works conducted in this
direction, we can cite [11]. Such works, if combined with efficient clustering
methods (e.g., CLIQUE, PROCLUS, bisecting k-means, frequent pattern-based methods),
may enable objective clustering of large, high-dimensional datasets.