On indices' rigidity over the number of clusters. In spite of their ability to reach "good" clustering solutions, relative indices provide relatively poor top-ranked solutions. In other words, while they can reach high FScores at a specific k, they have a hard time detecting this optimal k, leading unfortunately to poor solutions compared to what they could have reached. This can be explained by their somewhat rigid trends over the number of clusters. Even though these indices are supposed to be completely insensitive to k, depending only on the dataset, this is not always the case, which leads them to indicate optimal solutions in the "wrong places". The gaps between the FScores are dramatic; for instance, while C3 could lead the algorithm to an FScore of 0.60 on DS4, it reaches an FScore of only 0.21 at its own optimal value. The only exceptions are the H3 and C4 indices, which provide comparable FScores owing to their high ability to detect the predefined optimal k.
The rigid trend over k can be seen more clearly in Figures 16.7 and 16.8. Focusing on the first column, "K @ Optimal index", we notice that most indices keep similar relative trends over the optimal k across the different datasets. On our datasets, they generally tend towards a large optimal k compared to the "real" optimal k, which is remarkably smaller ("K @ Optimal F"). These differences between the predicted optimal k and the "real" optimal k naturally lie behind the wide FScore gaps mentioned above. This also explains why H3 and C4 are exceptions among the indices: their high ability to reach the optimal k directly supports their ability to reach the optimal clustering solutions.
Are indices better used as external indicators or criterion functions? As mentioned earlier, involving indices as criterion functions leads to much higher complexity than involving them as external indicators. It is natural to believe that driving an algorithm by optimizing a validity index would outperform an approach that drives the algorithm by a "blind" similarity between patterns (e.g., mean-linkage), paying no attention to the overall clustering quality. Surprisingly, however, our results showed that the difference is not significant. For instance, consider the one-to-one maximal rates reached with each approach: the H3/mean-linkage criteria led to maximal FScores of 0.695/0.671, 0.624/0.593, 0.245/0.240, and 0.204/0.192 on DS1, DS2, DS3, and DS4, respectively. Slight improvements were thus noticed when involving the H3 index, but the results are comparable, especially with the other indices. Depending on the requirements of each task, the open question remains: is it worthwhile to considerably increase the algorithm's complexity for only a slight improvement in partition quality? In most cases, the answer will be no. However, for those who definitely seek the optimal partitions out of an algorithm, reducing this complexity is highly desirable. This is broadly the purpose of the method we proposed in Section 16.3, which reduces the algorithm's complexity by using indices as stopping criteria. This method is evaluated in the next section.
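For illustration, the cheaper external-indicator usage can be sketched as follows: run an ordinary clustering algorithm once per candidate k, then let a relative validity index choose k afterwards, instead of re-optimizing the index inside the algorithm at every step. The chapter's H3/C3/C4 indices are not spelled out here, so this sketch substitutes a Calinski-Harabasz-style variance ratio and a toy deterministic 1-D k-means; all names and data are illustrative:

```python
import statistics

def kmeans_1d(points, k, iters=20):
    # Toy deterministic 1-D k-means with quantile initialisation (k >= 2).
    pts = sorted(points)
    centers = [pts[i * (len(pts) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pts:
            clusters[min(range(k), key=lambda i: abs(p - centers[i]))].append(p)
        centers = [statistics.mean(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return [c for c in clusters if c]

def relative_index(clusters, n):
    # Calinski-Harabasz-style ratio of between- to within-cluster dispersion,
    # used as a stand-in relative validity index (higher is better).
    grand = statistics.mean([p for c in clusters for p in c])
    k = len(clusters)
    between = sum(len(c) * (statistics.mean(c) - grand) ** 2 for c in clusters)
    within = sum(sum((p - statistics.mean(c)) ** 2 for p in c) for c in clusters)
    return (between / (k - 1)) / (within / (n - k))

# Three tight 1-D groups: the index, applied externally, should peak at k = 3.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2]
scores = {k: relative_index(kmeans_1d(points, k), len(points))
          for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k)  # 3
```

The index is evaluated only once per candidate k, after clustering, which is the source of the complexity saving discussed above; using it as a criterion function would instead require evaluating it at every merge or assignment step.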
 