pursued is to study the effect of scaling up the data in PS methods. Table 8.10 shows
the average results obtained in the distinct performance measures considered (it
follows the same format as Table 8.8) and Table 8.11 summarizes the Wilcoxon test
results over medium data sets.
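As a minimal sketch of the kind of pairwise comparison a Wilcoxon signed-rank test performs, the following pure-Python function ranks the paired differences between two methods over several data sets and sums the ranks of the positive and negative differences. The function name, the toy scores and the use of a normal approximation for the p-value (without tie or continuity correction) are assumptions made for illustration; they are not taken from the study itself.

```python
import math


def wilcoxon_signed_rank(a, b):
    """Return (r_plus, r_minus, p_value) for paired samples a and b.

    r_plus sums the ranks of the data sets where a beats b, r_minus the
    ranks where b beats a. The p-value is two-sided and comes from the
    normal approximation, so it is only rough for few data sets.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1

    r_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    r_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(r_plus, r_minus) - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return r_plus, r_minus, p_value


# Hypothetical accuracies (in percent) of two methods on six data sets.
method_a = [80, 75, 90, 65, 70, 85]
method_b = [78, 71, 91, 59, 67, 80]
print(wilcoxon_signed_rank(method_a, method_b))
```

Integer percentages are used in the toy data so that ties between differences are detected exactly; with floating-point scores a tolerance may be needed.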
We can analyze several details from the results collected in Tables 8.10 and 8.11:
- Five techniques outperform 1NN in terms of accuracy/kappa over medium data
  sets: RMHC, SSMA, HMNEI, MoCS and RNGE. Two of them are edition schemes
  (MoCS and RNGE) and the rest are hybrid schemes. Again, no condensation
  method is more accurate than 1NN.
- Some methods behave clearly differently when dealing with larger data sets. This is
  the case with AllKNN, MENN and CHC. The first two tend to try new reduction
  passes in the edition process, which works against accuracy and kappa,
  and this is more noticeable in medium size problems. Furthermore, CHC loses
  the balance between reduction and accuracy when data size increases, because
  the reduction objective becomes easier.
- There are some techniques whose run time could be prohibitive when the data
  scales up. This is the case for RNN, RMHC, CHC and SSMA.
- The best methods in terms of accuracy or kappa are RNGE and HMNEI.
- The best methods considering the tradeoff reduction-accuracy/kappa are RMHC,
  RNN and SSMA.
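The measures discussed above can be illustrated with a short sketch that computes accuracy, Cohen's kappa and the reduction rate, and then combines them into a single tradeoff score. Scoring the tradeoff as the product of reduction and accuracy (or kappa) is an assumption made here for illustration, since this passage does not restate how the tradeoff is measured; the function names and toy labels are likewise hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement between predictions and labels, corrected for chance."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: probability that both pick the same class at random.
    p_chance = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_chance == 1.0:
        return 0.0  # degenerate case: a single class everywhere
    return (p_observed - p_chance) / (1.0 - p_chance)


def reduction_rate(n_original, n_selected):
    """Fraction of training instances removed by the PS method."""
    return 1.0 - n_selected / n_original


def tradeoff(quality, reduction):
    """Assumed tradeoff score: product of a quality measure and reduction."""
    return quality * reduction


# Hypothetical example: 4 test predictions, 100 instances reduced to 10.
y_true = [0, 0, 1, 1]
y_pred = [0, 0, 1, 0]
acc = accuracy(y_true, y_pred)
kap = cohen_kappa(y_true, y_pred)
red = reduction_rate(100, 10)
print(acc, kap, red, tradeoff(acc, red), tradeoff(kap, red))
```

Under this scoring, a method with very high reduction can rank above a more accurate one, which fits the way the two rankings in the list above differ.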
8.6.3 Global View of the Obtained Results
Given the results obtained, several PS methods could be emphasized according
to the accuracy/kappa obtained (RMHC, SSMA, HMNEI, RNGE), the reduction
rate achieved (SSMA, RNN, CCIS) and the computational cost required (POP, FCNN).
However, we want to remark that the choice of a certain method depends on various
factors and the results are offered here with the intention of being useful in making this
decision. For example, an edition scheme will usually outperform the standard kNN
classifier in the presence of noise, but few instances will be removed. This fact could
determine whether the method is suitable or not to be applied over larger data sets,
taking into account the expected size of the resulting subset. We have seen that the
PS methods which allow high reduction rates while preserving accuracy are usually
the slowest ones (hybrid mixed approaches such as SSMA), and they may require
an advanced mechanism to be applied to large data sets, or they may even be
useless under these circumstances. Fast methods that achieve high reduction rates
are the condensation approaches, but we have seen that they are not able to improve
kNN in terms of accuracy. In short, each method has advantages and disadvantages
and the results offered in this section allow an informed decision to be made within
each category.
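To make the contrast between edition and condensation concrete, the sketch below implements simplified versions of two classic schemes on a toy one-dimensional problem: an edited nearest neighbor rule (removing instances that disagree with the majority of their k neighbors) and a condensed nearest neighbor rule (growing a subset until 1NN classifies every training instance correctly). These are illustrative simplifications, not the exact variants evaluated in the study; the function names and the toy data are assumptions.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_label(query, X, y, k, skip=None):
    """Majority label among the k nearest neighbors of query.

    The instance at index `skip` (if any) is left out of the search.
    """
    idx = [i for i in range(len(X)) if i != skip]
    idx.sort(key=lambda i: sq_dist(query, X[i]))
    votes = Counter(y[i] for i in idx[:k])
    return votes.most_common(1)[0][0]


def edited_nn(X, y, k=3):
    """Edition: keep the instances that agree with their k neighbors."""
    return [i for i in range(len(X)) if knn_label(X[i], X, y, k, skip=i) == y[i]]


def condensed_nn(X, y):
    """Condensation: add every instance that 1NN on the subset misclassifies."""
    keep = [0]
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in keep:
                continue
            subset_X = [X[j] for j in keep]
            subset_y = [y[j] for j in keep]
            if knn_label(X[i], subset_X, subset_y, 1) != y[i]:
                keep.append(i)
                changed = True
    return sorted(keep)


# Toy data: two clusters plus one noisy 'b' instance inside the 'a' cluster.
X = [(0.0,), (0.1,), (0.2,), (0.3,), (0.15,), (1.0,), (1.1,), (1.2,), (1.3,)]
y = ["a", "a", "a", "a", "b", "b", "b", "b", "b"]

print("edition keeps     ", edited_nn(X, y))
print("condensation keeps", condensed_nn(X, y))
```

On this toy problem the edition rule discards only the noisy instance, while the condensation rule keeps a smaller subset that still contains the noisy instance. This matches the general pattern described above, although a single toy example cannot substitute for the experimental results.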
Focusing on the objectives usually considered in the use of PS algorithms,
we can suggest the following guidelines for choosing the proper PS algorithm:
 