Table 8.6 IS and data complexity

Description                                                                     Reference
Data characterization for effective edition and condensation schemes           [119]
Data characterization for effective PS                                         [64]
Use of PS to enhance the computation of data complexity measures               [96]
Data characterization for effective under-sampling and over-sampling in        [115]
  imbalanced problems
Meta-learning framework for IS                                                  [103]
Prediction of noise filtering efficacy with data complexity measures for KNN   [137]
conditions were discussed in Chap. 2 of this book. The data sets used are summarized
in Table 8.7.
The data sets considered are partitioned using the 10-FCV procedure. The parameters of the PS algorithms are those recommended by their respective authors, under the assumption that these recommended values were chosen optimally. In the PS methods that require the number of neighbors as a parameter, its value matches the k value of the KNN rule used afterwards, except that all edition methods operate with a minimum of 3 nearest neighbors (as recommended in [165]), even though they are applied to a 1NN classifier. The Euclidean distance is chosen as the distance metric because it is well known and the most widely used for KNN. All probabilistic methods (including incremental methods, which depend on the order of instance presentation) are run three times, and the final results reported correspond to the average performance over these runs.
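As a concrete illustration of this protocol, the sketch below evaluates a generic PS method with a 1NN classifier and the Euclidean distance. The names ps_method and folds are hypothetical placeholders for a PS algorithm and the 10-FCV partitions; this is a minimal sketch of the experimental protocol, not the KEEL implementation.

import numpy as np

def knn_predict(X_train, y_train, x, k=1):
    # Plain KNN with Euclidean distance: majority label among the
    # k nearest training instances (inputs are NumPy arrays)
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

def evaluate_ps(ps_method, folds, runs=3):
    # Average test accuracy and reduction rate over the 10-FCV folds.
    # Probabilistic or incremental PS methods are repeated `runs` times
    # per fold; deterministic methods would use runs=1.
    accs, reds = [], []
    for X_tr, y_tr, X_te, y_te in folds:
        for _ in range(runs):
            S_X, S_y = ps_method(X_tr, y_tr)   # selected subset S of the training set
            preds = np.array([knn_predict(S_X, S_y, x, k=1) for x in X_te])
            accs.append(np.mean(preds == y_te))
            reds.append(1.0 - len(S_X) / len(X_tr))
    return float(np.mean(accs)), float(np.mean(reds))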
Thus, the empirical study involves 42 PS methods from those listed in Table 8.1. We would like to point out that the implementations are based only on the descriptions and specifications given by the respective authors in their papers. No advanced data structures or enhancements to improve the efficiency of the PS methods have been applied. All methods (including the slowest ones) are included in the KEEL software [3].
8.6.1 Analysis and Empirical Results on Small Size Data Sets
Table 8.8 presents the average results obtained by the PS methods over the 39 small size data sets. Red denotes the reduction rate achieved; tst Acc and tst Kap denote the accuracy and kappa obtained on test data, respectively; Acc∗Red and Kap∗Red correspond to the product of accuracy/kappa and reduction rate, which is an estimator of how good a PS method is considering a tradeoff between reduction and classification success rate. Finally, Time denotes the average time elapsed in seconds to complete a run of a PS method.¹ In the case of 1NN, the time required is not displayed because no PS stage is run beforehand. For each type of result, the algorithms are ordered from best to worst. Algorithms highlighted in bold are those which obtain the best result in terms of Acc∗Red and Kap∗Red.
¹ The machine used was an Intel Core i7 CPU 920 at 2.67 GHz with 4 GB of RAM.
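To make these tradeoff estimators concrete, the sketch below computes kappa from true and predicted labels and then forms the Acc∗Red and Kap∗Red products. All numeric values are illustrative placeholders, not results from Table 8.8.

import numpy as np

def cohen_kappa(y_true, y_pred):
    # Cohen's kappa: observed agreement corrected for agreement by chance
    labels = np.unique(np.concatenate([y_true, y_pred]))
    p_o = np.mean(y_true == y_pred)                        # observed agreement
    p_e = sum(np.mean(y_true == c) * np.mean(y_pred == c)  # chance agreement
              for c in labels)
    return (p_o - p_e) / (1 - p_e)

# Illustrative values for one hypothetical PS method
red = 0.80       # fraction of training instances removed by PS
tst_acc = 0.75   # 1NN test accuracy using the selected subset
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 2, 2, 2])
tst_kap = cohen_kappa(y_true, y_pred)

acc_red = tst_acc * red   # Acc*Red tradeoff estimator
kap_red = tst_kap * red   # Kap*Red tradeoff estimator
print(acc_red, kap_red)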