criminating, and thus more interesting, for C4.5 to discern when a noise filter will
behave well or badly. Based on these rankings it is easy to observe that F2, N2, F1
and F3 are the predominant measures in the order of choice. Recall that each of
these acronyms denotes a data complexity measure that describes one particular
source of difficulty for a classification problem. Following the order from the
most important of these four outstanding measures to the least, the volume of overlap
region (F2) is key to describing the effectiveness of a class noise filter. The less the
classes overlap along each attribute, the better the filter can decide whether an instance is noisy.
It is complemented by the ratio of average intra/inter class distance (N2), as defined by
the nearest neighbor rule. When examples sharing the same class are closer to each other than
to examples of other classes, the filtering is effective for 1-NN. This measure is
expected to change if another classifier is chosen for the classification problem.
F1 and F3, like F2, also measure the overlap of individual attributes, but they are
less important in general.
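As a rough illustration of what F2 and N2 quantify, the following Python sketch computes both for a two-class data set. The formulas are an assumption on our part: the text describes the measures only in words, and the code follows the commonly cited formulation (overlap range divided by total range, multiplied across attributes, for F2; summed nearest same-class distance divided by summed nearest other-class distance for N2). The function names are hypothetical, and skipping constant attributes is a choice of this sketch.

```python
import math


def f2_overlap_volume(X, y):
    """Volume of the per-attribute overlap region between two classes (assumed formula)."""
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two classes")
    class_a = [row for row, lab in zip(X, y) if lab == labels[0]]
    class_b = [row for row, lab in zip(X, y) if lab == labels[1]]
    volume = 1.0
    for j in range(len(X[0])):
        col_a = [row[j] for row in class_a]
        col_b = [row[j] for row in class_b]
        # Length of the interval where both class ranges coincide (0 if disjoint).
        overlap = max(0.0, min(max(col_a), max(col_b)) - max(min(col_a), min(col_b)))
        # Length of the interval covered by either class.
        span = max(max(col_a), max(col_b)) - min(min(col_a), min(col_b))
        if span == 0:
            continue  # constant attribute: skipped (a choice of this sketch)
        volume *= overlap / span
    return volume


def n2_intra_inter_ratio(X, y):
    """Sum of nearest same-class distances over sum of nearest other-class distances."""
    intra = 0.0
    inter = 0.0
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Requires at least two examples per class and at least two classes.
        same = [math.dist(xi, xj)
                for k, (xj, yj) in enumerate(zip(X, y)) if k != i and yj == yi]
        other = [math.dist(xi, xj) for xj, yj in zip(X, y) if yj != yi]
        intra += min(same)
        inter += min(other)
    return intra / inter


# Toy data (invented for illustration): two well-separated classes.
X_separated = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0], [5.0, 5.0]]
y_separated = [0, 0, 1, 1]
f2_separated = f2_overlap_volume(X_separated, y_separated)
n2_separated = n2_intra_inter_ratio(X_separated, y_separated)

# Toy data (invented for illustration): one attribute with overlapping class ranges.
X_overlap = [[0.0], [2.0], [1.0], [3.0]]
y_overlap = [0, 0, 1, 1]
f2_overlap = f2_overlap_volume(X_overlap, y_overlap)
```

Under these definitions, lower values of both measures correspond to the situation the text describes as favorable for filtering: little attribute overlap and same-class examples lying closer together than examples of different classes.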
If the discriminant abilities of these complexity measures are as good as their ranks
indicate, using only these few measures we can expect to obtain a better and more
concise description of what an easy-to-filter problem is. To avoid studying
all the existing combinations of the five metrics, the following experimentation
focuses mainly on the measures F2, N2 and F3, the most discriminative ones,
since the order results can be considered more important than the percentage results.
The incorporation of F1 into this set is also studied. The prediction capability of
the measure F2 alone, since it is the most discriminative one, is also shown. All these
results are presented in Table 5.5.
The use of the measure F2 alone to predict the noise filtering efficacy with good
performance can be discarded, since its results (a mean test performance of 0.6723)
are not good enough compared with the cases where more than one measure is
considered (mean test performance of about 0.81). This reflects that a single
measure does not provide enough information to achieve a good filtering
efficacy prediction result. Therefore, it is necessary to combine several measures
which examine different aspects of the data. Adding the remaining selected measures
provides results comparable to those shown in Table 5.3 while limiting the complexity of
the rule set obtained.
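The comparison between measure subsets can be checked directly from the test values reported in Table 5.5. The short Python sketch below recomputes the mean test performance per subset and picks the subset with the highest mean; the variable and function names are hypothetical, and the numbers are copied from the table.

```python
# Test-set performance from Table 5.5, keyed by measure subset and noise filter.
test_scores = {
    "F2":          {"CVCF": 0.5198, "EF": 0.7579, "IPF": 0.7393},
    "F2-N2-F3-F1": {"CVCF": 0.7943, "EF": 0.8101, "IPF": 0.8119},
    "F2-N2-F3":    {"CVCF": 0.8152, "EF": 0.8421, "IPF": 0.7725},
}


def mean_test_score(scores):
    """Average the test performance over the noise filters for each measure subset."""
    return {
        subset: round(sum(by_filter.values()) / len(by_filter), 4)
        for subset, by_filter in scores.items()
    }


means = mean_test_score(test_scores)
best_subset = max(means, key=means.get)

for subset, value in means.items():
    print(f"{subset:12s} {value:.4f}")
print("highest mean test performance:", best_subset)
```

The recomputed means agree with the Mean row of the table. The two combined subsets differ by less than 0.005, whereas F2 alone trails both by more than 0.13.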
The work carried out in this section is studied further in [77], which shows how a
rule set obtained for one filter can be applied to other filters, how these rule sets are
validated with unseen data sets, and how the number of filters involved can be increased.
Table 5.5 Performance results of C4.5 predicting the noise filtering efficacy (measures used: F2,
N2, F3, and F1)

                F2                  F2-N2-F3-F1         F2-N2-F3
Noise Filter    Training  Test      Training  Test      Training  Test
CVCF            1.0000    0.5198    0.9983    0.7943    0.9977    0.8152
EF              1.0000    0.7579    0.9991    0.8101    0.9997    0.8421
IPF             1.0000    0.7393    0.9989    0.8119    0.9985    0.7725
Mean            1.0000    0.6723    0.9988    0.8054    0.9986    0.8099
 