Whereas the former can be used to simulate an NCAR noise model, the latter is
useful to produce a particular NAR noise model.
2. Attribute noise can arise from several sources, such as transmission constraints, faults in sensor devices, irregularities in sampling, and transcription errors [85]. The erroneous attribute values can be totally unpredictable, i.e., random, or they can deviate only slightly from the correct value. We use the uniform attribute noise scheme [100, 104] and the Gaussian attribute noise scheme to simulate each of these possibilities, respectively (a sketch of both schemes is given after this list). We introduce attribute noise in accordance with the hypothesis that interactions between attributes are weak [100]; as a consequence, the noise introduced into each attribute has a low correlation with the noise introduced into the rest.
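As a concrete illustration, the following is a minimal sketch of the two attribute noise schemes, assuming a numeric feature matrix X stored as a NumPy array. The function names, the corruption of each attribute independently of the others, and the choice of a Gaussian standard deviation relative to each attribute's range (rel_sigma) are our own illustrative assumptions, not prescriptions from [100, 104].

```python
import numpy as np

def uniform_attribute_noise(X, noise_level, rng=None):
    """Replace a fraction of the values of each attribute with values drawn
    uniformly from that attribute's observed [min, max] range (random,
    totally unpredictable corruptions)."""
    rng = np.random.default_rng(rng)
    Xn = X.astype(float)
    n, d = Xn.shape
    for j in range(d):                     # attributes corrupted independently
        mask = rng.random(n) < noise_level
        lo, hi = X[:, j].min(), X[:, j].max()
        Xn[mask, j] = rng.uniform(lo, hi, mask.sum())
    return Xn

def gaussian_attribute_noise(X, noise_level, rel_sigma=0.1, rng=None):
    """Perturb a fraction of the values of each attribute with Gaussian noise
    centered on the correct value (a low variation with respect to it)."""
    rng = np.random.default_rng(rng)
    Xn = X.astype(float)
    n, d = Xn.shape
    for j in range(d):
        mask = rng.random(n) < noise_level
        sigma = rel_sigma * (X[:, j].max() - X[:, j].min())
        Xn[mask, j] += rng.normal(0.0, sigma, mask.sum())
    return Xn
```

The uniform scheme replaces a corrupted value with an arbitrary one from the attribute's domain, whereas the Gaussian scheme keeps it close to the original, matching the two possibilities described above.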
Robustness is the capability of an algorithm to build models that are insensitive to data corruptions and suffer less from the impact of noise [39]. Thus, a classification algorithm is said to be more robust than another if it builds classifiers that are less influenced by noise. In order to analyze the degree of robustness of the classifiers in the presence of noise, we will compare the performance of the classifiers learned from the original (without induced noise) data set with the performance of the classifiers learned from the noisy data set. Therefore, the classifiers learned from noisy data sets whose results are most similar to those of the noise-free classifiers will be the most robust ones.
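A common way to summarize this comparison in the noise literature is the relative loss of accuracy, which normalizes the accuracy drop by the accuracy obtained on the clean data; the measure below is a standard formulation, and the function name is ours:

```python
def relative_loss_of_accuracy(acc_clean, acc_noisy):
    """Relative loss of accuracy: 0 means no degradation under noise (fully
    robust); larger values mean the classifier suffers more from the noise."""
    return (acc_clean - acc_noisy) / acc_clean

# e.g., 90% accuracy on clean data vs. 81% after inducing noise:
# relative_loss_of_accuracy(0.90, 0.81) == 0.1, a 10% relative loss
```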
5.3 Noise Filtering at Data Level
Noise filters are preprocessing mechanisms that detect and eliminate noisy instances from the training set. The result of noise elimination in preprocessing is a reduced training set which is then used as input to a classification algorithm. Separating noise detection from learning has the advantage that noisy instances do not influence the design of the classifier [24].
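This separation can be sketched as follows, assuming NumPy arrays and scikit-learn-style estimators; the detection rule used here (cross-validated misclassification) is only a placeholder, since real filters such as those discussed below are more elaborate:

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier

def misclassified_by_cv(X, y, cv=10):
    """Placeholder detection rule: flag instances that a cross-validated
    model misclassifies (real filters rely on ensembles and voting)."""
    preds = cross_val_predict(DecisionTreeClassifier(), X, y, cv=cv)
    return preds != y

def filter_then_train(X, y, detect_noisy=misclassified_by_cv):
    """Detect suspicious instances, drop them, and train the final
    classifier on the reduced training set only."""
    keep = ~detect_noisy(X, y)
    return DecisionTreeClassifier().fit(X[keep], y[keep])
```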
Noise filters are generally oriented to detect and eliminate instances with class noise from the training data. The elimination of such instances has been shown to be advantageous [23]. However, the elimination of instances with attribute noise seems counterproductive [74, 100], since instances with attribute noise still contain valuable information in their other attributes which can help to build the classifier. It is also hard to distinguish between noisy examples and true exceptions, and hence many techniques have been proposed to deal with noisy data sets, with different degrees of success.
We will consider three noise filters designed to deal with mislabeled instances, as they are the most common and the most recent: the Ensemble Filter [11], the Cross-Validated Committees Filter [89] and the Iterative-Partitioning Filter [48]. It should be noted that all three methods are ensemble-based, vote-based filters.
A motivation for using ensembles for filtering is pointed out in [11]: when it is assumed that some instances in the data have been mislabeled and that the label errors are independent of the particular model being fitted, collecting the predictions of several different classifiers provides a better estimate of which instances are mislabeled than relying on any single classifier.
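The following is a minimal sketch of such a voting filter in the spirit of the Ensemble Filter [11], assuming NumPy arrays and scikit-learn estimators; the particular choice of the three learning algorithms, the number of folds, and the voting parameterization are illustrative assumptions rather than the exact configuration of [11]:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def ensemble_filter(X, y, n_folds=10, scheme="majority", seed=0):
    """Voting filter: each instance is classified by several algorithms
    trained on the remaining folds, and the wrong predictions are treated
    as votes for the instance being noisy."""
    learners = [DecisionTreeClassifier(random_state=seed),
                KNeighborsClassifier(n_neighbors=3),
                GaussianNB()]
    errors = np.zeros((len(y), len(learners)), dtype=bool)
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for train_idx, test_idx in skf.split(X, y):
        for m, clf in enumerate(learners):
            clf.fit(X[train_idx], y[train_idx])
            errors[test_idx, m] = clf.predict(X[test_idx]) != y[test_idx]
    wrong_votes = errors.sum(axis=1)
    if scheme == "consensus":    # all members must misclassify the instance
        noisy = wrong_votes == len(learners)
    else:                        # majority voting: more than half disagree
        noisy = wrong_votes > len(learners) / 2
    return X[~noisy], y[~noisy]
```

Consensus voting is the more conservative option, removing fewer instances and thus lowering the risk of discarding true exceptions, while majority voting removes more of the mislabeled instances at the cost of some false positives.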