In this chapter, class noise refers to misclassifications, whereas attribute noise refers to erroneous attribute values, since these are the most common forms of noise in real-world data [100]. Furthermore, erroneous attribute values, unlike other types of attribute noise such as MVs (which are easily detectable), have received less attention in the literature.
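As a minimal illustration (not taken from the referenced works; the toy data and values are assumptions), the following Python sketch shows the distinction on a small data set: class noise corrupts a label, whereas attribute noise corrupts an attribute value or leaves it missing.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))        # attribute values of six examples
y = np.array([0, 0, 1, 1, 1, 0])   # their class labels

# Class noise: an example carries the wrong class label (misclassification).
y_noisy = y.copy()
y_noisy[2] = 0                     # example 2 is mislabeled

# Attribute noise: an attribute value is erroneous or missing (an MV).
X_noisy = X.copy()
X_noisy[4, 1] = 999.0              # erroneous attribute value
X_noisy[5, 0] = np.nan             # missing value, which is easy to detect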
Treating class and attribute noise as corruptions of the class labels and attribute values, respectively, has also been considered in other works in the literature [69, 100]. For instance, in [100] the authors reached a series of interesting conclusions, showing that attribute noise is more harmful than class noise, and that eliminating examples affected by class noise or correcting examples affected by attribute noise may improve classifier performance. They also showed that attribute noise is more harmful in those attributes highly correlated with the class labels. In [69], the authors checked the robustness of methods from different paradigms, such as probabilistic classifiers, decision trees, instance-based learners or SVMs, studying the possible causes of their behavior.
However, most of the works found in the literature focus only on class noise. In [9], the problem of multi-class classification in the presence of labeling errors was studied. The authors proposed a generative multi-class classifier that learns in the presence of labeling errors, extending multi-class quadratic normal discriminant analysis with a model of the mislabeling process. They demonstrated the benefits of this approach in terms of parameter recovery as well as improved classification performance. In [32], the problems caused by labeling errors occurring far from the decision boundaries in Multi-class Gaussian Process Classifiers were studied. The authors proposed a Robust Multi-class Gaussian Process Classifier, introducing binary latent variables that indicate whether an example is mislabeled. Similarly, the effect of mislabeled samples appearing in gene expression profiles was studied in [98]. A detection method for these samples was proposed, which takes advantage of an index measuring the effect of data perturbations based on the SVM regression model; the authors also proposed three algorithms based on this index to detect mislabeled samples. An important common characteristic of these works, also shared by this chapter, is that the suitability of the proposals was evaluated on both real-world and synthetic or noise-modified real-world data sets, where the noise could be quantified in some way.
In order to model class and attribute noise, we consider four different synthetic
noise schemes found in the literature, so that we can simulate the behavior of the
classifiers in the presence of noise as presented in the next section.
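The schemes themselves are presented in what follows; as an illustration of the general idea of injecting noise at a controlled level, the sketch below assumes a uniform (random) corruption scheme in which a given fraction of class labels is flipped to a different class and a given fraction of each attribute's values is replaced by a random value from the attribute's observed range. The function names and the NumPy-based representation are assumptions for illustration only, not the exact schemes used here.

import numpy as np

def uniform_class_noise(y, noise_level, seed=None):
    # Flip the labels of a noise_level fraction of examples to a
    # different class chosen at random.
    rng = np.random.default_rng(seed)
    y_noisy = y.copy()
    classes = np.unique(y)
    idx = rng.choice(len(y), size=int(round(noise_level * len(y))), replace=False)
    for i in idx:
        y_noisy[i] = rng.choice(classes[classes != y[i]])
    return y_noisy

def uniform_attribute_noise(X, noise_level, seed=None):
    # Replace a noise_level fraction of each attribute's values with
    # values drawn uniformly from that attribute's observed range.
    rng = np.random.default_rng(seed)
    X_noisy = X.copy()
    n = int(round(noise_level * X.shape[0]))
    for j in range(X.shape[1]):
        idx = rng.choice(X.shape[0], size=n, replace=False)
        lo, hi = X[:, j].min(), X[:, j].max()
        X_noisy[idx, j] = rng.uniform(lo, hi, size=n)
    return X_noisy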
5.2.1 Noise Introduction Mechanisms
Traditionally, the mechanism by which label noise is introduced has received less attention than its consequences for the knowledge extracted from the data. However, as noise treatment becomes embedded in classifier design, the nature of the noise becomes more and more important. Recently, Frenay and Verleysen [19] have adopted the statistical analysis used to describe the introduction of MVs in order to characterize the mechanisms that introduce label noise.
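A rough sketch of the resulting distinction follows, assuming binary labels and a NumPy representation (the function names and the choice of the first attribute as the driver of the noise are illustrative only): noise that is completely at random flips every label with the same probability, regardless of the example, whereas a data-dependent mechanism flips labels with a probability driven by the attribute values.

import numpy as np

def flip_completely_at_random(y, p, seed=None):
    # Every label is flipped with the same probability p, independently of
    # the true class and of the attribute values.
    rng = np.random.default_rng(seed)
    flip = rng.random(len(y)) < p
    return np.where(flip, 1 - y, y)          # binary labels assumed

def flip_depending_on_attributes(X, y, p_max, seed=None):
    # The flipping probability grows with the first attribute, so the noise
    # mechanism depends on the observed data.
    rng = np.random.default_rng(seed)
    x0 = X[:, 0]
    p = p_max * (x0 - x0.min()) / (x0.max() - x0.min() + 1e-12)
    flip = rng.random(len(y)) < p
    return np.where(flip, 1 - y, y)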
 