Fig. 5.2 The three types of examples considered in this topic: safe examples (labeled as s),
borderline examples (labeled as b) and noisy examples (labeled as n). The continuous line shows
the decision boundary between the two classes
5.2 Types of Noise Data: Class Noise and Attribute Noise
A large number of components determine the quality of a data set [90]. Among them,
the class labels and the attribute values directly influence the quality of a classification
data set. The quality of the class labels refers to whether the class of each example is
correctly assigned, whereas the quality of the attributes refers to their capability of
properly characterizing the examples for classification purposes. If noise affects the
attribute values, this capability of characterization, and therefore the quality
of the attributes, is reduced. Based on these two information sources, two types of
noise can be distinguished in a given data set [12, 96]:
1. Class noise (also referred to as label noise). It occurs when an example is incorrectly
labeled. Class noise can be attributed to several causes, such as subjectivity during
the labeling process, data entry errors, or inadequacy of the information used to
label each example. Two types of class noise can be distinguished:
   Contradictory examples: duplicate examples in the data set that have
   different class labels [31].
   Misclassifications: examples that are labeled with a class label different from their
   true label [102].
2. Attribute noise. It refers to corruptions in the values of one or more attributes.
Examples of attribute noise are erroneous attribute values, missing or unknown
attribute values, and incomplete attributes or “do not care” values.
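
As an illustration of the two noise types, the following minimal Python sketch flags contradictory examples (identical attribute values with different class labels) and counts missing attribute values, one simple form of attribute noise. The function names, the toy data set, and the use of None as the missing-value marker are assumptions made for this example and do not come from the text. Misclassifications are not covered, because they cannot be detected by inspection alone without knowing the true labels.

```python
from collections import defaultdict


def find_contradictory_examples(examples):
    """Return attribute tuples that occur with more than one class label.

    `examples` is a list of (attributes, label) pairs, where `attributes`
    is a tuple of attribute values (hypothetical representation).
    """
    labels_by_attributes = defaultdict(set)
    for attributes, label in examples:
        labels_by_attributes[tuple(attributes)].add(label)
    # Keep only attribute tuples that were seen with conflicting labels.
    return {
        attributes: labels
        for attributes, labels in labels_by_attributes.items()
        if len(labels) > 1
    }


def count_missing_values(examples, missing=None):
    """Count attribute values that are the `missing` marker (assumed: None)."""
    return sum(
        1
        for attributes, _ in examples
        for value in attributes
        if value is missing
    )


# Hypothetical toy data set: (attributes, class label)
toy_data = [
    ((1.0, "red"), "positive"),
    ((1.0, "red"), "negative"),   # same attributes, different label
    ((2.5, "blue"), "negative"),
    ((3.1, None), "positive"),    # missing attribute value
]

contradictions = find_contradictory_examples(toy_data)
missing_count = count_missing_values(toy_data)
```

Here `contradictions` maps each conflicting attribute tuple to the set of labels it appears with, and `missing_count` is the number of missing attribute values in the data set.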
 