from real-valued ones with respect to nominal attributes. However, we must point out
that for attribute noise the probability dependencies are not the only aspect to
consider: the probability distribution of the noise values is also fundamental.
For numerical data, the noisy datum $\hat{x}_i$ may be a slight variation of the true
value $x_i$ or a completely random value. The density function of the noise values is
very rarely known. A simple example of the first type of noise is a perturbation drawn
from a normal distribution whose mean is the true value and whose variance is fixed.
The second type of noise is usually modeled by assigning a uniform probability to all
the possible values in the input feature's range. This procedure is also typical for
nominal data, where no value is preferred over any other. Note again that the
distribution of the noise is not the same as the probability of its appearance discussed
above: first, the noise is introduced with a certain probability (following the
NCAR, NAR or NNAR models), and then the noise value is drawn from, or analyzed as
following, one of the aforementioned density functions.
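The two-step process just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the noise generator shipped with KEEL: it assumes the NCAR model (each value is corrupted with a fixed probability `p`, regardless of its value or class), and all helper names (`corrupt_ncar`, `gaussian_noise_value`, `uniform_noise_value`, `uniform_nominal_value`) are hypothetical.

```python
import random

# Hypothetical helpers illustrating the two kinds of noise values
# described in the text; they are not KEEL's API.

def gaussian_noise_value(true_value, sigma, rng):
    """Slight variation: a draw from a normal distribution whose mean is
    the true value and whose standard deviation sigma is fixed."""
    return rng.gauss(true_value, sigma)


def uniform_noise_value(low, high, rng):
    """Completely random value: every value in [low, high] is equally likely."""
    return rng.uniform(low, high)


def uniform_nominal_value(categories, rng):
    """Nominal attribute: each category (including the original one)
    is equally likely, so no value is preferred."""
    return rng.choice(categories)


def corrupt_ncar(values, p, noise_value, rng):
    """Step 1 (NCAR assumption): corrupt each value with probability p,
    independently of the value itself and of its class.
    Step 2: replace a corrupted value with noise_value(original)."""
    return [noise_value(x) if rng.random() < p else x for x in values]


rng = random.Random(0)  # fixed seed for reproducibility
ages = [23.0, 35.0, 41.0, 58.0, 62.0]
low, high = min(ages), max(ages)

# First type: small Gaussian perturbations around the true values.
slight = corrupt_ncar(ages, 0.4, lambda x: gaussian_noise_value(x, 2.0, rng), rng)
# Second type: corrupted values replaced by any value in the attribute's range.
random_vals = corrupt_ncar(ages, 0.4, lambda x: uniform_noise_value(low, high, rng), rng)
# Nominal attribute: corrupted values replaced by a uniformly chosen category.
colors = ["red", "green", "blue"]
noisy_colors = corrupt_ncar(["red", "red", "blue"], 0.4,
                            lambda x: uniform_nominal_value(colors, rng), rng)
```

Keeping the two steps in separate functions mirrors the distinction drawn in the text: `p` controls how often noise appears, while the `noise_value` function controls what a noisy value looks like.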
5.2.2 Simulating the Noise of Real-World Data Sets
Checking the effect of noisy data on the performance of classifier learning algorithms
is necessary to improve their reliability, and this need has motivated the study of how
to generate and introduce noise into the data. Noise generation can be characterized by
three main properties [100]:
1. The place where the noise is introduced. Noise may affect the input attributes
or the output class, impairing the learning process and the resulting model.
2. The noise distribution. The way in which the noise is present can be, for example,
uniform [84, 104] or Gaussian [100, 102].
3. The magnitude of the generated noise values. The extent to which the noise affects
the data set can be relative to each data value of each attribute, or relative to the
minimum, maximum and standard deviation of each attribute [100, 102, 104].
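To make the third property concrete, the sketch below contrasts noise whose magnitude is relative to each individual value with noise whose magnitude is tied to statistics of the whole attribute (its standard deviation, or its minimum and maximum). It is an illustration only: the function names and the `level` parameter are hypothetical, and the exact formulas used in [100, 102, 104] may differ.

```python
import random
import statistics

# Hypothetical helpers: `level` scales the magnitude of the noise.

def perturb_relative_to_value(values, level, rng):
    """Magnitude relative to each datum: x_i is perturbed with a Gaussian
    whose standard deviation is level * |x_i|."""
    return [rng.gauss(x, level * abs(x)) for x in values]


def perturb_relative_to_std(values, level, rng):
    """Magnitude relative to the attribute: one standard deviation,
    level * (population) standard deviation of the attribute, for all values."""
    sigma = level * statistics.pstdev(values)
    return [rng.gauss(x, sigma) for x in values]


def perturb_within_range(values, level, rng):
    """Magnitude relative to the attribute's minimum and maximum: a uniform
    shift of at most level * (max - min) in either direction."""
    width = level * (max(values) - min(values))
    return [x + rng.uniform(-width, width) for x in values]


rng = random.Random(1)  # fixed seed for reproducibility
incomes = [1200.0, 1500.0, 2100.0, 5000.0]
print(perturb_relative_to_value(incomes, 0.05, rng))
print(perturb_relative_to_std(incomes, 0.05, rng))
print(perturb_within_range(incomes, 0.05, rng))
```

With per-value scaling, large values receive proportionally larger perturbations; with attribute-level scaling, every value of the attribute is perturbed on the same scale.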
In contrast to other studies in the literature, this work aims to clearly explain
how noise is defined and generated, and also to properly justify the choice of the
noise introduction schemes. Furthermore, the noise generation software has been
incorporated into the KEEL tool (see Chap. 10) for free use. The two types of
noise considered in this work, class noise and attribute noise, have been modeled
using four different noise schemes, so that the behavior of the classifiers can be
simulated in these two scenarios:
1. Class noise usually occurs on the boundaries of the classes, where the examples
may have similar characteristics, although it can occur in any other area of
the domain. In this work, class noise is introduced using a uniform class noise
scheme [84] (randomly corrupting the class labels of the examples) and a pairwise
class noise scheme [100, 102] (labeling examples of the majority class with the
second majority class). These two schemes simulate, respectively, noise that can
affect any class label and noise that affects only the two majority classes.
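The two class noise schemes can be sketched as follows. The details are assumptions for illustration, not KEEL's implementation: the uniform scheme here corrupts exactly round(level * n) randomly chosen examples and gives each one a different label drawn uniformly from the remaining classes, while the pairwise scheme relabels each example of the majority class as the second majority class with probability `level`. The function names are hypothetical, and ties in class frequency are broken by order of first appearance (the documented behavior of `Counter.most_common`).

```python
import random
from collections import Counter


def uniform_class_noise(labels, level, rng):
    """Hypothetical: corrupt round(level * n) randomly chosen examples; each
    one gets a different class label drawn uniformly from the other classes."""
    classes = sorted(set(labels))
    if len(classes) < 2:
        raise ValueError("class noise needs at least two classes")
    noisy = list(labels)
    n_corrupt = round(level * len(labels))
    for i in rng.sample(range(len(labels)), n_corrupt):
        others = [c for c in classes if c != noisy[i]]
        noisy[i] = rng.choice(others)
    return noisy


def pairwise_class_noise(labels, level, rng):
    """Hypothetical: relabel each example of the majority class as the second
    majority class with probability `level`; other labels are left untouched."""
    counts = Counter(labels).most_common(2)
    if len(counts) < 2:
        raise ValueError("class noise needs at least two classes")
    (majority, _), (second, _) = counts
    return [second if y == majority and rng.random() < level else y
            for y in labels]


rng = random.Random(7)  # fixed seed for reproducibility
labels = ["a"] * 6 + ["b"] * 3 + ["c"]
print(uniform_class_noise(labels, 0.3, rng))   # 3 labels changed, any class may be hit
print(pairwise_class_noise(labels, 0.3, rng))  # only "a" labels may turn into "b"
```

The contrast is visible in the outputs: uniform noise can touch the minority class "c", whereas pairwise noise only ever moves examples from the majority class "a" to the second majority class "b".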
 