and if the data used are dirty, the results will be faulty. However, detecting the
noisy data in a data set is not a trivial task [16], and a wrong detection will
damage correct data.
MVs and noisy data are handled in quite different ways:
MVs are treated by routines applied before the DM algorithm. The instances
containing MVs can be ignored, or filled in manually or with a constant. More elaborate
strategies that estimate the missing values from the data are recommended in order to obtain
reliable and more general results. This task is studied in more depth in Chap. 4.
The presence of noise in data is often defined as a random error in a measured
variable that changes its value. Basic statistical and descriptive techniques, such as scatter
plots, can be used to identify outliers. Multiple linear regression can be used to
estimate the tendency of the attribute values if they are numerical. However, the
approach most recommended in the literature is noise detection and treatment,
usually by filtering. Chapter 5 is completely devoted to noise identification and
filtering.
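The two simplest MV treatments mentioned above, ignoring incomplete instances and filling in a constant, can be sketched as follows. This is only an illustrative sketch: the helper names are hypothetical, and representing an MV as None is an assumption made for the example, not a convention taken from the text.

```python
# Illustrative sketch only: None stands for a missing value (an assumption).
MISSING = None


def drop_incomplete(instances):
    """Ignore every instance that contains at least one missing value."""
    return [row for row in instances
            if all(value is not MISSING for value in row)]


def fill_constant(instances, constant):
    """Replace every missing value with the same constant."""
    return [[constant if value is MISSING else value for value in row]
            for row in instances]


# Three instances with two numerical attributes each; two instances have an MV.
data = [[5.1, 3.5], [4.9, None], [None, 3.0]]

complete_only = drop_incomplete(data)   # keeps only the first instance
filled = fill_constant(data, 0.0)       # keeps all instances, MVs become 0.0
```

Ignoring instances discards two of the three rows here, which hints at why estimation-based strategies are preferred when MVs are frequent.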
3.4 Data Normalization
The data collected in a data set may not be useful enough for a DM algorithm.
Sometimes the attributes selected are raw attributes that have a meaning in the
original domain from which they were obtained, or are designed to work with the
operational system in which they are currently used. Usually these original
attributes are not good enough to obtain accurate predictive models. Therefore, it is
common to perform a series of manipulation steps to transform the original attributes
or to generate new attributes with better properties that improve the predictive power
of the model. The new attributes are usually named modeling variables or analytic
variables.
In this section we will focus on the transformations that do not generate new
attributes, but instead transform the distribution of the original values into a new set of
values with the desired properties.
3.4.1 Min-Max Normalization
The min-max normalization aims to scale all the numerical values v of a numerical
attribute A to a specified range denoted by $[\mathit{new\_min}_A, \mathit{new\_max}_A]$. Thus, a
transformed value is obtained by applying the following expression to v in order to
obtain the new value $v'$:

$$v' = \frac{v - \min_A}{\max_A - \min_A}\,(\mathit{new\_max}_A - \mathit{new\_min}_A) + \mathit{new\_min}_A \qquad (3.8)$$
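As a minimal sketch of the min-max scaling in Eq. (3.8), the function below rescales a list of numerical values to a target range. The function name, the default range [0, 1], and the treatment of a constant attribute (where the denominator would be zero) are assumptions of this example and are not prescribed by the text.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Scale values to [new_min, new_max] using
    v' = (v - min_A) / (max_A - min_A) * (new_max - new_min) + new_min.
    """
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        # Constant attribute: the expression is undefined (division by zero).
        # Mapping every value to new_min is an assumption of this sketch.
        return [new_min for _ in values]
    span = old_max - old_min
    return [(v - old_min) / span * (new_max - new_min) + new_min
            for v in values]


ages = [10.0, 20.0, 30.0]
unit_scaled = min_max_normalize(ages)               # target range [0, 1]
symmetric = min_max_normalize(ages, -1.0, 1.0)      # target range [-1, 1]
```

The smallest original value always maps to the lower bound of the target range and the largest to the upper bound, so a single extreme value compresses all the others toward one end of the range.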
 