Fig. 3.1 Example of the histogram spreading made by a Box-Cox transformation: (a) before the
transformation and (b) after the transformation
3.5.7 Spreading the Histogram
Spreading the histogram is a special case of Box-Cox transformations. Because a
Box-Cox transformation reshapes the data to resemble a normal distribution, it also
spreads the histogram, as shown in Fig. 3.1.
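For reference, the one-parameter Box-Cox family is usually written as below. The symbol λ is the conventional name for the parameter and is an assumption here, since this section does not restate the formula. Setting λ = 0 gives the logarithm, and for λ ≠ 0 the transform is a power of x up to a shift and a rescaling, which is the sense in which the logarithm and the power transformation can be read as special cases.

```latex
y^{(\lambda)} =
\begin{cases}
\dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[6pt]
\log(x), & \lambda = 0,
\end{cases}
\qquad x > 0 .
```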
When the user is not interested in converting the distribution to a normal one,
but only in spreading it, we can use two special cases of Box-Cox transformations [30].
The logarithm (with an offset if necessary) can be used to spread the right side
of the histogram: $y = \log(x)$. On the other hand, if we are interested in spreading
the left side of the histogram we can simply use the power transformation $y = x^{g}$.
However, as [30] shows, the power transformation may not be as appropriate as
the log transformation, and it presents an important drawback: higher values of g
may help to spread the histogram, but they will also cause problems with the
numerical precision available.
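The two transformations, and the precision drawback of a large exponent, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of [30]: the function names, the `offset` parameter and the sample values are hypothetical, and standard double-precision floats are assumed.

```python
import math


def log_spread(values, offset=0.0):
    """Apply y = log(x + offset); the offset must make every argument positive."""
    shifted = [x + offset for x in values]
    if min(shifted) <= 0:
        raise ValueError("offset must make all values strictly positive")
    return [math.log(s) for s in shifted]


def power_spread(values, g):
    """Apply the power transformation y = x ** g."""
    return [x ** g for x in values]


def loses_unit_resolution(x, g):
    """True if adding 1 to x ** g no longer changes the stored float."""
    y = x ** g
    return y + 1.0 == y


def overflows(x, g):
    """True if x ** g cannot be represented as a double at all."""
    try:
        x ** g
    except OverflowError:
        return True
    return False


data = [1.0, 2.0, 4.0, 8.0, 1000.0]   # hypothetical sample
logged = log_spread(data)             # y = log(x)
squared = power_spread(data, 2)       # y = x^g with g = 2

# With a small exponent a unit difference is still representable;
# with a larger one it is silently rounded away, and eventually x ** g overflows.
small_g_ok = not loses_unit_resolution(1000.0, 2)
large_g_lossy = loses_unit_resolution(1000.0, 6)
huge_g_overflows = overflows(1000.0, 200)
```

The offset check matters when the attribute contains zeros or negative values, where the logarithm is undefined. The last three lines show the drawback in concrete terms: at g = 6 the value 1000 is mapped to roughly 10^18, beyond the range in which doubles can represent every integer.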
3.5.8 Nominal to Binary Transformation
The presence of nominal attributes in the data set can be problematic, especially if
the DM algorithm used cannot correctly handle them. This is the case for SVMs and
ANNs. The first option is to transform the nominal variable to a numeric one, in
which each nominal value is encoded by an integer, typically starting from 0 or 1
onwards. Although simple, this approach has two big drawbacks that discourage its use:
• With this transformation we assume an ordering of the attribute values, as the
  integer values are ranked. However, the original nominal values did not present
  any ranking among them.
• The integer values can be used in operations as numbers, whereas the nominal
  values cannot. This is even worse than the first point, as with this nominal to
  integer transformation we are establishing unequal differences between pairs of
  nominal values, which is not correct.
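Both drawbacks can be seen on a toy attribute. The sketch below is illustrative only: the attribute values and function names are hypothetical, and the binary encoding shown for contrast assumes one 0/1 indicator per nominal value, which is one common reading of the transformation named in this section's title.

```python
from itertools import combinations


def integer_encode(values, start=0):
    """Map each distinct nominal value to an integer, in order of first appearance."""
    codes = {}
    for v in values:
        if v not in codes:
            codes[v] = start + len(codes)
    return codes


def binary_encode(values):
    """Map each distinct nominal value to a 0/1 indicator vector (assumed scheme)."""
    codes = integer_encode(values)
    n = len(codes)
    return {v: [1 if i == idx else 0 for i in range(n)] for v, idx in codes.items()}


def hamming(a, b):
    """Number of positions in which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


colors = ["red", "green", "blue", "green"]   # hypothetical nominal attribute
int_codes = integer_encode(colors)

# Drawback 1: the codes impose a ranking that the colors never had.
implied_order = sorted(int_codes, key=int_codes.get)

# Drawback 2: arithmetic on the codes yields unequal differences between pairs.
int_gaps = {
    (a, b): abs(int_codes[a] - int_codes[b])
    for a, b in combinations(int_codes, 2)
}

# With indicator vectors, every pair of distinct values is equally far apart.
bin_codes = binary_encode(colors)
bin_gaps = {
    (a, b): hamming(bin_codes[a], bin_codes[b])
    for a, b in combinations(bin_codes, 2)
}
```

Here `int_gaps` places red and blue twice as far apart as red and green, purely because of the order in which the values were numbered, while `bin_gaps` assigns the same distance to every pair.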
 