where $\max_A$ and $\min_A$ are the original maximum and minimum attribute values,
respectively.
In the literature, “normalization” usually refers to a particular case of the min-max
normalization in which the final interval is $[0, 1]$, that is, $\mathrm{new\_min}_A = 0$ and
$\mathrm{new\_max}_A = 1$. The interval $[-1, 1]$ is also typical when normalizing the data.
This type of normalization is very common in data sets being prepared for learning
methods based on distances. Rescaling all the data to the same range of values prevents
attributes with a large $\max_A - \min_A$ difference from dominating the others in the
distance calculation, which would mislead the learning process by giving more
importance to the former attributes.
This normalization is also known for speeding up the learning process in ANNs,
helping the weights to converge faster.
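The dominance effect described above can be illustrated with a minimal sketch. The data (ages and incomes) and the helper names `min_max` and `squared_terms` are hypothetical, chosen only for illustration; the sketch compares how much each attribute contributes to the squared Euclidean distance between two records before and after rescaling to $[0, 1]$.

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so that min -> new_min and max -> new_max."""
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


def squared_terms(p, q):
    """Per-attribute contributions to the squared Euclidean distance."""
    return [(a - b) ** 2 for a, b in zip(p, q)]


# Hypothetical data set: two attributes on very different scales.
ages = [20.0, 30.0, 40.0, 60.0]
incomes = [30000.0, 30500.0, 80000.0, 31000.0]

raw = list(zip(ages, incomes))
scaled = list(zip(min_max(ages), min_max(incomes)))

# Compare the first and last records: their ages span the whole age range,
# while their incomes differ by only 2% of the income range.
raw_terms = squared_terms(raw[0], raw[3])
scaled_terms = squared_terms(scaled[0], scaled[3])

# Fraction of the squared distance that comes from the income attribute.
raw_income_share = raw_terms[1] / sum(raw_terms)
scaled_income_share = scaled_terms[1] / sum(scaled_terms)

print(f"income share before scaling: {raw_income_share:.4f}")
print(f"income share after scaling:  {scaled_income_share:.4f}")
```

On the raw values the income attribute accounts for almost all of the distance simply because of its units; after rescaling, the attribute whose values actually differ most relative to its own range carries the weight.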
An alternative, but equivalent, formulation for the min-max normalization is
obtained by using a base value $\mathrm{new\_min}_A$ and the desired new range $R$ into which the
values will be mapped after the transformation. Some well-known software packages
such as SAS or Weka [14] use this type of formulation for the min-max transformation:

$$v' = \mathrm{new\_min}_A + R \, \frac{v - \min_A}{\max_A - \min_A}. \tag{3.9}$$
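As a minimal sketch of the base-plus-range formulation in Eq. (3.9), the function below takes a base value and a range instead of explicit new bounds. The function name `min_max_range` and its parameter names are hypothetical and do not correspond to the API of SAS or Weka; the guard against constant attributes is an added assumption, since the formula is undefined when the maximum equals the minimum.

```python
def min_max_range(values, base=0.0, new_range=1.0):
    """Map values into [base, base + new_range] following Eq. (3.9)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant attribute cannot be rescaled this way.
        raise ValueError("attribute is constant; min-max is undefined")
    return [base + new_range * (v - lo) / (hi - lo) for v in values]


values = [2.0, 4.0, 6.0, 10.0]

# base = 0 and R = 1 give the usual [0, 1] normalization.
unit = min_max_range(values)
# base = -1 and R = 2 give the interval [-1, 1].
symmetric = min_max_range(values, base=-1.0, new_range=2.0)

print(unit)       # [0.0, 0.25, 0.5, 1.0]
print(symmetric)  # [-1.0, -0.5, 0.0, 1.0]
```

Setting the range to the difference between the desired new maximum and new minimum recovers the formulation with explicit bounds, which is why the two are equivalent.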
3.4.2 Z-score Normalization
In some cases, the min-max normalization is not useful or cannot be applied. When
the minimum or maximum values of attribute A are not known, the min-max normal-
ization is infeasible. Even when the minimum and maximum values are available,
the presence of outliers can bias the min-max normalization by grouping the values
and limiting the digital precision available to represent the values.
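The effect of an outlier on the min-max normalization can be seen in a small sketch with made-up values, where one extreme observation forces all the remaining values into a very narrow part of $[0, 1]$.

```python
# Hypothetical attribute values; the last one is an outlier.
values = [1.0, 2.0, 3.0, 4.0, 1000.0]

lo, hi = min(values), max(values)
scaled = [(v - lo) / (hi - lo) for v in values]

# Width of the interval occupied by the non-outlier values after scaling.
typical = scaled[:-1]
spread = max(typical) - min(typical)

print(scaled)
print(f"spread of the non-outlier values: {spread:.4f}")
```

The four ordinary values end up within roughly 0.3% of the target interval, so their differences are represented with far less resolution than before.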
If $\bar{A}$ is the mean of the values of attribute $A$ and $\sigma_A$ is the standard deviation,
an original value $v$ of $A$ is normalized to $v'$ using

$$v' = \frac{v - \bar{A}}{\sigma_A}. \tag{3.10}$$
By applying this transformation the attribute values now present a mean equal to 0
and a standard deviation of 1.
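A minimal sketch of the z-score transformation in Eq. (3.10), using Python's standard `statistics` module. The function name `z_score` and the data are hypothetical, and the choice of `statistics.pstdev` (the population form, which divides by $n$) is an assumption of this sketch.

```python
import statistics


def z_score(values):
    """Normalize values to zero mean and unit standard deviation."""
    mean = statistics.mean(values)
    # Assumption: population standard deviation (divides by n).
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("attribute is constant; z-score is undefined")
    return [(v - mean) / std for v in values]


values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, std 2
z = z_score(values)

print(z)                     # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(statistics.mean(z))    # 0.0
print(statistics.pstdev(z))  # 1.0
```

If the standard deviation were instead estimated with the $n - 1$ divisor (`statistics.stdev`), the transformed values would have unit standard deviation under that same estimator rather than under the population form.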
If the mean and standard deviation associated with the probability distribution are
not available, it is usual to use instead the sample mean and standard deviation:

$$\bar{A} = \frac{1}{n} \sum_{i=1}^{n} v_i, \tag{3.11}$$
and
 
 