Euclidean distance), and uses the class labels of such neighbors in order to classify
the considered instance. If the instance is not correctly classified, then the variable
noise is increased by one unit. Therefore, the final noise ratio will be
$$\text{Wilson's Noise} = \frac{noise}{\#\,\text{instances in the data set}}$$
After imputing a data set with different imputation methods, we can measure how disturbing the imputation method is for the classification task. Thus, by using Wilson's noise ratio we can observe which imputation methods reduce the impact of the MVs as noise, and which methods introduce noise when imputing.
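The procedure above can be sketched in Python as follows; the function name, the default `k`, and the majority-vote tie-breaking are assumptions for illustration, not details from the source:

```python
import numpy as np

def wilsons_noise_ratio(X, y, k=5):
    """Fraction of instances misclassified by their k nearest neighbours
    (Euclidean distance); a sketch of Wilson's noise ratio."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(X)
    noise = 0
    for i in range(n):
        # Euclidean distances from instance i to every instance
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # exclude the instance itself
        neigh = np.argsort(d)[:k]          # indices of the k nearest neighbours
        labels, counts = np.unique(y[neigh], return_counts=True)
        # if the neighbours' majority class disagrees, count one unit of noise
        if labels[np.argmax(counts)] != y[i]:
            noise += 1
    return noise / n
```

On a clean, well-separated data set the ratio is 0; each mislabeled instance surrounded by neighbours of another class adds `1/n`.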
Another approach is to use the mutual information (MI), which is considered to be a good indicator of relevance between two random variables [18]. Recently, the use of the MI measure in FS has become well known and has proven successful [51, 52, 66]. The computation of the MI measure for continuous attributes has been tackled in [51], allowing us to compute the MI measure not only on nominal-valued data sets.
In our approach, we calculate the MI between each input attribute and the class attribute, obtaining a set of values, one for each input attribute. In the next step we compute the ratio between each one of these values, considering the imputation of the data set with one imputation method with respect to the non-imputed data set. The average of these ratios will show us whether the imputation of the data set produces a gain in information:
$$\text{Avg. MI Ratio} = \frac{1}{|X|} \sum_{x_i \in X} \frac{MI_{\alpha}(x_i) + 1}{MI(x_i) + 1}$$
where $X$ is the set of input attributes, $MI_{\alpha}(x_i)$ represents the MI value of the $i$th input attribute in the imputed data set and $MI(x_i)$ is the MI value of the $i$th input attribute in the non-imputed data set. We have also applied the Laplace correction, summing 1 to both numerator and denominator, as an MI value of zero is possible for some input attributes.
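Given per-attribute MI values for the imputed and the non-imputed data set, the averaged Laplace-corrected ratio can be computed as below (the function name is hypothetical):

```python
def avg_mi_ratio(mi_imputed, mi_original):
    """Average of (MI_alpha(x_i) + 1) / (MI(x_i) + 1) over all input attributes.

    mi_imputed  -- MI values computed on the imputed data set
    mi_original -- MI values computed on the non-imputed data set
    """
    assert len(mi_imputed) == len(mi_original)
    # Laplace correction: add 1 to numerator and denominator so that
    # attributes with zero MI do not zero out or blow up the ratio
    ratios = [(a + 1.0) / (b + 1.0) for a, b in zip(mi_imputed, mi_original)]
    return sum(ratios) / len(ratios)
```

A ratio above 1 indicates that, on average, imputation produced a gain in information relative to the original data set.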
The calculation of $MI(x_i)$ depends on the type of attribute $x_i$. If the attribute $x_i$ is nominal, the MI between $x_i$ and the class label $Y$ is computed as follows:
$$MI_{nominal}(x_i) = I(x_i; Y) = \sum_{z \in x_i} \sum_{y \in Y} p(z, y) \log_2 \frac{p(z, y)}{p(z)\, p(y)}.$$
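For nominal attributes this amounts to plugging empirical frequencies into the double sum; a minimal sketch (the helper name is hypothetical, not from the source):

```python
import math
from collections import Counter

def mi_nominal(x, y):
    """I(x; Y) = sum over (z, y) of p(z, y) * log2( p(z, y) / (p(z) p(y)) ),
    with all probabilities estimated as empirical frequencies."""
    n = len(x)
    pz = Counter(x)           # counts of each value z of the attribute
    py = Counter(y)           # counts of each class label y
    pzy = Counter(zip(x, y))  # joint counts of (z, y) pairs
    mi = 0.0
    for (z, yv), c in pzy.items():
        p_joint = c / n
        # p(z,y) / (p(z) p(y)) = (c/n) / ((pz/n) * (py/n)) = c*n / (pz*py)
        mi += p_joint * math.log2(c * n / (pz[z] * py[yv]))
    return mi
```

For a binary attribute that determines the class exactly, this yields 1 bit; for an attribute independent of the class it yields 0.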
On the other hand, if the attribute x i is numeric, we have used the Parzen window
density estimate as shown in [ 51 ] considering a Gaussian window function:
$$MI_{numeric}(x_i) = I(x_i; Y) = H(Y) - H(Y \mid x_i).$$
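A rough sketch of this estimate follows: class-conditional densities $p(x \mid y)$ are estimated with a Gaussian Parzen window, the posteriors $p(y \mid x_i)$ are obtained via Bayes' rule, and $H(Y \mid x_i)$ is approximated by averaging the posterior entropy over the sample. This is a simplified stand-in for the estimator of [51]; the window width `h`, its default value, and the function name are assumptions, not details from the source:

```python
import math
import numpy as np

def mi_numeric(x, y, h=0.25):
    """Sketch of I(x; Y) = H(Y) - H(Y | x) for a numeric attribute,
    using Gaussian Parzen-window density estimates (h is an assumed width)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    n = len(x)
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / n
    # class entropy H(Y) from empirical class frequencies
    h_y = -np.sum(priors * np.log2(priors))

    def parzen(x0, pts):
        # Gaussian-kernel density estimate of p(x0) from the points pts
        return np.mean(np.exp(-0.5 * ((x0 - pts) / h) ** 2)) / (h * math.sqrt(2 * math.pi))

    # H(Y | x) approximated by the average entropy of the posterior p(y | x_i)
    h_y_given_x = 0.0
    for xi in x:
        dens = np.array([parzen(xi, x[y == c]) for c in classes])
        post = priors * dens
        post = post / post.sum()          # Bayes' rule: p(y | x_i)
        nz = post[post > 0]
        h_y_given_x -= np.sum(nz * np.log2(nz)) / n
    return h_y - h_y_given_x
```

When the two classes occupy well-separated ranges of the attribute, the estimate approaches $H(Y)$; when the class-conditional distributions coincide, it approaches 0.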
 
 