Assuming that the a priori probability for $a_i$ to be relevant is equal to that of not being relevant:

$$\prod_{j=1}^{\omega} P(a_i \notin B \mid B_j) > \prod_{j=1}^{\omega} P(a_i \in B \mid B_j). \tag{13.17}$$
Using the complete probability theorem:

$$\prod_{j=1}^{\omega} P(a_i \notin B \mid B_j) > \prod_{j=1}^{\omega} \left(1 - P(a_i \notin B \mid B_j)\right). \tag{13.18}$$
Because we are using non-ranker feature selectors, the above probability is estimated using:

$$P(a_i \notin B \mid B_j) \approx \begin{cases} P(a \notin B \mid a \in B_j) & \text{if } a_i \in B_j \\ P(a \notin B \mid a \notin B_j) & \text{if } a_i \notin B_j \end{cases}. \tag{13.19}$$
Note that $P(a \notin B \mid B_j)$ does not refer to a specific attribute, but to the general bias of feature selector $j$. In order to estimate the remaining probabilities, we add to the dataset a set of $\phi$ contrast attributes that are known to be truly irrelevant, and analyze the number of artificial features $\phi_j$ included in the subset $B_j$ obtained by feature selector $j$:
$$P(a \in B_j \mid a \notin B) = \frac{\phi_j}{\phi}; \qquad P(a \notin B_j \mid a \notin B) = 1 - \frac{\phi_j}{\phi}. \tag{13.20}$$
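The two probabilities in Eq. (13.20) can be estimated directly from a selector's output by counting how many of the contrast attributes it retained. A minimal sketch; the function name and the representation of $B_j$ as a collection of column indices are my own assumptions:

```python
def contrast_rates(selected, contrast_ids):
    """Estimate Eq. (13.20) from one selector's output B_j.

    selected     -- indices of the attributes chosen by selector j (B_j)
    contrast_ids -- indices of the phi artificial contrast attributes

    Returns (P(a in B_j | a not in B), P(a not in B_j | a not in B)),
    i.e. (phi_j / phi, 1 - phi_j / phi).
    """
    phi = len(contrast_ids)
    # phi_j: number of contrast attributes the selector included in B_j
    phi_j = len(set(selected) & set(contrast_ids))
    return phi_j / phi, 1.0 - phi_j / phi
```

For example, a selector that keeps one of four contrast attributes yields the pair (0.25, 0.75).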
The artificial contrast variables are obtained by randomly permuting the values of the original $n$ attributes across the $m$ instances. Generating purely random attributes from some simple distribution, such as the normal distribution, is not sufficient, because the values of the original attributes may exhibit some special structure. Using Bayes' theorem:
$$P(a \notin B \mid a \in B_j) = \frac{P(a \notin B)\, P(a \in B_j \mid a \notin B)}{P(a \in B_j)} = \frac{P(a \notin B)\,\phi_j/\phi}{P(a \in B_j)}, \tag{13.21}$$

$$P(a \notin B \mid a \notin B_j) = \frac{P(a \notin B)\, P(a \notin B_j \mid a \notin B)}{P(a \notin B_j)} = \frac{P(a \notin B)\,\left(1 - \phi_j/\phi\right)}{1 - P(a \in B_j)}, \tag{13.22}$$

where $P(a \in B_j) = \frac{|B_j|}{n + \phi}$.
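Putting the pieces together, the derivation above can be sketched in code: contrast attributes are created by permuting the original columns, Eqs. (13.21) and (13.22) give $P(a_i \notin B \mid B_j)$ for each selector, and Eq. (13.18) combines the $\omega$ selectors. All names are illustrative, and the prior $P(a \notin B)$ is fixed at 0.5 per the equal-prior assumption that opens the section:

```python
import numpy as np

def add_contrasts(X, rng=None):
    """Append phi = n contrast attributes, each a random permutation of
    one original column, so every contrast keeps its source attribute's
    marginal distribution but is truly irrelevant to the target."""
    rng = np.random.default_rng(rng)
    contrasts = np.column_stack(
        [rng.permutation(X[:, j]) for j in range(X.shape[1])])
    return np.hstack([X, contrasts])

def p_irrelevant(attr, B_j, n, phi, phi_j, p_out=0.5):
    """P(a_i not in B | B_j) via Eqs. (13.19)-(13.22)."""
    p_in_Bj = len(B_j) / (n + phi)       # P(a in B_j) = |B_j| / (n + phi)
    if attr in B_j:                       # Eq. (13.21)
        return p_out * (phi_j / phi) / p_in_Bj
    # Eq. (13.22)
    return p_out * (1.0 - phi_j / phi) / (1.0 - p_in_Bj)

def is_irrelevant(attr, subsets, n, phi):
    """Decision rule of Eq. (13.18): a_i is declared irrelevant when the
    product over the omega selectors of P(a_i not in B | B_j) exceeds
    the product of the complementary probabilities."""
    contrast_ids = set(range(n, n + phi))
    p_out_prod = p_in_prod = 1.0
    for B_j in subsets:
        phi_j = len(set(B_j) & contrast_ids)   # contrasts kept by selector j
        p = p_irrelevant(attr, set(B_j), n, phi, phi_j)
        p_out_prod *= p
        p_in_prod *= 1.0 - p
    return p_out_prod > p_in_prod
```

As a usage sketch, with $n = 3$ original attributes, $\phi = 4$ contrasts (indices 3-6), and two selectors returning $B_1 = \{0, 1, 3\}$ and $B_2 = \{0, 4\}$, attribute 0 (kept by both selectors) is judged relevant while attribute 2 (kept by neither) is judged irrelevant.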