Assuming that the a priori probability for $a_i$ to be relevant is equal to that of not being relevant:

$$\prod_{j=1}^{\omega} P(a_i \notin B \mid B_j) > \prod_{j=1}^{\omega} P(a_i \in B \mid B_j). \tag{13.17}$$
Using the complete probability theorem:

$$\prod_{j=1}^{\omega} P(a_i \notin B \mid B_j) > \prod_{j=1}^{\omega} \left(1 - P(a_i \notin B \mid B_j)\right). \tag{13.18}$$
Because we are using non-ranker feature selectors, the above probability is estimated using:

$$P(a_i \notin B \mid B_j) \approx \begin{cases} P(a \notin B \mid a \in B_j) & \text{if } a_i \in B_j \\ P(a \notin B \mid a \notin B_j) & \text{if } a_i \notin B_j \end{cases}. \tag{13.19}$$
Note that $P(a \notin B \mid B_j)$ does not refer to a specific attribute, but to the general bias of feature selector $j$. In order to estimate the remaining probabilities, we add to the dataset a set of $\phi$ contrast attributes that are known to be truly irrelevant, and analyze the number of artificial features $\phi_j$ included in the subset $B_j$ obtained by feature selector $j$:
$$P(a \in B_j \mid a \notin B) = \frac{\phi_j}{\phi}; \qquad P(a \notin B_j \mid a \notin B) = 1 - \frac{\phi_j}{\phi}. \tag{13.20}$$
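The two probabilities in Eq. (13.20) can be estimated directly from a selector's output by counting how many of the contrast attributes it retained. A minimal sketch; the function name and the representation of $B_j$ as a collection of column indices are my own assumptions:

```python
def contrast_rates(selected, contrast_ids):
    """Estimate Eq. (13.20) from one selector's output B_j.

    selected     -- indices of the attributes chosen by selector j (B_j)
    contrast_ids -- indices of the phi artificial contrast attributes

    Returns (P(a in B_j | a not in B), P(a not in B_j | a not in B)),
    i.e. (phi_j / phi, 1 - phi_j / phi).
    """
    phi = len(contrast_ids)
    # phi_j: number of contrast attributes the selector included in B_j
    phi_j = len(set(selected) & set(contrast_ids))
    return phi_j / phi, 1.0 - phi_j / phi
```

For example, a selector that keeps one of four contrast attributes yields the pair (0.25, 0.75).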
The artificial contrast variables are obtained by randomly permuting the values of the original $n$ attributes across the $m$ instances. Generating purely random attributes from some simple distribution, such as the normal distribution, is not sufficient, because the values of the original attributes may exhibit some special structure. Using Bayes' theorem:
$$P(a \notin B \mid a \in B_j) = \frac{P(a \notin B)\, P(a \in B_j \mid a \notin B)}{P(a \in B_j)} = \frac{P(a \notin B)\,\phi_j/\phi}{P(a \in B_j)}, \tag{13.21}$$

$$P(a \notin B \mid a \notin B_j) = \frac{P(a \notin B)\, P(a \notin B_j \mid a \notin B)}{P(a \notin B_j)} = \frac{P(a \notin B)\,\left(1 - \phi_j/\phi\right)}{1 - P(a \in B_j)}, \tag{13.22}$$

where $P(a \in B_j) = \frac{|B_j|}{n + \phi}$.
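Putting the pieces together, the derivation above can be sketched in code: contrast attributes are created by permuting the original columns, Eqs. (13.21) and (13.22) give $P(a_i \notin B \mid B_j)$ for each selector, and Eq. (13.18) combines the $\omega$ selectors. All names are illustrative, and the prior $P(a \notin B)$ is fixed at 0.5 per the equal-prior assumption that opens the section:

```python
import numpy as np

def add_contrasts(X, rng=None):
    """Append phi = n contrast attributes, each a random permutation of
    one original column, so every contrast keeps its source attribute's
    marginal distribution but is truly irrelevant to the target."""
    rng = np.random.default_rng(rng)
    contrasts = np.column_stack(
        [rng.permutation(X[:, j]) for j in range(X.shape[1])])
    return np.hstack([X, contrasts])

def p_irrelevant(attr, B_j, n, phi, phi_j, p_out=0.5):
    """P(a_i not in B | B_j) via Eqs. (13.19)-(13.22)."""
    p_in_Bj = len(B_j) / (n + phi)       # P(a in B_j) = |B_j| / (n + phi)
    if attr in B_j:                       # Eq. (13.21)
        return p_out * (phi_j / phi) / p_in_Bj
    # Eq. (13.22)
    return p_out * (1.0 - phi_j / phi) / (1.0 - p_in_Bj)

def is_irrelevant(attr, subsets, n, phi):
    """Decision rule of Eq. (13.18): a_i is declared irrelevant when the
    product over the omega selectors of P(a_i not in B | B_j) exceeds
    the product of the complementary probabilities."""
    contrast_ids = set(range(n, n + phi))
    p_out_prod = p_in_prod = 1.0
    for B_j in subsets:
        phi_j = len(set(B_j) & contrast_ids)   # contrasts kept by selector j
        p = p_irrelevant(attr, set(B_j), n, phi, phi_j)
        p_out_prod *= p
        p_in_prod *= 1.0 - p
    return p_out_prod > p_in_prod
```

As a usage sketch, with $n = 3$ original attributes, $\phi = 4$ contrasts (indices 3-6), and two selectors returning $B_1 = \{0, 1, 3\}$ and $B_2 = \{0, 4\}$, attribute 0 (kept by both selectors) is judged relevant while attribute 2 (kept by neither) is judged irrelevant.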