The last definition refers to simple majority voting, in which attribute $a_i$ is included in the combined feature subset if it appears in at least half of the base feature subsets $B_1, \ldots, B_\omega$, where $\omega$ is the number of base feature subsets. Note that $f_c(a_i, B_1, \ldots, B_\omega)$ counts the number of base feature subsets in which $a_i$ is included.
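To make the definition concrete, here is a minimal Python sketch of majority-vote combination. The function names f_c and majority_combination and the toy subsets are illustrative choices, not from the text:

```python
def f_c(a_i, base_subsets):
    """Count the base feature subsets in which attribute a_i appears."""
    return sum(1 for B in base_subsets if a_i in B)

def majority_combination(all_attributes, base_subsets):
    """Include an attribute if it appears in at least half of the omega subsets."""
    omega = len(base_subsets)
    return {a for a in all_attributes if f_c(a, base_subsets) >= omega / 2}

# Toy example with omega = 3 base subsets over four attributes:
base = [{"a", "b"}, {"a", "c"}, {"a", "b", "d"}]
print(majority_combination({"a", "b", "c", "d"}, base))  # -> {'a', 'b'}
```

Here "a" appears in all three subsets and "b" in two of three, so both clear the threshold of $\omega/2 = 1.5$, while "c" and "d" appear only once each and are dropped.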
Lemma 13.1. A majority combination of feature subsets obtained from a given set of independent and consistent feature selectors $FS_1, \ldots, FS_\omega$ (where $\omega$ is the number of feature selectors) converges to the optimal feature subset when $\omega \to \infty$.
Proof. To ensure that attributes for which $a_i \in B$ are actually selected, we need to show that:

$$\lim_{\omega \to \infty,\; p > 1/2} p\left(f_c(a_i) > \frac{\omega}{2}\right) = 1. \tag{13.5}$$
We denote by $p_{j,i} > 1/2$ the probability of $FS_j$ to select $a_i$, and we denote $p_i = \min_j(p_{j,i})$. Note that $p_i > 1/2$. Because the feature selectors are independent, we can use the binomial distribution approximation, i.e.:

$$\lim_{\omega \to \infty,\; p_i > 1/2} p\left(f_c(a_i) > \frac{\omega}{2}\right) = \lim_{\omega \to \infty} \sum_{k=\lceil \omega/2 \rceil}^{\omega} \binom{\omega}{k}\, p_i^{k}\,(1-p_i)^{\omega-k}. \tag{13.6}$$

(Since $p_i$ is the minimal selection probability, the binomial sum lower-bounds the probability on the left, so it suffices to show that the sum tends to 1.)
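As an illustrative numerical check of Equation (13.6), the following sketch evaluates the binomial upper tail for a hypothetical per-selector probability $p_i = 0.6$; the value is an assumption chosen for the demonstration:

```python
from math import comb

def upper_tail(omega: int, p: float) -> float:
    """P(Bin(omega, p) > omega/2): the chance that a_i wins the majority vote."""
    return sum(comb(omega, k) * p**k * (1 - p)**(omega - k)
               for k in range(omega // 2 + 1, omega + 1))

for omega in (11, 101, 1001):
    print(omega, round(upper_tail(omega, 0.6), 4))
# The tail probability climbs toward 1 as omega grows, as the lemma requires.
```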
Due to the fact that $\omega \to \infty$, we can use the central limit theorem, in which $\mu = \omega p_i$ and $\sigma^2 = \omega p_i(1-p_i)$:

$$\lim_{\omega \to \infty,\; p_i > 1/2} p\left(Z > \frac{\omega\,(1/2 - p_i)}{\sqrt{\omega\, p_i\,(1-p_i)}}\right) = p(Z > -\infty) = 1. \tag{13.7}$$

Because $p_i > 1/2$, the standardized threshold tends to $-\infty$, and the probability therefore tends to 1.
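The normal approximation in Equation (13.7) can also be evaluated directly. The sketch below computes the standardized threshold and the corresponding tail probability, again with the assumed $p_i = 0.6$ from the example above:

```python
from math import sqrt, erf

def normal_cdf(z: float) -> float:
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

p_i = 0.6  # assumed minimal per-selector probability, as in the sketch above
for omega in (11, 101, 1001):
    z = omega * (0.5 - p_i) / sqrt(omega * p_i * (1.0 - p_i))
    print(omega, round(z, 2), round(1.0 - normal_cdf(z), 4))  # P(Z > z) -> 1
```

Already at $\omega = 11$ the normal tail is close to the exact binomial tail computed earlier, and it approaches 1 as the threshold drifts to $-\infty$.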
To ensure that attributes for which $a_i \notin B$ are actually rejected, we need to show that the probability of selecting them vanishes:

$$\lim_{\omega \to \infty} p\left(f_c(a_i) > \frac{\omega}{2}\right) = 0. \tag{13.8}$$
We denote by $q_{j,i} < 1/2$ the probability of $FS_j$ to select $a_i$, and we denote $q_i = \max_j(q_{j,i})$. Note that $q_i < 1/2$. Because the feature selectors are independent, we can use the binomial distribution approximation, i.e.:

$$\lim_{\omega \to \infty,\; q_i < 1/2} p\left(f_c(a_i) < \frac{\omega}{2}\right) = \lim_{\omega \to \infty} \sum_{k=0}^{\lfloor \omega/2 \rfloor} \binom{\omega}{k}\, q_i^{k}\,(1-q_i)^{\omega-k}. \tag{13.9}$$

Applying the central limit theorem as in (13.7), now with $\mu = \omega q_i$ and $\sigma^2 = \omega q_i(1-q_i)$, the standardized threshold $\omega(1/2 - q_i)/\sqrt{\omega q_i(1-q_i)}$ tends to $+\infty$, so the sum converges to $p(Z < \infty) = 1$. The probability of selecting $a_i$ therefore tends to 0, which establishes (13.8) and completes the proof.
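A small Monte Carlo simulation illustrates the lemma end to end. The relevant and irrelevant attribute sets and the probabilities $p = 0.7$ and $q = 0.3$ are assumptions chosen for the demonstration:

```python
import random

random.seed(0)
relevant = {"a", "b"}            # plays the role of the optimal subset B (assumed)
irrelevant = {"c", "d", "e"}
p, q = 0.7, 0.3                  # per-selector selection probabilities (assumed)

def noisy_selector():
    """One independent, consistent feature selector: it selects relevant
    attributes with probability p > 1/2 and irrelevant ones with q < 1/2."""
    chosen = {a for a in relevant if random.random() < p}
    chosen |= {a for a in irrelevant if random.random() < q}
    return chosen

for omega in (5, 25, 201):
    base_subsets = [noisy_selector() for _ in range(omega)]
    combined = {a for a in relevant | irrelevant
                if sum(a in B for B in base_subsets) >= omega / 2}
    print(omega, sorted(combined))
# As omega grows, the combined subset converges to {'a', 'b'}, i.e. to B.
```

For small $\omega$ the majority vote can still include an irrelevant attribute or miss a relevant one, but the error probability decays as $\omega$ increases, exactly as the two tail bounds above predict.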