Note that in the case of a probabilistic classifier, the crisp classification y_k(x) is usually obtained as follows:

    y_k(x) = argmax_{c_i ∈ dom(y)} P_{M_k}(y = c_i | x),    (9.3)

where M_k denotes classifier k and P_{M_k}(y = c_i | x) denotes the probability of y obtaining the value c_i given an instance x.
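As a minimal sketch of Eq. (9.3), the crisp label can be read off a classifier's estimated probability vector; the class labels and probability values below are illustrative assumptions, not taken from the text:

```python
# Sketch of Eq. (9.3): return the class c_i with the highest
# estimated probability P_Mk(y = c_i | x).
def crisp_classification(prob_by_class):
    """prob_by_class: dict mapping class label c_i -> P_Mk(y = c_i | x)."""
    return max(prob_by_class, key=prob_by_class.get)

# Illustrative two-class probability vector for one instance x.
probs = {"spam": 0.7, "ham": 0.3}
print(crisp_classification(probs))  # -> spam
```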
9.3.1.2
Performance Weighting
The weight of each classifier can be set proportional to its accuracy
performance on a validation set [Opitz and Shavlik (1996)]:
    α_i = (1 − E_i) / Σ_{j=1}^{T} (1 − E_j),    (9.4)

where E_i is the performance evaluation (error) of classifier i on a validation set, T is the number of classifiers, and the sum in the denominator normalizes the weights so that they total one.
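Eq. (9.4) can be sketched directly; the validation-error values below are illustrative assumptions:

```python
# Sketch of Eq. (9.4): set each weight proportional to validation
# accuracy (1 - E_i), normalized so the weights sum to one.
def performance_weights(errors):
    """errors: validation errors E_1..E_T; returns weights alpha_1..alpha_T."""
    total = sum(1.0 - e for e in errors)
    return [(1.0 - e) / total for e in errors]

# Illustrative errors for three classifiers: the most accurate
# classifier (lowest error) receives the largest weight.
weights = performance_weights([0.1, 0.2, 0.3])
print(weights)
```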
9.3.1.3
Distribution Summation
The idea of the distribution summation combining method is to sum up the conditional probability vector obtained from each classifier [Clark and Boswell (1991)]. The selected class is chosen according to the highest value in the total vector. Mathematically, it can be written as:
    Class(x) = argmax_{c_i ∈ dom(y)} Σ_k P_{M_k}(y = c_i | x).    (9.5)
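A minimal sketch of distribution summation, Eq. (9.5); the per-classifier probability vectors below are illustrative assumptions:

```python
# Sketch of Eq. (9.5): sum the conditional probability vectors from
# all classifiers and select the class with the largest total.
def distribution_summation(prob_vectors):
    """prob_vectors: one dict per classifier, mapping c_i -> P_Mk(y = c_i | x)."""
    totals = {}
    for vec in prob_vectors:
        for c, p in vec.items():
            totals[c] = totals.get(c, 0.0) + p
    return max(totals, key=totals.get)

# Three illustrative classifiers voting on one instance:
# totals are a: 1.7, b: 1.3, so class "a" is selected.
votes = [{"a": 0.7, "b": 0.3}, {"a": 0.4, "b": 0.6}, {"a": 0.6, "b": 0.4}]
print(distribution_summation(votes))  # -> a
```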
9.3.1.4
Bayesian Combination
In the Bayesian combination method the weight associated with each
classifier is the posterior probability of the classifier given the training set
[Buntine (1990)].
    Class(x) = argmax_{c_i ∈ dom(y)} Σ_k P(M_k | S) · P_{M_k}(y = c_i | x),    (9.6)
where P(M_k | S) denotes the probability that the classifier M_k is correct given the training set S. The estimation of P(M_k | S) depends on the classifier's representation; to estimate this value for decision trees the reader is referred to [Buntine (1990)].
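Bayesian combination, Eq. (9.6), extends distribution summation by weighting each classifier's probability vector with its posterior P(M_k | S). The posteriors and probability vectors below are illustrative assumptions; estimating the true posteriors is classifier-specific, as the text notes:

```python
# Sketch of Eq. (9.6): weight each classifier's conditional probability
# vector by its posterior P(M_k | S) before summing over classifiers.
def bayesian_combination(posteriors, prob_vectors):
    """posteriors: P(M_k | S) per classifier;
    prob_vectors: one dict per classifier, c_i -> P_Mk(y = c_i | x)."""
    totals = {}
    for w, vec in zip(posteriors, prob_vectors):
        for c, p in vec.items():
            totals[c] = totals.get(c, 0.0) + w * p
    return max(totals, key=totals.get)

# Illustrative values: weighted totals are a: 0.53, b: 0.47, so "a" wins
# even though the first (most trusted) classifier favors "b".
print(bayesian_combination([0.5, 0.3, 0.2],
                           [{"a": 0.2, "b": 0.8},
                            {"a": 0.9, "b": 0.1},
                            {"a": 0.8, "b": 0.2}]))  # -> a
```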