A leaf is reached whenever a lower bound on the number of instances is reached or an interval-end hit occurs, signaling a large distribution overlap (Sect. 4.1.3.2). Instead of interval-end hits, other distribution overlap criteria could be envisaged; for instance, one could attempt to detect a concave shape of EE (exemplified by Fig. 4.8b), as done in [151]. Final class label assignment of the leaves is made by the usual process of majority voting (sketched below). The algorithm also has to take care of the following three issues specific to these trees.
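As a concrete illustration of the leaf step, here is a minimal Python sketch of majority-vote label assignment combined with a simple instance-count stopping rule; the names `min_instances` and `interval_end_hit` are illustrative assumptions, not identifiers from the original algorithm.

```python
from collections import Counter

def should_stop(labels, interval_end_hit, min_instances=10):
    """Create a leaf when the nodal set is small or an interval-end
    hit signals a large distribution overlap (Sect. 4.1.3.2)."""
    return len(labels) <= min_instances or interval_end_hit

def leaf_label(labels):
    """Assign the leaf's class label by majority voting."""
    return Counter(labels).most_common(1)[0][0]

labels = [1, 1, 0, 1, 0, 1, 1]               # toy nodal set
if should_stop(labels, interval_end_hit=True):
    print("leaf label:", leaf_label(labels))  # -> 1
```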
6.6.1.1 Position of the Candidate Class
For numerical features we have seen in Sect. 4.1.4.2 that EE depends on the position of the class (i.e., on whether we decide $\omega_k$ for $u_{jl}$ or $u_{jr}$). Thus, error rates must be computed for two class configurations: the candidate class ($t = 1$) corresponds to the $x_{ij} > \Delta_j$ rule, or the candidate class ($t = 1$) corresponds to the $x_{ij} \leq \Delta_j$ rule. Note that the candidate class position achieving MEE is not known a priori.
It is, however, a trivial task to compute EE for one of the configurations when the error rates have been computed for the other. Let us assume that the computation for the $x > \Delta$ configuration has been carried out for a nodal set with $n$ instances. In pseudo-code: rule ← ($x > \Delta$); $n_{10}$ ← sum(not rule and $t$); $n_{01}$ ← sum(rule and not $t$). Using these $n$'s one then computes their values for the other configuration simply as: $n_1$ ← sum($t$); $n_{10}$ ← $n_1 - n_{10}$; $n_{01}$ ← $n - n_1 - n_{01}$.
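A minimal Python rendering of this shortcut is given below, assuming a numeric feature array `x` and a 0/1 target array `t`; the function names are chosen here for illustration only.

```python
import numpy as np

def error_counts(x, t, delta):
    """Error counts for the 'decide class 1 when x > delta' configuration.

    n10: class-1 instances (t = 1) that the rule assigns to class 0.
    n01: class-0 instances (t = 0) that the rule assigns to class 1.
    """
    rule = x > delta
    n10 = np.sum(~rule & (t == 1))
    n01 = np.sum(rule & (t == 0))
    return n10, n01

def flipped_counts(n10, n01, t):
    """Error counts for the opposite configuration (decide class 1
    when x <= delta), derived without re-scanning the data."""
    n = len(t)
    n1 = np.sum(t == 1)
    return n1 - n10, n - n1 - n01

# Toy nodal set with n = 5 instances
x = np.array([0.2, 0.8, 0.5, 0.9, 0.1])
t = np.array([0, 1, 0, 1, 1])
n10, n01 = error_counts(x, t, delta=0.4)
print(flipped_counts(n10, n01, t))  # -> (2, 1)
```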
The class position issue does not apply to categorical variables.
6.6.1.2 Handling Categorical Variables
MEE trees handle subsets of categorical features of a set $B = \{b_1, \ldots, b_m\}$ in a simple way: the full computation of the MEE split point only has to be carried out for the $m$ singleton categories (instead of having to perform $2^m$ computations). This represents a considerable time saving. We now prove this. Denoting a singleton category set $\{b\}$ by $b$ and its complement, $B - \{b\}$, by $\bar{b}$, let us assume that we have computed

$$P_{10} = P(1, \bar{b}), \qquad P_{01} = P(0, b), \tag{6.56}$$

where $P(1, \bar{b})$ and $P(0, b)$ are short notations for $P(\omega_1, x \notin \{b\})$ and $P(\omega_0, x \in \{b\})$, respectively (with $\omega_0 \doteq \bar{\omega}_1$). The decision rule here is $r \doteq x \in \{b\}$.
We have already seen in formula (4.46) that in order to compute EE we only need the $P_{10}$ and $P_{01}$ values. These can also be used with the class priors $p = P(\omega_1)$ and $q = 1 - p$ to compute:

$$P(\bar{b} \mid 1) = P_{10}/p; \quad P(b \mid 1) = 1 - P_{10}/p; \quad P(b \mid 0) = P_{01}/q. \tag{6.57}$$
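The quantities in (6.56) and (6.57) can be estimated from a nodal set by simple counting. The Python sketch below (function and variable names are illustrative assumptions, not the book's code) evaluates $P_{10}$, $P_{01}$ and the conditionals of (6.57) for each of the $m$ singleton categories, which is exactly the linear-time loop the proof justifies.

```python
import numpy as np

def singleton_stats(x_cat, t):
    """For each singleton category b of a categorical feature, estimate
    P10 = P(omega_1, x not in {b}) and P01 = P(omega_0, x in {b}) under
    the rule 'decide omega_1 iff x in {b}', plus the conditionals (6.57)."""
    n = len(t)
    p = np.mean(t == 1)          # prior P(omega_1)
    q = 1.0 - p                  # prior P(omega_0)
    stats = {}
    for b in np.unique(x_cat):   # m singleton candidates, not 2^m subsets
        in_b = (x_cat == b)
        P10 = np.sum(~in_b & (t == 1)) / n   # class 1 decided as 0
        P01 = np.sum(in_b & (t == 0)) / n    # class 0 decided as 1
        stats[b] = {
            "P10": P10,
            "P01": P01,
            "P(b_bar|1)": P10 / p,
            "P(b|1)": 1.0 - P10 / p,
            "P(b|0)": P01 / q,
        }
    return stats

# Toy example with three categories
x_cat = np.array(["a", "b", "a", "c", "b", "a"])
t     = np.array([ 1,   0,   1,   0,   1,   0 ])
for b, s in singleton_stats(x_cat, t).items():
    print(b, s)
```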