node u_j, represents a binary test z_j as {x_ij ≤ Δ ⇒ z_j(x_i) = ω_k; ω̄_k otherwise} for numerical inputs (i.e., real-valued inputs), or as {x_ij ∈ B ⇒ z_j(x_i) = ω_k; ω̄_k otherwise} for categorical inputs; Δ and B are, respectively, a numerical threshold and a set of categories. The application of the binary test at a tree node u sends the inputted data either to a left node u_l (if x_ij ≤ Δ or x_ij ∈ B) or to a right node u_r (if x_ij > Δ or x_ij ∉ B). The left and right nodes, u_l and u_r, are called the children nodes of u (the parent node). Thus, the application of successive binary tests sends a data instance traveling down the tree until it reaches some tree leaf, where a final class label is assigned. Details on the construction of binary trees can be found in [33, 188]. In the following we describe three classic univariate splits, often used in the construction of binary decision trees, and compare them with MEE splits [152].
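As a minimal sketch, the two kinds of binary test can be written as follows (the attribute index, threshold, category set, and data instances are hypothetical examples, not taken from the text):

```python
# Sketch of the two univariate binary tests described above (illustrative only).
# A numerical test compares attribute j of instance x with a threshold delta;
# a categorical test checks membership of attribute j in a category set B.

def numerical_test(x, j, delta):
    """Route x to the left child if x[j] <= delta, else to the right child."""
    return "left" if x[j] <= delta else "right"

def categorical_test(x, j, B):
    """Route x to the left child if x[j] is in the set B, else to the right."""
    return "left" if x[j] in B else "right"

# Hypothetical instances and test parameters:
print(numerical_test([5.1, 3.5], 0, 5.0))                       # right
print(categorical_test(["red", "small"], 0, {"red", "green"}))  # left
```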
4.1.4.1 Classic Splitting Criteria
The many univariate splits (also named splitting criteria or rules) proposed to date fall into two main categories: impurity-based criteria and binary criteria [188, 147]. All of them are defined in terms of the class-conditional PMFs, P(ω_i|u), at a node u. In contrast, MEE is not defined in terms of the P(ω_i|u); MEE is post-rule defined, in terms of the P(ω_i|r), where r denotes a splitting rule.
An impurity function of a node u, φ(u) ≡ φ(P(ω_1|u), ..., P(ω_c|u)), is a symmetric function measuring the degree of "disorder" of its PMF, in such a way that it achieves its maximum at the uniform distribution and its minimum at a Dirac-delta distribution. Classic impurity functions are the Gini diversity index,

GDI(u) = Σ_{j=1..c} Σ_{k=1..c, k≠j} P(ω_j|u) P(ω_k|u),

and, of course, the (discrete) Shannon entropy (also called "information" in decision tree jargon),

H(u) = −Σ_{k=1..c} P(ω_k|u) ln P(ω_k|u).
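Both impurity functions are easily computed from a node's class-conditional PMF; the sketch below assumes the PMF is given as a list of probabilities summing to one:

```python
import math

def gdi(p):
    """Gini diversity index: sum over j != k of p[j]*p[k], i.e. 1 - sum p[k]^2."""
    return 1.0 - sum(pk * pk for pk in p)

def entropy(p):
    """Discrete Shannon entropy H(u) = -sum_k p[k] * ln p[k] (0 ln 0 taken as 0)."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)

# Maximum at the uniform PMF, minimum at a Dirac-delta PMF:
print(gdi([0.5, 0.5]), entropy([0.5, 0.5]))  # 0.5 and ln 2 ≈ 0.693
print(gdi([1.0, 0.0]), entropy([1.0, 0.0]))  # both evaluate to zero
```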
Consider a node u, a candidate for splitting into right and left nodes, u_r and u_l. The average impurity is

φ_z(u) = P(u_r|u, z) φ(u_r) + P(u_l|u, z) φ(u_l).

One is then interested in maximizing Δ_z φ(u) = φ(u) − φ_z(u) at each node. When using GDI(u) and H(u) we obtain the Gini index (GI) and information gain (IG) split criteria, respectively. Since GDI(u) is a first-order approximation of H(u), both GI and IG behave similarly, allowing us to focus solely on IG in the sequel [35, 178].
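A minimal sketch of the impurity decrease Δ_z φ(u), here with φ = H computed from class counts (the counts below are hypothetical):

```python
import math

def entropy_from_counts(counts):
    """H(u) computed from class counts: -sum_k (n_k/n) ln (n_k/n)."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def impurity_decrease(parent, left, right, phi=entropy_from_counts):
    """Delta_z phi(u) = phi(u) - [P(u_l|u,z) phi(u_l) + P(u_r|u,z) phi(u_r)]."""
    n, nl, nr = sum(parent), sum(left), sum(right)
    return phi(parent) - (nl / n) * phi(left) - (nr / n) * phi(right)

# Hypothetical split of 40 instances into two fairly pure children:
gain = impurity_decrease(parent=[20, 20], left=[18, 2], right=[2, 18])
print(gain > 0)  # True: the split reduces the average entropy (this is the IG)
```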
The probability of error at u, PE(u) = 1 − max_j P(ω_j|u), is also an impurity function, since it enjoys all the above properties. However, as shown in [33], for tree design PE(u) has two serious drawbacks: it may lead to ω(u) = ω(u_r) = ω(u_l) for all candidate splits, in which case no best split can be found; and it may lead to rejecting solutions with purer children nodes in favor of others with highly impure nodes, the reason for this shortcoming being the non-strictly concave behavior of PE(·) (GDI and H are strictly concave functions).
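The first drawback can be illustrated with a hypothetical two-class example: whenever both children keep the parent's majority class, the PE decrease is exactly zero, while the strictly concave entropy still rewards the purer child (the class counts below are hypothetical):

```python
import math

def pe(counts):
    """Probability-of-error impurity: PE(u) = 1 - max_j P(omega_j | u)."""
    n = sum(counts)
    return 1.0 - max(counts) / n

def entropy(counts):
    """Shannon entropy computed from class counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def decrease(phi, parent, left, right):
    """Impurity decrease of a candidate split under impurity function phi."""
    n, nl, nr = sum(parent), sum(left), sum(right)
    return phi(parent) - (nl / n) * phi(left) - (nr / n) * phi(right)

# Parent [70, 30] split into [40, 10] and [30, 20]: the first class stays the
# majority in both children, so PE registers no impurity decrease at all,
# whereas entropy prefers this split to doing nothing.
parent, left, right = [70, 30], [40, 10], [30, 20]
print(abs(decrease(pe, parent, left, right)) < 1e-9)   # True: no PE gain
print(decrease(entropy, parent, left, right) > 0)      # True: positive IG
```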