MEE with Discrete Errors - Minimum Error Entropy Classification - page 109

Information Technology Reference

In-Depth Information

SEE is not a node impurity function. Whereas a node impurity function

depends solely on the P ( ω i |

u ), SEE depends on the “error” of a specific split

involving parent and children nodes. Moreover, whereas node impurity func-

tions have a c -dimensional support for any arbitrary c ,SEEalwayshasa

3-point support.

Binary splitting criteria are based on partitions of feature domains into

two sub-domains. The classic binary criterion is the “towing” splitting rule.

Corresponds to maximizing TWO ( u )= P ( u r |

u )[ ω i ∈Ω |

u ) P ( u l |

P ( ω i |

u r )

−

] 2 , measuring how far apart are the numbers of cases of each class

assigned to each of the two subdomains, u r and u l .

P ( ω i |

u l )

|

4.1.4.2

Comparing IG, TWO, PE, and SEE

The comparative study of the four splitting criteria requires a common frame-

work expressing them all uniformly. We start by expressing all criteria in terms

of a minimization problem (one could of course express them all as a maximiza-

tion problem). For the impurity criterion IG ( u ) (and GI ( u )),since φ ( u ) doesn't

depend on the rule, we minimize φ z ( u ) instead of maximizing Δ z φ ( u );forthe

towing criterion we minimize max( TWO ( u ))

TWO ( u ). For simplicity reasons

we keep the names IG ( u ) and TWO ( u ) for the min-expressed rules.

−

n 1 ,n 0

Children

ω 1

n 1 r

n 1 l

n 1

a> Δ?

No

Yes

ω 0

n 0 r

n 0 l

n 0

n 1 l ,n 0 l

n 1 r ,n 0 r

n r

n l

Fig. 4.11

Two-class framework.

ω 1 , and the splitting of a node

with n 1 instances from class 1 and n 0 from class 0 as shown in Fig. 4.11.

At the parent node, with n = n 1 + n 0 instances, we have: P ( ω 1 )= n 1 /n =

p ; P ( ω 0 )= n 0 /n =1

Next, we consider two classes, ω 1 and ω 0 ≡

p = q .Moreover,foranyprior p and whatever

criterion being used, we consider that a certain percentage α of ω 1 instances

are assigned to the right node and a certain percentage β of ω 0 instances are

assigned to the left node: n 1 r = αn 1 ; n 0 l = βn 0 . In other words, assuming

that the labels assigned to the right and left nodes are respectively ω 1 and

ω 0 , α and β represent the percentages of correctly classified instances. With

these provisions we achieve what we were looking for: all splitting rules are

expressed in terms of the minimization of a function depending on the same

parameters, p , α and β (the least possible number of parameters), as follows:

−

Next Page

Minimum Error Entropy Classification

Search WWH ::

Custom Search

Home