Tree design approaches are aimed at finding “optimal” solutions: minimum-sized
trees with high classification accuracy [12, 130, 177]. The greedy approach is
followed because a search over the whole set of possible trees for a given problem
is almost always impractical (finding optimal binary trees is NP-complete [108]).
There is, however, a shortcoming of the greedy designs [130]: whereas node
performance is a linear function of the class recognition rates, the overall tree
performance is usually highly nonlinear. Since the greedy design is not guaranteed
to be “optimal” in any sense, the accuracy, efficiency and generalization capability
of the final solution rely on some sort of tree pruning, i.e., removing subtrees with
no statistically significant contribution to the final performance, thereby remedying
the usual over-fitting of the final solution. Even with pruning, final tree solutions
are often found to be more complex than they should be [116].
The MEE trees we are about to describe are constructed relying on the (discrete)
Shannon MEE data splitter, studied in Chap. 4, for the search of the best splits.
Compared to the classic data splitting rules (presented in Sect. 4.1.4), the MEE
approach has an important advantage: one is able to devise a stopping rule based
on an implicit distribution overlap measure, such as the interval-end hit rate
(Sect. 4.1.3.2). In other words, by using MEE splitters one implicitly contributes
to better generalization, making the tree design less dependent on pruning.
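To make the split criterion concrete, the following is a minimal sketch of the quantity minimized by such a splitter, assuming the discrete two-class formulation of Chap. 4: the splitter error E = T − Z takes the value 1 with probability P_{10}, −1 with probability P_{01} and 0 otherwise (P_{10} and P_{01} are the two misclassification rates defined in step 2 of the algorithm below). The function name error_entropy is illustrative only, and the natural logarithm is used; the base merely rescales the entropy and does not change its minimizer.

```python
import numpy as np

def error_entropy(p10: float, p01: float) -> float:
    """Empirical Shannon entropy of the splitter error E = T - Z, assumed to
    take the value 1 with probability p10, -1 with probability p01 and 0
    otherwise (correct split decisions)."""
    probs = np.array([p10, p01, 1.0 - p10 - p01])
    probs = probs[probs > 0]              # convention: 0 * log 0 = 0
    return float(-np.sum(probs * np.log(probs)))

# Example: a split with 10% of one type of error and 5% of the other
print(error_entropy(0.10, 0.05))          # about 0.518 nats
```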
6.6.1 The MEE Tree Algorithm
The greedy algorithm for growing a MEE tree has the following main steps
[152]:
1. Starting at the root node (where the whole training set is used), at each
tree node one has an n × d (n cases, d features) matrix X and an n × c
(n cases, c classes) matrix T, filled with zeros and ones. A univariate split
z (with parameter Δ or B) minimizing EE is searched for in the d × c-dimensional
space.
2. For that purpose, the error rates P_{10} = n_{10}/n, P_{01} = n_{01}/n
(n_{t t̄}: number of class t cases classified as t̄) are computed for each
candidate class label t ∈ {0, 1}. (T = {0, 1} is computationally easier to use.)
3. The rule at node u_j (therefore, the triple {x_{·j}, t_{·k}, Δ_j or B_j})
corresponding to the smallest empirical error entropy is assigned to the node,
and if a stopping criterion is satisfied the node becomes a leaf. Otherwise,
the left and right node (u_{jl}, u_{jr}) sets are generated and steps 1 and 2
are iterated.
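A minimal sketch of this per-node search is given below, for numerical features and two classes only. It follows steps 1 to 3 directly: candidate thresholds Δ are taken as midpoints of consecutive distinct feature values (as discussed in the paragraph that follows), both candidate class labels are tried for each threshold, and the triple with the smallest empirical error entropy is returned. The names find_mee_split and shannon_ee are illustrative and not from [152]; categorical rules (the B parameters), the general c-class case (iterating over the candidate label columns t_{·k}) and the stopping criterion are omitted.

```python
import numpy as np

def shannon_ee(p10: float, p01: float) -> float:
    """Shannon entropy of the splitter error: value 1 with probability p10,
    -1 with probability p01, 0 otherwise."""
    p = np.array([p10, p01, 1.0 - p10 - p01])
    p = p[p > 0]                                       # convention: 0 * log 0 = 0
    return float(-np.sum(p * np.log(p)))

def find_mee_split(X, t):
    """Greedy MEE split search at one node (numerical features, two classes).
    X: n x d array with the cases reaching the node; t: n-vector of 0/1 labels.
    Returns the triple (feature index j, class label of the 'right' branch,
    threshold delta) with the smallest empirical error entropy, and that EE."""
    d = X.shape[1]
    best_triple, best_ee = None, np.inf
    for j in range(d):
        values = np.unique(X[:, j])
        # candidate thresholds: midpoints of consecutive distinct feature values
        for delta in (values[:-1] + values[1:]) / 2.0:
            right = X[:, j] > delta
            for label in (0, 1):                       # candidate class label of the right branch
                z = np.where(right, label, 1 - label)  # split decision per case
                p10 = np.mean((t == 1) & (z == 0))     # class 1 cases decided as 0
                p01 = np.mean((t == 0) & (z == 1))     # class 0 cases decided as 1
                ee = shannon_ee(p10, p01)
                if ee < best_ee:
                    best_triple, best_ee = (j, label, delta), ee
    return best_triple, best_ee

# Toy usage: one feature, perfectly separable labels
X = np.array([[1.0], [2.0], [3.0], [4.0]])
t = np.array([0, 0, 1, 1])
triple, ee = find_mee_split(X, t)
print(triple, ee)   # perfect split: feature 0, right-branch label 1, threshold 2.5, EE 0
```

Growing the full tree then amounts to applying find_mee_split recursively to the left and right node sets until the stopping criterion fires, as in step 3.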
The error rates in step 2 are computed for distinct values of every feature
x_{·j}, j = 1, ..., d, serving as candidate split points: Δ_j for numerical-type
rules (often the midpoints of consecutive training set values are used) and B_j for
combinations of categorical values in the case of categorical variables. A tree