out as follows: a) generate a large number of samples for each class-conditional input PDF; b) compute the output $f_{E|t}(e)$ PDFs with the KDE method; c) compute $H_S$ and $H_{R_2}$ using formulas (C.3) and (C.5) (entropy partitioning) and performing numerical integration. Although laborious, this procedure is able to produce accurate estimates of the theoretical EEs if one uses a "large number of samples" in step (a), that is, a value of $n$ in formula (3.2) guaranteeing a very low integrated mean square error (say, IMSE $< 0.01$) of $\hat{f}_{E|t}(e)$ when computed with the optimal $h = h(n)$ (see Appendix E). In these conditions $\hat{f}_{E|t}(e)$ is very close to $f_{E|t}(e)$.
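As an illustration of steps (a)-(c), the following sketch generates samples from two assumed Gaussian class-conditional input PDFs, estimates the class-conditional error PDFs by KDE, and obtains the Shannon and Rényi quadratic entropies by numerical integration of the mixture $f_E(e) = \sum_t p_t f_{E|t}(e)$. The classifier, priors, and bandwidth (scipy's default rather than the IMSE-optimal $h(n)$ of Appendix E) are illustrative assumptions, and direct integration of the mixture stands in for the exact partitioning formulas (C.3) and (C.5).

```python
# Minimal numerical sketch of steps (a)-(c); all modelling choices below are
# illustrative assumptions, not the book's exact setup.
import numpy as np
from scipy.stats import gaussian_kde
from scipy.integrate import trapezoid

rng = np.random.default_rng(0)
n = 100_000                                  # "large number of samples" per class
priors = {-1: 0.5, +1: 0.5}                  # assumed class priors

def classifier_output(x):
    # assumed classifier with [-1, 1] codomain
    return np.tanh(2.0 * x)

# (a) samples from each class-conditional input PDF (assumed N(-1,1) and N(+1,1))
inputs = {-1: rng.normal(-1.0, 1.0, n), +1: rng.normal(+1.0, 1.0, n)}

# (b) KDE estimates of the class-conditional error PDFs f_{E|t}(e), with e = t - y
errors = {t: t - classifier_output(x) for t, x in inputs.items()}
kde = {t: gaussian_kde(e) for t, e in errors.items()}

# (c) Shannon and Renyi quadratic entropies of f_E(e) by numerical integration
grid = np.linspace(-2.0, 2.0, 4001)          # error support for +/-1 targets
f_E = sum(priors[t] * kde[t](grid) for t in (-1, +1))
H_S = -trapezoid(f_E * np.log(f_E + 1e-300), grid)    # Shannon EE
H_R2 = -np.log(trapezoid(f_E ** 2, grid))             # Renyi quadratic EE
print(f"H_S = {H_S:.4f}  H_R2 = {H_R2:.4f}")
```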
The main differences between empirical and theoretical MEE are as follows:

- Whereas theoretical MEE implies the separate evaluation of the $f_{E|t}(e)$, empirical MEE relies on the estimate $\hat{f}_n(e)$ of the whole $f_E(e)$, based on the $n$-sized dataset (see the sketch after this list).
- One cannot apply iterative optimization algorithms to theoretical EEs (at each training step the $f_{E|t}(e)$ are not easily computable); one may, however, compute the theoretical EE in a neighborhood of a parameter vector, as we will do later.
- Whereas the kernel smoothing effect (see Appendix E) is negligible when using the optimal $h(n)$ in the computation of theoretical EEs, its influence will be of importance, as we shall see, in empirical MEE.
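The empirical EEs mentioned in the first item above can be sketched as follows. Since expressions (3.4) and (3.5) are not reproduced in this excerpt, the code assumes the usual plug-in forms: a resubstitution Shannon estimate based on $\hat{f}_n(e)$ and the information-potential estimate of the Rényi quadratic entropy, both computed from the whole $n$-sized error sample with a Gaussian kernel.

```python
# Sketch of empirical EEs from the whole n-sized error sample; the exact
# forms of (3.4) and (3.5) are assumed to be the usual plug-in estimators.
import numpy as np

def empirical_EEs(e, h):
    """e: 1-D array of the n error samples; h: kernel bandwidth."""
    diff = e[:, None] - e[None, :]                      # all pairwise e_i - e_j
    # Gaussian kernel values G_h(e_i - e_j)
    G = np.exp(-diff**2 / (2.0 * h**2)) / (h * np.sqrt(2.0 * np.pi))
    f_hat_at_samples = G.mean(axis=1)                   # \hat{f}_n(e_i)
    H_S_hat = -np.mean(np.log(f_hat_at_samples))        # resubstitution Shannon EE
    # Renyi quadratic EE via the information potential (bandwidth h*sqrt(2))
    h2 = h * np.sqrt(2.0)
    V = np.mean(np.exp(-diff**2 / (2.0 * h2**2)) / (h2 * np.sqrt(2.0 * np.pi)))
    H_R2_hat = -np.log(V)
    return H_S_hat, H_R2_hat
```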
We saw in Sect. 2.2.1 that the min $P_e = 0$ case for a classifier with interval codomain corresponds to an error PDF with a null area in a subset of the codomain. For instance, for the $[-1, 1]$ codomain and the usual thresholding function assigning the $-1$ label to negative outputs and the $1$ label to positive outputs, the classifier has $P_e = 0$ if the error PDF has null area in $[-2, -1] \cup [1, 2]$. In this case the task of the training algorithm is to squeeze $f_E(e)$ (or, more rigorously, $\hat{f}_n(e)$ for an arbitrarily large $n$) driving it inside $[-1, 1]$.
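This correspondence can be checked directly, assuming the error is defined as $e = t - y$ with targets $t \in \{-1, 1\}$ and classifier output $y \in [-1, 1]$:

$$
\begin{aligned}
t = +1:&\quad y > 0 \Rightarrow e = 1 - y \in [0, 1), \qquad\; y < 0 \Rightarrow e = 1 - y \in (1, 2],\\
t = -1:&\quad y < 0 \Rightarrow e = -1 - y \in (-1, 0], \quad y > 0 \Rightarrow e = -1 - y \in [-2, -1).
\end{aligned}
$$

Misclassified cases are exactly those with $e \in [-2, -1) \cup (1, 2]$, so $P_e = 0$ precisely when $f_E(e)$ carries no mass outside $[-1, 1]$.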
Classifiers trained to reach the minimum error entropy should ideally, for infinitely separated classes, achieve more than that: $f_E(e)$ should be driven towards a single Dirac-$\delta$ at the origin (see Sect. 2.3.1). For other codomain intervals the same result should ideally be obtained by MEE-trained classifiers. One question that arises in practical terms is whether, when all $n$ error samples have equal value, $e_1 = \ldots = e_n$, and the empirical EEs (3.4) and (3.5) are used instead of the theoretical EEs, this configuration is still a minimum. In other words, do empirical EEs preserve the minimum property of the theoretical EEs? The answer is affirmative: the condition $e_1 = \ldots = e_n = 0$ is a minimum, and in fact a global minimum, of the empirical Shannon EE [212].
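As a quick numerical check of this property, one can reuse the empirical_EEs sketch given earlier (and therefore inherit its assumption about the exact form of (3.4)): for a Gaussian kernel, the all-equal configuration attains the lower bound $\log(h\sqrt{2\pi})$ of the plug-in Shannon estimate, whereas any spread-out error sample yields a larger value.

```python
# Illustration (not a proof, which is given in [212]) that equal errors attain
# the minimum of the plug-in empirical Shannon EE sketched above.
import numpy as np

h = 0.1
e_equal = np.zeros(1000)                          # e_1 = ... = e_n = 0
e_spread = np.random.default_rng(1).normal(0.0, 0.5, 1000)

H_equal, _ = empirical_EEs(e_equal, h)            # equals log(h*sqrt(2*pi))
H_spread, _ = empirical_EEs(e_spread, h)
print(H_equal, np.log(h * np.sqrt(2 * np.pi)), H_spread)   # H_equal <= H_spread
```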