Theorem 2.2. Given any loss function $L(e)$, which satisfies $\lim_{|e|\to+\infty} L(e) = +\infty$, one has
\[
R_L(E) \;\propto\; \bigl\{ H_S(E) + D_{KL}\bigl(f_E(e) \,\|\, q_L(e)\bigr) \bigr\},
\]
where $q_L(e)$ is a PDF related to $L(e)$ by $q_L(e) = \exp\bigl(\gamma_0 - \gamma_1 L(e)\bigr)$.
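As a brief illustration of how $\gamma_0$ and $\gamma_1$ arise (a sketch only, assuming the convention $\gamma_1 > 0$ so that $q_L$ is integrable), take the squared-error loss $L(e) = e^2$. Normalizing $q_L$,
\[
\int_{-\infty}^{+\infty} \exp\bigl(\gamma_0 - \gamma_1 e^2\bigr)\, de \;=\; e^{\gamma_0}\sqrt{\pi/\gamma_1} \;=\; 1
\quad\Longrightarrow\quad
\gamma_0 = \tfrac{1}{2}\ln(\gamma_1/\pi),
\]
and the choice $\gamma_1 = 1/(2\sigma^2)$ gives $\gamma_0 = -\tfrac{1}{2}\ln(2\pi\sigma^2)$, i.e., $q_L$ is the zero-mean Gaussian density with variance $\sigma^2$. Under this reading, $D_{KL}(f_E(e) \,\|\, q_L(e))$ measures how far the error PDF is from the Gaussian shape induced by the quadratic loss.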
The proof of the theorem and the existence of $q_L(e)$ are demonstrated in the cited work, which also provides the means of computing the $\gamma_0$ and $\gamma_1$ constants. Note that all loss functions we have seen so far, with the exception of the one corresponding to Rényi's quadratic entropy, satisfy the condition $\lim_{|e|\to+\infty} L(e) = +\infty$.
Theorem 2.2 also provides an interesting bound on $R_L(E)$. Since the Kullback-Leibler divergence is always non-negative, we have
\[
H_S(E) + D_{KL}\bigl(f_E(e) \,\|\, q_L(e)\bigr) \;\geq\; H_S(E), \qquad (2.41)
\]
with equality iff $f_E(e) = q_L(e)$. Therefore, minimizing any risk functional $R_L(E)$, with $L(e)$ satisfying the above condition, is equivalent to minimizing an upper bound of the error entropy $H_S(E)$. Moreover, Theorem 2.2 allows us to interpret the minimization of any risk functional $R_L(E)$ as being driven by two “forces”: one, $D_{KL}(f_E(e) \,\|\, q_L(e))$, attempts to shape the error PDF in a way that reflects the loss function itself; the other, $H_S(E)$, drives the decrease of the error dispersion, i.e., of its uncertainty.
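The relation underlying Theorem 2.2 can be verified numerically in a simple Gaussian setting. The following short Python sketch (the distributions, parameter values, and variable names are merely illustrative) takes a Gaussian error PDF, the squared-error loss with the Gaussian $q_L$ derived above, and checks that $H_S(E) + D_{KL}(f_E \,\|\, q_L)$ coincides with $\gamma_1 R_L(E) - \gamma_0$, i.e., with an affine function of the risk when the risk is written as the expected loss, while never falling below the lower bound of (2.41):

import numpy as np
from scipy import integrate
from scipy.stats import norm

# Error PDF f_E: Gaussian with mean mu and standard deviation sigma_f (illustrative choice).
mu, sigma_f = 0.5, 1.2
f_E = norm(loc=mu, scale=sigma_f)

# q_L derived from the squared-error loss L(e) = e^2:
# gamma_1 = 1/(2*sigma_q^2), gamma_0 = -0.5*ln(2*pi*sigma_q^2), so q_L is N(0, sigma_q^2).
sigma_q = 1.0
gamma_1 = 1.0 / (2.0 * sigma_q**2)
gamma_0 = -0.5 * np.log(2.0 * np.pi * sigma_q**2)
q_L = norm(loc=0.0, scale=sigma_q)

# Shannon entropy of the error, H_S(E), in closed form for a Gaussian PDF.
H_S = 0.5 * np.log(2.0 * np.pi * np.e * sigma_f**2)

# Kullback-Leibler divergence D_KL(f_E || q_L), by numerical integration.
def kl_integrand(e):
    return f_E.pdf(e) * (f_E.logpdf(e) - q_L.logpdf(e))

D_KL, _ = integrate.quad(kl_integrand, -20, 20)

# Risk for the squared-error loss: R_L = E[e^2] = mu^2 + sigma_f^2.
R_L = mu**2 + sigma_f**2

lhs = H_S + D_KL                # entropy plus divergence
rhs = gamma_1 * R_L - gamma_0   # affine function of the risk

print(f"H_S + D_KL            = {lhs:.6f}")
print(f"gamma_1*R_L - gamma_0 = {rhs:.6f}")   # should match lhs
print(f"lower bound H_S       = {H_S:.6f}")   # inequality (2.41): lhs >= H_S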
There is an abundant literature on information-theoretic topics. For the reader unfamiliar with this area, an overview of the definitions and properties of entropies (Shannon, generalized Rényi, and others) can be found in the following works: [131, 183, 168, 48, 184, 164, 96, 62]. Appendix B presents a short survey of the properties that are particularly important throughout the text.
2.3.3 MEE Is Harder for Classification than for Regression
An important result concerning the minimization of error entropy was shown in [67] for a machine solving a regression task, whose output $y$ approximates some desired continuous function $d(x)$. The authors showed that the MEE approach corresponds to the minimum of the Kullback-Leibler divergence of $f_{X,Y}$ (the joint PDF when the output is $y_w(x)$) with respect to $d_{X,Y}$ (the joint PDF when the output is the desired $d(x)$). Concretely, they showed that
\[
\min H_S(E) \;\equiv\; \min D_{KL}\bigl(f_{X,Y} \,\|\, d_{X,Y}\bigr) \;=\; \min \int_{X \times Y} f_{X,Y}(x,y)\, \ln \frac{f_{X,Y}(x,y)}{d_{X,Y}(x,y)}\, dx\, dy \,. \qquad (2.42)
\]
Their demonstration was, in fact, presented for the generalized family of Rényi entropies, which includes the Shannon entropy $H_S(E)$ as a limiting case. Moreover, although not explicit in their demonstration, the above result is only valid if the above integrals exist, which among other things require