6.3.4 MEE Algorithm
The minimum error entropy (MEE) principle will now be used instead of the minimum
mean square error (MMSE) as the optimization principle behind the
learning algorithm for this network. MEE training needs a batch-mode
algorithm because the distribution of the errors has to be estimated before
the weights can be updated, so several of these errors are needed to obtain
a good PDF estimate.
As seen above, the error $e_j = t_j - y_j$ represents the difference between
the target $t_j$ of the $j$-th neuron and its output $y_j$. The MSE of the
variable $e_j$ will be replaced with its EE counterpart. First it is necessary
to estimate the PDF of the error; for this, the Parzen window estimator is used.
Using the empirical estimate of the $H_2$ EE, as in expression (3.5), and noting
that minimizing $H_2 = -\ln V_{R_2}$ is equivalent to maximizing the information
potential $V_{R_2}$, we maximize the latter, which (ignoring constant factors) is
written as
$$V_{R_2} = \sum_{i=1}^{n}\sum_{u=1}^{n} K\!\left(\frac{e_i - e_u}{h}\right). \tag{6.42}$$
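As an illustration, a minimal NumPy sketch of this estimator follows; the function name `info_potential` is ours, and a Gaussian kernel of the form $K(v) = e^{-|v|^2}$, applied to the complex error differences, is assumed:

```python
import numpy as np

def info_potential(e, h):
    """Empirical information potential V_R2 (up to constant factors) of a
    batch of complex errors e, using a Gaussian Parzen kernel of bandwidth h."""
    d = e[:, None] - e[None, :]          # all pairwise differences e_i - e_u
    return np.exp(-np.abs(d / h) ** 2).sum()
```

Since `np.abs` is used on the pairwise differences, the same code covers real-valued errors as a special case.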
The derivative of $V_{R_2}$ w.r.t. the real weights is
$$\frac{\partial V_{R_2}}{\partial w_{kj}} = \frac{1}{h}\sum_{i=1}^{n}\sum_{u=1}^{n} K'\!\left(\frac{e_i - e_u}{h}\right)\left(\frac{\partial e_i}{\partial w_{kj}} - \frac{\partial e_u}{\partial w_{kj}}\right). \tag{6.43}$$
Since $\partial e_i/\partial w_{kj}$ is given by $-\partial y_i/\partial w_{kj}$, and for a Gaussian kernel $K'(v) = -2vK(v)$, expression (6.43) can be written as
$$\frac{\partial V_{R_2}}{\partial w^{R}_{kj}} = \frac{2}{h^{2}}\sum_{i=1}^{n}\sum_{u=1}^{n} K\!\left(\frac{e_i - e_u}{h}\right)\operatorname{Re}\!\left[(e_i - e_u)^{*}\bigl(s'(z_i)\,x_j - s'(z_u)\,x_j\bigr)\right], \tag{6.44}$$
where $^{*}$ denotes complex conjugation, $x_j$ is taken from pattern $i$ or $u$ accordingly, and the products expand into real ($R$) and imaginary ($I$) parts as $\operatorname{Re}[s'(z_i)x_j] = s'^{R}(z_i)x^{R}_j - s'^{I}(z_i)x^{I}_j$ and $\operatorname{Im}[s'(z_i)x_j] = s'^{R}(z_i)x^{I}_j + s'^{I}(z_i)x^{R}_j$.
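The double sum in (6.44) vectorizes naturally over the $n^2$ error pairs. Below is a minimal sketch for a single complex neuron ($z_i = \sum_j w_j x_{ij}$, $y_i = s(z_i)$, so the subscript $k$ is dropped); the names `grad_real`, `X`, `s`, `s_prime` are illustrative, the kernel is again assumed Gaussian, and the pattern index suppressed on $x_j$ in (6.44) is made explicit:

```python
import numpy as np

def grad_real(w, X, t, h, s, s_prime):
    """Gradient of V_R2 w.r.t. the real parts of the weights, as in (6.44).

    w: complex weights (m,); X: complex patterns (n, m); t: complex targets (n,).
    """
    z = X @ w                                  # net inputs z_i
    e = t - s(z)                               # complex errors e_i = t_i - y_i
    d = e[:, None] - e[None, :]                # pairwise differences e_i - e_u
    K = np.exp(-np.abs(d / h) ** 2)            # Gaussian kernel values
    c = s_prime(z)[:, None] * X                # s'(z_i) x_ij
    dc = c[:, None, :] - c[None, :, :]         # s'(z_i) x_ij - s'(z_u) x_uj
    # Re[(e_i - e_u)^* (s'(z_i) x_ij - s'(z_u) x_uj)], summed over all pairs
    return (2 / h**2) * np.einsum('iu,iuj->j',
                                  K, (np.conj(d)[:, :, None] * dc).real)
```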
A gradient ascent procedure is used instead of gradient descent since the goal
is to maximize $V_{R_2}$. So, the weight update at each iteration $m$ is guided by
$$\Delta w_{kj}(m) = \eta\,\frac{\partial V_{R_2}}{\partial w_{kj}}. \tag{6.45}$$
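Under the assumptions of the sketch above, one ascent step then just adds $\eta$ times the gradient; using the illustrative `grad_real` helper, with `grad_imag` for the imaginary parts sketched after (6.46) below:

```python
# One batch iteration of gradient ascent on V_R2 (step size eta).
# grad_real follows (6.44); grad_imag, sketched below, follows (6.46).
w = w + eta * (grad_real(w, X, t, h, s, s_prime)
               + 1j * grad_imag(w, X, t, h, s, s_prime))
```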
A similar derivation can be done for the imaginary weights. The expression
equivalent to (6.44) is
$$\frac{\partial V_{R_2}}{\partial w^{I}_{kj}} = -\frac{2}{h^{2}}\sum_{i=1}^{n}\sum_{u=1}^{n} K\!\left(\frac{e_i - e_u}{h}\right)\operatorname{Im}\!\left[(e_i - e_u)^{*}\bigl(s'(z_i)\,x_j - s'(z_u)\,x_j\bigr)\right]. \tag{6.46}$$
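A matching sketch for (6.46), under the same assumptions as before (Gaussian kernel, single complex neuron, illustrative names); only the final projection changes, from the real to the imaginary part, with a sign flip:

```python
import numpy as np

def grad_imag(w, X, t, h, s, s_prime):
    """Gradient of V_R2 w.r.t. the imaginary parts of the weights, as in (6.46)."""
    z = X @ w
    e = t - s(z)
    d = e[:, None] - e[None, :]
    K = np.exp(-np.abs(d / h) ** 2)
    c = s_prime(z)[:, None] * X
    dc = c[:, None, :] - c[None, :, :]
    # -Im[(e_i - e_u)^* (s'(z_i) x_ij - s'(z_u) x_uj)], summed over all pairs
    return -(2 / h**2) * np.einsum('iu,iuj->j',
                                   K, (np.conj(d)[:, :, None] * dc).imag)
```

For instance, with $s = \tanh$ (holomorphic, so the same derivative rule applies on $\mathbb{C}$) one can take `s = np.tanh` and `s_prime = lambda z: 1 - np.tanh(z)**2`.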
 