gradient descent rule is used, with the update of the weights at each iteration (m), that is, after the presentation of each training pattern to the network, written as
\[
w_{kj}^{R,(m)} = w_{kj}^{R,(m-1)} - \Delta w_{kj}^{R,(m)}, \qquad (6.36)
\]
with
\[
\Delta w_{kj}^{R,(m)} = \eta \frac{\partial E}{\partial w_{kj}^{R}}
= 2\eta\, e_{k}^{R} \bigl(1 - s(z_{k}^{R})\bigr) s(z_{k}^{R})\, x_{j}^{R}
- 2\eta\, e_{k}^{I} \bigl(1 - s(z_{k}^{I})\bigr) s(z_{k}^{I})\, x_{j}^{I}. \qquad (6.37)
\]
A similar derivation can be made for the case of the imaginary part of the
weights, yielding
\[
\Delta w_{kj}^{I,(m)} = \eta \frac{\partial E}{\partial w_{kj}^{I}}
= 2\eta\, e_{k}^{R} \bigl(1 - s(z_{k}^{R})\bigr) s(z_{k}^{R})\, x_{j}^{I}
+ 2\eta\, e_{k}^{I} \bigl(1 - s(z_{k}^{I})\bigr) s(z_{k}^{I})\, x_{j}^{R}. \qquad (6.38)
\]
It is possible to show that the final expressions for the adjustment of the real
and imaginary parts of the bias are
\[
\Delta \theta_{k}^{R,(m)} = 2\eta\, e_{k}^{R} \bigl(1 - s(z_{k}^{R})\bigr) s(z_{k}^{R}) \qquad (6.39)
\]
and
\[
\Delta \theta_{k}^{I,(m)} = 2\eta\, e_{k}^{I} \bigl(1 - s(z_{k}^{I})\bigr) s(z_{k}^{I}). \qquad (6.40)
\]
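To make the per-pattern update concrete, the following NumPy sketch applies (6.36)-(6.40) to one layer of complex weights. It rests on assumptions that are not stated explicitly in this excerpt: a logistic activation s applied separately to the real and imaginary parts of the net input, a net input of the form z_k = Σ_j w_kj x̄_j + θ_k, and an error term e_k = y_k − t_k whose sign makes (6.36) a descent step. The function and variable names (stochastic_update, sigmoid, dR, dI) are illustrative, not taken from [3] or [9].

```python
import numpy as np

def sigmoid(a):
    """Logistic activation s, applied elementwise to a real array."""
    return 1.0 / (1.0 + np.exp(-a))

def stochastic_update(w, theta, x, t, eta):
    """One per-pattern (stochastic) MMSE step in the spirit of Eqs. (6.36)-(6.40).

    w     : complex weight matrix, shape (K, J)   -- w_kj = w^R_kj + i w^I_kj
    theta : complex bias vector,   shape (K,)
    x     : complex input pattern, shape (J,)
    t     : complex target,        shape (K,)
    eta   : learning rate
    Returns the updated (w, theta).
    """
    # Net input z_k = sum_j w_kj * conj(x_j) + theta_k  (assumed convention)
    z = w @ np.conj(x) + theta
    # Split activation: s applied separately to real and imaginary parts
    y = sigmoid(z.real) + 1j * sigmoid(z.imag)
    e = y - t                                          # assumed error sign
    dR = (1.0 - sigmoid(z.real)) * sigmoid(z.real)     # s'(z^R_k)
    dI = (1.0 - sigmoid(z.imag)) * sigmoid(z.imag)     # s'(z^I_k)
    # Weight increments, Eqs. (6.37)-(6.38): outer product over outputs k and inputs j
    dw_R = 2 * eta * (np.outer(e.real * dR, x.real) - np.outer(e.imag * dI, x.imag))
    dw_I = 2 * eta * (np.outer(e.real * dR, x.imag) + np.outer(e.imag * dI, x.real))
    # Bias increments, Eqs. (6.39)-(6.40)
    dth_R = 2 * eta * e.real * dR
    dth_I = 2 * eta * e.imag * dI
    # Eq. (6.36) and its imaginary-part counterpart: subtract the increments
    return w - (dw_R + 1j * dw_I), theta - (dth_R + 1j * dth_I)
```

One call performs the adjustment for a single training pattern; iterating over a shuffled training set gives the per-pattern (stochastic) scheme of this section.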
6.3.3 MMSE Batch Algorithm
This section presents the batch version of the algorithm studied in the preceding section, as proposed in [3]. The change with respect to the original algorithm [9] lies in the empirical risk functional to be minimized: instead of (6.34), it now contains the error contributions from all n patterns in the training set,
\[
E(w) = \frac{1}{2} \sum_{l=1}^{n} \sum_{k=1}^{N} \bigl( t_{k} - y_{k}^{L} \bigr)^{2}. \qquad (6.41)
\]
The difference between this batch approach and the stochastic one presented earlier is, as usual, that the values of Δw_kj and Δθ_k obtained after each pattern is presented to the network are summed, and the weights are only updated at the end of each epoch.
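An epoch-level sketch of this batch scheme, under the same assumptions as the previous snippet (split logistic activation, z_k = Σ_j w_kj x̄_j + θ_k, e_k = y_k − t_k), accumulates the per-pattern increments and applies them only once per epoch; again, the names batch_epoch, acc_w, acc_th are illustrative.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def batch_epoch(w, theta, X, T, eta):
    """One epoch of the batch MMSE scheme described in Sec. 6.3.3.

    The per-pattern increments of Eqs. (6.37)-(6.40) are summed over all n
    patterns; the weights and biases are changed only once, at the end of
    the epoch.  X has shape (n, J) and T has shape (n, K), both complex.
    """
    acc_w = np.zeros_like(w)          # accumulated complex Δw
    acc_th = np.zeros_like(theta)     # accumulated complex Δθ
    for x, t in zip(X, T):
        z = w @ np.conj(x) + theta                     # assumed net-input convention
        y = sigmoid(z.real) + 1j * sigmoid(z.imag)     # split activation
        e = y - t                                      # assumed error sign
        dR = (1.0 - sigmoid(z.real)) * sigmoid(z.real) # s'(z^R_k)
        dI = (1.0 - sigmoid(z.imag)) * sigmoid(z.imag) # s'(z^I_k)
        dw_R = 2 * eta * (np.outer(e.real * dR, x.real) - np.outer(e.imag * dI, x.imag))
        dw_I = 2 * eta * (np.outer(e.real * dR, x.imag) + np.outer(e.imag * dI, x.real))
        acc_w += dw_R + 1j * dw_I
        acc_th += 2 * eta * (e.real * dR + 1j * e.imag * dI)
    # single update at the end of the epoch
    return w - acc_w, theta - acc_th
```

Calling batch_epoch repeatedly with a fixed training set performs gradient descent on (6.41), up to the constant factor between the 1/2 in (6.41) and the factor 2 in (6.37)-(6.40), which only amounts to a rescaling of η.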