6.3.4 MEE Algorithm
The minimum error entropy (MEE) principle will now be used instead of the minimum
mean square error (MMSE) as the optimization principle behind the
learning algorithm for this network. MEE training needs a batch-mode
algorithm because the distribution of the errors has to be estimated before
the weights can be updated, so several of these errors are needed to obtain
a good PDF estimate.
As seen above, the error $e_j = t_j - y_j$ represents the difference between
the target $t_j$ of the $j$-th neuron and its output $y_j$. The MSE of the
variable $e_j$ will be replaced with its EE counterpart. First it is necessary
to estimate the PDF of the error; for this, the Parzen window estimator is used.
Using the empirical estimate of the $H_2$ EE, as in expression (3.5), and noting
that minimizing $H_2 = -\ln V_{R_2}$ is equivalent to maximizing the information
potential $V_{R_2}$, we maximize the latter, which (ignoring constant factors) is
written as
$$V_{R_2} = \sum_{i=1}^{n}\sum_{u=1}^{n} K\!\left(\frac{e_i - e_u}{h}\right). \tag{6.42}$$
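As an illustration, a minimal NumPy sketch of this estimator follows; the function name `info_potential` is ours, and a Gaussian kernel of the form $K(v) = e^{-|v|^2}$, applied to the complex error differences, is assumed:

```python
import numpy as np

def info_potential(e, h):
    """Empirical information potential V_R2 (up to constant factors) of a
    batch of complex errors e, using a Gaussian Parzen kernel of bandwidth h."""
    d = e[:, None] - e[None, :]          # all pairwise differences e_i - e_u
    return np.exp(-np.abs(d / h) ** 2).sum()
```

Since `np.abs` is used on the pairwise differences, the same code covers real-valued errors as a special case.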
The derivative of $V_{R_2}$ w.r.t. the real weights is
$$\frac{\partial V_{R_2}}{\partial w_{kj}} = \frac{1}{h}\sum_{i=1}^{n}\sum_{u=1}^{n} K'\!\left(\frac{e_i - e_u}{h}\right)\left(\frac{\partial e_i}{\partial w_{kj}} - \frac{\partial e_u}{\partial w_{kj}}\right). \tag{6.43}$$
Since $\partial e_i/\partial w_{kj}$ is given by $-\partial y_i/\partial w_{kj}$, and for a Gaussian kernel $K'(v) = -2vK(v)$, expression (6.43) can be written as
$$\frac{\partial V_{R_2}}{\partial w^{R}_{kj}} = \frac{2}{h^{2}}\sum_{i=1}^{n}\sum_{u=1}^{n} K\!\left(\frac{e_i - e_u}{h}\right)\operatorname{Re}\!\left[(e_i - e_u)^{*}\bigl(s'(z_i)\,x_j - s'(z_u)\,x_j\bigr)\right], \tag{6.44}$$
where $^{*}$ denotes complex conjugation, $x_j$ is taken from pattern $i$ or $u$ accordingly, and the products expand into real ($R$) and imaginary ($I$) parts as $\operatorname{Re}[s'(z_i)x_j] = s'^{R}(z_i)x^{R}_j - s'^{I}(z_i)x^{I}_j$ and $\operatorname{Im}[s'(z_i)x_j] = s'^{R}(z_i)x^{I}_j + s'^{I}(z_i)x^{R}_j$.
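The double sum in (6.44) vectorizes naturally over the $n^2$ error pairs. Below is a minimal sketch for a single complex neuron ($z_i = \sum_j w_j x_{ij}$, $y_i = s(z_i)$, so the subscript $k$ is dropped); the names `grad_real`, `X`, `s`, `s_prime` are illustrative, the kernel is again assumed Gaussian, and the pattern index suppressed on $x_j$ in (6.44) is made explicit:

```python
import numpy as np

def grad_real(w, X, t, h, s, s_prime):
    """Gradient of V_R2 w.r.t. the real parts of the weights, as in (6.44).

    w: complex weights (m,); X: complex patterns (n, m); t: complex targets (n,).
    """
    z = X @ w                                  # net inputs z_i
    e = t - s(z)                               # complex errors e_i = t_i - y_i
    d = e[:, None] - e[None, :]                # pairwise differences e_i - e_u
    K = np.exp(-np.abs(d / h) ** 2)            # Gaussian kernel values
    c = s_prime(z)[:, None] * X                # s'(z_i) x_ij
    dc = c[:, None, :] - c[None, :, :]         # s'(z_i) x_ij - s'(z_u) x_uj
    # Re[(e_i - e_u)^* (s'(z_i) x_ij - s'(z_u) x_uj)], summed over all pairs
    return (2 / h**2) * np.einsum('iu,iuj->j',
                                  K, (np.conj(d)[:, :, None] * dc).real)
```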
A gradient ascent procedure is used instead of gradient descent since the goal
is to maximize $V_{R_2}$. So, the weight update at each iteration $m$ is guided by
$$\Delta w_{kj}(m) = \eta\,\frac{\partial V_{R_2}}{\partial w_{kj}}. \tag{6.45}$$
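Under the assumptions of the sketch above, one ascent step then just adds $\eta$ times the gradient; using the illustrative `grad_real` helper, with `grad_imag` for the imaginary parts sketched after (6.46) below:

```python
# One batch iteration of gradient ascent on V_R2 (step size eta).
# grad_real follows (6.44); grad_imag, sketched below, follows (6.46).
w = w + eta * (grad_real(w, X, t, h, s, s_prime)
               + 1j * grad_imag(w, X, t, h, s, s_prime))
```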
A similar derivation can be done for the imaginary weights. The expression
equivalent to (6.44) is
$$\frac{\partial V_{R_2}}{\partial w^{I}_{kj}} = -\frac{2}{h^{2}}\sum_{i=1}^{n}\sum_{u=1}^{n} K\!\left(\frac{e_i - e_u}{h}\right)\operatorname{Im}\!\left[(e_i - e_u)^{*}\bigl(s'(z_i)\,x_j - s'(z_u)\,x_j\bigr)\right]. \tag{6.46}$$
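A matching sketch for (6.46), under the same assumptions as before (Gaussian kernel, single complex neuron, illustrative names); only the final projection changes, from the real to the imaginary part, with a sign flip:

```python
import numpy as np

def grad_imag(w, X, t, h, s, s_prime):
    """Gradient of V_R2 w.r.t. the imaginary parts of the weights, as in (6.46)."""
    z = X @ w
    e = t - s(z)
    d = e[:, None] - e[None, :]
    K = np.exp(-np.abs(d / h) ** 2)
    c = s_prime(z)[:, None] * X
    dc = c[:, None, :] - c[None, :, :]
    # -Im[(e_i - e_u)^* (s'(z_i) x_ij - s'(z_u) x_uj)], summed over all pairs
    return -(2 / h**2) * np.einsum('iu,iuj->j',
                                   K, (np.conj(d)[:, :, None] * dc).imag)
```

For instance, with $s = \tanh$ (holomorphic, so the same derivative rule applies on $\mathbb{C}$) one can take `s = np.tanh` and `s_prime = lambda z: 1 - np.tanh(z)**2`.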
 