gradient descent rule is used, with the update of the weights at each iteration (m), that is, after the presentation of each training pattern to the network, written as
\[
w_{kj}^{R,(m)} = w_{kj}^{R,(m-1)} + \Delta w_{kj}^{R,(m)},
\tag{6.36}
\]
with
\[
\Delta w_{kj}^{R,(m)} = -\eta\,\frac{\partial E}{\partial w_{kj}^{R}}
= 2\eta\, e_k \left( s(z_k^{R}) - s(z_k^{I}) \right)\left( s'(z_k^{R})\, x_j^{R} - s'(z_k^{I})\, x_j^{I} \right).
\tag{6.37}
\]
A similar derivation can be made for the case of the imaginary part of the
weights, yielding
\[
\Delta w_{kj}^{I,(m)} = -\eta\,\frac{\partial E}{\partial w_{kj}^{I}}
= 2\eta\, e_k \left( s(z_k^{I}) - s(z_k^{R}) \right)\left( s'(z_k^{R})\, x_j^{I} + s'(z_k^{I})\, x_j^{R} \right).
\tag{6.38}
\]
It is possible to show that the final expressions for the adjustment of the real
and imaginary parts of the bias are
\[
\Delta \theta_{k}^{R,(m)} = 2\eta\, e_k \left( s(z_k^{R}) - s(z_k^{I}) \right) s'(z_k^{R})
\tag{6.39}
\]
and
\[
\Delta \theta_{k}^{I,(m)} = 2\eta\, e_k \left( s(z_k^{I}) - s(z_k^{R}) \right) s'(z_k^{I}).
\tag{6.40}
\]
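To make the mechanics of the stochastic rule concrete, the sketch below transcribes (6.36)-(6.40) into code for a single output neuron. It is only an illustration under stated assumptions: the activation s is taken to be the logistic function, the input and the activation potential are handled as complex quantities whose real and imaginary parts correspond to the superscripts R and I above, and the error e_k and potential z_k are presumed to be available from the forward pass defined earlier in the chapter. All function and variable names are illustrative, not taken from the text.

import numpy as np

def s(v):
    """Assumed logistic activation; the chapter's actual s may differ."""
    return 1.0 / (1.0 + np.exp(-v))

def s_prime(v):
    """Derivative of the assumed logistic, s'(v) = s(v)(1 - s(v))."""
    sv = s(v)
    return sv * (1.0 - sv)

def per_pattern_deltas(x, z, e, eta):
    """Corrections (6.37)-(6.40) for output neuron k and one training pattern.

    x   : complex input vector (entries x_j)
    z   : complex activation potential z_k from the forward pass
    e   : output error e_k for the current pattern
    eta : step size
    Returns the complex weight correction (real part from (6.37), imaginary
    part from (6.38)) and the complex bias correction ((6.39)-(6.40)).
    """
    zr, zi = z.real, z.imag
    common = 2.0 * eta * e * (s(zr) - s(zi))
    dw_r = common * (s_prime(zr) * x.real - s_prime(zi) * x.imag)    # (6.37)
    dw_i = -common * (s_prime(zr) * x.imag + s_prime(zi) * x.real)   # (6.38)
    dth_r = common * s_prime(zr)                                     # (6.39)
    dth_i = -common * s_prime(zi)                                    # (6.40)
    return dw_r + 1j * dw_i, dth_r + 1j * dth_i

def stochastic_step(w, theta, x, z, e, eta):
    """Stochastic rule (6.36): the parameters are updated right after each pattern."""
    dw, dtheta = per_pattern_deltas(x, z, e, eta)
    return w + dw, theta + dtheta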
6.3.3 MMSE Batch Algorithm
This section presents the batch version of the algorithm studied in the preceding section, as proposed in [3]. The change with respect to the original algorithm [9] lies in the empirical risk functional to be minimized: instead of the single-pattern cost (6.34), it now contains the error contributions from all n patterns in the training set,
\[
E(\mathbf{w}) = \frac{1}{2n}\sum_{l=1}^{n}\sum_{k=1}^{N}\left( t_k - y_k \right)^{2}.
\tag{6.41}
\]
The difference between this batch approach and the stochastic one presented earlier is, as usual, that the values of $\Delta w_{kj}$ and $\Delta \theta_{k}$ obtained after each pattern is presented to the network are summed, and the weights are only updated at the end of each epoch.
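As a minimal sketch of that difference, the loop below accumulates the per-pattern corrections over the whole training set and applies them once per epoch, while monitoring the empirical risk of (6.41) for a single output neuron (so the inner sum over k collapses). The forward map of the network is passed in as a callable, since its definition appears earlier in the chapter, and the per-pattern corrections can be supplied by the per_pattern_deltas function of the previous sketch; all names are assumptions made for illustration.

import numpy as np

def batch_epoch(w, theta, patterns, forward, per_pattern_deltas, eta):
    """One epoch of the batch algorithm for a single output neuron.

    w, theta           : complex weight vector and complex bias
    patterns           : sequence of (x, t) pairs (complex input vector, desired output)
    forward            : callable (w, theta, x) -> (z, y), the network map assumed
                         to be defined earlier in the chapter
    per_pattern_deltas : callable (x, z, e, eta) -> (dw, dtheta) implementing (6.37)-(6.40)
    """
    n = len(patterns)
    acc_dw = np.zeros_like(w, dtype=complex)   # accumulated weight correction
    acc_dth = 0j                               # accumulated bias correction
    cost = 0.0
    for x, t in patterns:
        z, y = forward(w, theta, x)
        e = t - y
        cost += (e ** 2) / (2.0 * n)           # empirical risk (6.41); e is real-valued here
        dw, dth = per_pattern_deltas(x, z, e, eta)
        acc_dw += dw                           # corrections are summed over the epoch ...
        acc_dth += dth
    # ... and the parameters are adjusted only once, at the end of the epoch.
    return w + acc_dw, theta + acc_dth, cost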
 