gradient descent rule is used, with the update of the weights at each iteration (m), that is, after the presentation of each training pattern to the network, written as
\[
w_{kj}^{R,(m)} = w_{kj}^{R,(m-1)} - \Delta w_{kj}^{R,(m)}, \qquad (6.36)
\]
with
\[
\Delta w_{kj}^{R,(m)} = \eta \frac{\partial E}{\partial w_{kj}^{R}}
= 2\eta\, e_{k}^{R} \bigl(1 - s(z_{k}^{R})\bigr) s(z_{k}^{R})\, x_{j}^{R}
- 2\eta\, e_{k}^{I} \bigl(1 - s(z_{k}^{I})\bigr) s(z_{k}^{I})\, x_{j}^{I}. \qquad (6.37)
\]
A similar derivation can be made for the case of the imaginary part of the
weights, yielding
\[
\Delta w_{kj}^{I,(m)} = \eta \frac{\partial E}{\partial w_{kj}^{I}}
= 2\eta\, e_{k}^{R} \bigl(1 - s(z_{k}^{R})\bigr) s(z_{k}^{R})\, x_{j}^{I}
+ 2\eta\, e_{k}^{I} \bigl(1 - s(z_{k}^{I})\bigr) s(z_{k}^{I})\, x_{j}^{R}. \qquad (6.38)
\]
It is possible to show that the final expressions for the adjustment of the real
and imaginary parts of the bias are
\[
\Delta \theta_{k}^{R,(m)} = 2\eta\, e_{k}^{R} \bigl(1 - s(z_{k}^{R})\bigr) s(z_{k}^{R}) \qquad (6.39)
\]
and
\[
\Delta \theta_{k}^{I,(m)} = 2\eta\, e_{k}^{I} \bigl(1 - s(z_{k}^{I})\bigr) s(z_{k}^{I}). \qquad (6.40)
\]
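To make the per-pattern update concrete, the following NumPy sketch applies (6.36)-(6.40) to one layer of complex weights. It rests on assumptions that are not stated explicitly in this excerpt: a logistic activation s applied separately to the real and imaginary parts of the net input, a net input of the form z_k = Σ_j w_kj x̄_j + θ_k, and an error term e_k = y_k − t_k whose sign makes (6.36) a descent step. The function and variable names (stochastic_update, sigmoid, dR, dI) are illustrative, not taken from [3] or [9].

```python
import numpy as np

def sigmoid(a):
    """Logistic activation s, applied elementwise to a real array."""
    return 1.0 / (1.0 + np.exp(-a))

def stochastic_update(w, theta, x, t, eta):
    """One per-pattern (stochastic) MMSE step in the spirit of Eqs. (6.36)-(6.40).

    w     : complex weight matrix, shape (K, J)   -- w_kj = w^R_kj + i w^I_kj
    theta : complex bias vector,   shape (K,)
    x     : complex input pattern, shape (J,)
    t     : complex target,        shape (K,)
    eta   : learning rate
    Returns the updated (w, theta).
    """
    # Net input z_k = sum_j w_kj * conj(x_j) + theta_k  (assumed convention)
    z = w @ np.conj(x) + theta
    # Split activation: s applied separately to real and imaginary parts
    y = sigmoid(z.real) + 1j * sigmoid(z.imag)
    e = y - t                                          # assumed error sign
    dR = (1.0 - sigmoid(z.real)) * sigmoid(z.real)     # s'(z^R_k)
    dI = (1.0 - sigmoid(z.imag)) * sigmoid(z.imag)     # s'(z^I_k)
    # Weight increments, Eqs. (6.37)-(6.38): outer product over outputs k and inputs j
    dw_R = 2 * eta * (np.outer(e.real * dR, x.real) - np.outer(e.imag * dI, x.imag))
    dw_I = 2 * eta * (np.outer(e.real * dR, x.imag) + np.outer(e.imag * dI, x.real))
    # Bias increments, Eqs. (6.39)-(6.40)
    dth_R = 2 * eta * e.real * dR
    dth_I = 2 * eta * e.imag * dI
    # Eq. (6.36) and its imaginary-part counterpart: subtract the increments
    return w - (dw_R + 1j * dw_I), theta - (dth_R + 1j * dth_I)
```

One call performs the adjustment for a single training pattern; iterating over a shuffled training set gives the per-pattern (stochastic) scheme of this section.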
6.3.3 MMSE Batch Algorithm
This section presents the batch version of the algorithm studied in the preceding section, as proposed in [3]. The change with respect to the original algorithm [9] lies in the empirical risk functional to be minimized: instead of (6.34), it now contains the error contributions from all n patterns in the training set,
\[
E(w) = \frac{1}{2} \sum_{l=1}^{n} \sum_{k=1}^{N} \bigl( t_{k} - y_{k}^{L} \bigr)^{2}. \qquad (6.41)
\]
The difference between this batch approach and the stochastic one presented earlier is, as usual, that the values of Δw_kj and Δθ_k obtained after each pattern is presented to the network are summed, and the weights are only updated at the end of each epoch.
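An epoch-level sketch of this batch scheme, under the same assumptions as the previous snippet (split logistic activation, z_k = Σ_j w_kj x̄_j + θ_k, e_k = y_k − t_k), accumulates the per-pattern increments and applies them only once per epoch; again, the names batch_epoch, acc_w, acc_th are illustrative.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def batch_epoch(w, theta, X, T, eta):
    """One epoch of the batch MMSE scheme described in Sec. 6.3.3.

    The per-pattern increments of Eqs. (6.37)-(6.40) are summed over all n
    patterns; the weights and biases are changed only once, at the end of
    the epoch.  X has shape (n, J) and T has shape (n, K), both complex.
    """
    acc_w = np.zeros_like(w)          # accumulated complex Δw
    acc_th = np.zeros_like(theta)     # accumulated complex Δθ
    for x, t in zip(X, T):
        z = w @ np.conj(x) + theta                     # assumed net-input convention
        y = sigmoid(z.real) + 1j * sigmoid(z.imag)     # split activation
        e = y - t                                      # assumed error sign
        dR = (1.0 - sigmoid(z.real)) * sigmoid(z.real) # s'(z^R_k)
        dI = (1.0 - sigmoid(z.imag)) * sigmoid(z.imag) # s'(z^I_k)
        dw_R = 2 * eta * (np.outer(e.real * dR, x.real) - np.outer(e.imag * dI, x.imag))
        dw_I = 2 * eta * (np.outer(e.real * dR, x.imag) + np.outer(e.imag * dI, x.real))
        acc_w += dw_R + 1j * dw_I
        acc_th += 2 * eta * (e.real * dR + 1j * e.imag * dI)
    # single update at the end of the epoch
    return w - acc_w, theta - acc_th
```

Calling batch_epoch repeatedly with a fixed training set performs gradient descent on (6.41), up to the constant factor between the 1/2 in (6.41) and the factor 2 in (6.37)-(6.40), which only amounts to a rescaling of η.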