iteration step. On the other hand, if the new value of the error function exceeds the previous one, the learning rate should be decreased by approximately 30%, and in that case the new weight update and the corresponding error-function value are discarded, i.e. the weight update is set to

\Delta w_{ij}^{(k+1)} = -\Delta w_{ij}^{(k)},

so that the weights of the (k+1)th iteration are identical to those of the (k-1)th iteration, i.e.

w_{ij}^{(k+1)} = w_{ij}^{(k-1)}.
After starting with a small learning rate, the approach will behave as follows:

\eta_{k+1} =
\begin{cases}
a \, \eta_k, & \text{for } S(w_k) < S(w_{k-1}), \\
b \, \eta_k, & \text{for } S(w_k) \ge k_0 \, S(w_{k-1}), \\
\eta_k,      & \text{otherwise},
\end{cases}
\qquad (3.14)

with a = 1.05, b = 0.7 and k_0 = 1.04 being typical values (Vogl et al., 1988; Cichocki and Unbehauen, 1993).
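As an illustration, the sketch below applies rule (3.14), together with the step-rejection described above, to a toy quadratic error function. The quadratic cost, the starting learning rate and the iteration budget are illustrative assumptions, not details from the text.

```python
import numpy as np

# Illustrative quadratic error function S(w) = 0.5 * w^T A w (assumption for the demo).
A = np.diag([1.0, 10.0])
S = lambda w: 0.5 * w @ A @ w
grad_S = lambda w: A @ w

a, b, k0 = 1.05, 0.7, 1.04        # typical values from (3.14)
eta = 1e-3                        # start with a small learning rate
w = np.array([1.0, 1.0])
S_prev = S(w)

for k in range(100):
    dw = -eta * grad_S(w)         # tentative gradient-descent update
    w_new = w + dw
    S_new = S(w_new)

    if S_new < S_prev:            # error decreased: accept the step, grow eta by 5%
        w, S_prev = w_new, S_new
        eta *= a
    elif S_new >= k0 * S_prev:    # error grew too much: reject the step, shrink eta by ~30%
        eta *= b                  # w and S_prev are kept, i.e. the update is undone
    else:                         # small increase: accept the step, keep eta unchanged
        w, S_prev = w_new, S_new
```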
In some training applications not all the training patterns are available before
the learning starts. In such situations an on-line approach has to be used.
Schmidhuber (1989) proposed a simple global update of the learning rate for each training pattern:
\Delta w_{ij}(k) = -\eta_p \, \frac{\partial S_p(w)}{\partial w_{ij}}, \qquad (3.15)
with

\eta_p = \min\left\{ \frac{S_p - S_0}{\left\| \partial S_p(w) / \partial w \right\|_2^2}, \; \eta_{\max} \right\}, \qquad (3.16)

where \eta_{\max} indicates the maximum learning rate (typically \eta_{\max} = 20) and S_0 is a small offset of the error function (typically 0.01 \le S_0 \le 0.1).
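The following sketch shows how the per-pattern step size of (3.15)-(3.16) might be used in an on-line setting. The single linear unit, the squared-error measure and the clamping of negative step sizes to zero are illustrative assumptions rather than prescriptions from the text.

```python
import numpy as np

eta_max, S0 = 20.0, 0.05          # typical values: eta_max = 20, 0.01 <= S0 <= 0.1

def online_step(w, x, t):
    """One on-line update with the per-pattern learning rate of (3.15)-(3.16).

    Assumes a single linear unit y = w.x with squared error S_p = 0.5*(t - y)^2;
    both are illustrative choices.
    """
    y = w @ x
    S_p = 0.5 * (t - y) ** 2               # error on the current pattern
    grad = -(t - y) * x                    # dS_p/dw
    g2 = grad @ grad                       # squared gradient norm
    if g2 == 0.0:
        return w
    eta_p = min((S_p - S0) / g2, eta_max)  # (3.16)
    eta_p = max(eta_p, 0.0)                # added safeguard: no step once below the offset S0
    return w - eta_p * grad                # (3.15)

# Patterns arrive one at a time (on-line learning).
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
w = np.zeros(2)
for _ in range(200):
    x = rng.normal(size=2)
    t = w_true @ x
    w = online_step(w, x, t)
```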
Various suggestions have been made for the practical use of both an adaptable learning rate and the momentum term, the best known being the conjugate gradient algorithm (Johansson et al., 1992). Alternatively, the second-order, derivative-based Levenberg-Marquardt algorithm (Hagan and Menhaj, 1994), proposed for accelerated minimization of the cost function, is preferred for fast neural network training. The key idea of the algorithm is to use a Gauss-Newton-like approximation of the Hessian matrix, built from first-order derivatives of the network errors and augmented with an adaptive damping term.
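A minimal sketch of one Levenberg-Marquardt step under the usual sum-of-squared-errors formulation is given below. The toy residual function and the fixed damping value mu are assumptions; a practical implementation would increase or decrease mu depending on whether the cost actually drops after the step.

```python
import numpy as np

def lm_step(w, residuals, jacobian, mu):
    """One Levenberg-Marquardt update: dw = -(J^T J + mu*I)^(-1) J^T e."""
    e = residuals(w)                       # vector of output errors
    J = jacobian(w)                        # de/dw
    H = J.T @ J + mu * np.eye(w.size)      # Gauss-Newton Hessian approximation plus damping
    return w - np.linalg.solve(H, J.T @ e)

# Toy example: fit w to residuals e_i = w0*x_i + w1 - t_i (a linear least-squares problem).
x = np.linspace(0.0, 1.0, 20)
t = 3.0 * x + 0.5
residuals = lambda w: w[0] * x + w[1] - t
jacobian = lambda w: np.column_stack([x, np.ones_like(x)])

w, mu = np.zeros(2), 1e-2
for _ in range(20):
    w = lm_step(w, residuals, jacobian, mu)
```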