remedy, a normalization of the weights at every iteration step is necessary. Oja (1982) proposed using for this the normalization relationship
$$
w_i(t+1) = \frac{w_i(t) + \eta\, y(t)\, x_i(t)}{\left\{\sum_j \left[w_j(t) + \eta\, y(t)\, x_j(t)\right]^2\right\}^{1/2}},
$$
derived through modification of the Hebbian rule itself. The modification normalizes the weight vector to unit length: if one of its components increases, the values of all other components are decreased, in this way keeping the total length of the vector constant.
The above rule modification can, for a small value of η and after a power-series expansion, be approximated as
$$
w_i(t+1) \approx w_i(t) + \eta\, y(t)\left[x_i(t) - y(t)\, w_i(t)\right],
$$
which is known as Oja's rule.
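To make the two update schemes concrete, the following Python sketch applies both the explicitly normalized Hebbian update and Oja's approximation to a single linear neuron. The input data, the learning-rate value, and the function names are illustrative assumptions, not part of the original text.

import numpy as np

def normalized_hebbian_step(w, x, eta):
    """Hebbian increment followed by explicit renormalization to unit length."""
    y = w @ x                                # linear neuron output y(t)
    w_new = w + eta * y * x                  # plain Hebbian update
    return w_new / np.linalg.norm(w_new)     # divide by {sum_j [.]^2}^(1/2)

def oja_step(w, x, eta):
    """Oja's rule: first-order approximation of the normalized update."""
    y = w @ x
    return w + eta * y * (x - y * w)         # w_i + eta*y*(x_i - y*w_i)

# Assumed two-dimensional input data; both rules drive the weight vector
# towards the dominant principal component of the inputs.
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.0], [1.0, 1.0]], size=2000)
w_norm = np.array([1.0, 0.0])
w_oja = np.array([1.0, 0.0])
eta = 0.01
for x in X:
    w_norm = normalized_hebbian_step(w_norm, x, eta)
    w_oja = oja_step(w_oja, x, eta)
print("explicitly normalized:", w_norm)
print("Oja approximation:   ", w_oja)

For a small learning rate the two weight trajectories remain close to each other, which is precisely what the power-series argument above asserts.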
However, because the application of the Hebbian rule is largely limited to single-layer neural networks, the original version of the backpropagation algorithm is favoured for training multilayer networks. The training is performed off-line in a supervised learning mode, which is convenient because, in practice, a large number of data are available that have to be processed prior to their use for training. Besides, for forecasting purposes the pairs of related input and output data also have to be built and processed. Finally, the supervised mode of learning facilitates monitoring of the training performance and determining the training stopping point.
When applying the backpropagation algorithm, which is a typical gradient steepest-descent method, decisions have to be made concerning the
• learning rate, i.e. the step size or the magnitude of the weight update
• momentum, which is needed to escape trapping in local minima.
An appropriate selection of learning rate is particularly important because the
steepest descent method suffers from slow convergence and weak robustness.
Accelerating the convergence by taking a larger learning rate bears the danger of oscillatory behaviour of the network around the minimum. To avoid this, and still be able to take a larger learning rate, the addition of a momentum parameter was recommended (Rumelhart et al., 1986). With this, the original learning step according to the delta rule
$$
w(t+1) = w(t) + \eta\, \delta_p(t)\, x_p(t)
$$
is extended by the momentum term to result in
$$
w_{ij}(t+1) = w_{ij}(t) + \eta\, \delta_i(t)\, x_j(t) + \alpha\left[w_{ij}(t) - w_{ij}(t-1)\right].
$$
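As a minimal sketch of how the momentum term enters the weight update, the following Python fragment implements the extended rule above for one weight matrix of a layer. The shapes, parameter values, and function name are illustrative assumptions; the backpropagated error signals δ_i(t) are taken as given rather than computed here.

import numpy as np

def backprop_step_with_momentum(w, w_prev, delta, x, eta=0.1, alpha=0.9):
    """One weight update of the delta rule extended by a momentum term.

    w      -- current weights w_ij(t), shape (n_out, n_in)
    w_prev -- previous weights w_ij(t-1), same shape
    delta  -- backpropagated error signals delta_i(t), shape (n_out,)
    x      -- inputs to the layer x_j(t), shape (n_in,)
    """
    gradient_term = eta * np.outer(delta, x)   # eta * delta_i(t) * x_j(t)
    momentum_term = alpha * (w - w_prev)       # alpha * [w_ij(t) - w_ij(t-1)]
    w_new = w + gradient_term + momentum_term
    return w_new, w                            # updated weights and new "previous" weights

# Illustrative usage with assumed shapes and values
rng = np.random.default_rng(1)
w = rng.normal(size=(3, 4))    # 4 inputs, 3 outputs
w_prev = w.copy()              # no history yet, so the momentum term starts at zero
delta = rng.normal(size=3)     # error signals from the layer above
x = rng.normal(size=4)         # layer inputs
w, w_prev = backprop_step_with_momentum(w, w_prev, delta, x)

Keeping the previous weights w_ij(t-1) is all that is needed to form the momentum term; with α = 0 the update reduces to the plain delta rule.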