It should also be noted that in this case the partial derivative $\partial y_j / \partial u_j$ reaches its maximum for $y_j = 0.5$ and, since $0 \le y_j \le 1$, it approaches its minimum as the output $y_j$ approaches the value zero or the value one.
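This behaviour follows directly from the form of the sigmoid derivative. The short derivation below is a sketch, assuming the standard unipolar sigmoid with net input $u_j$; the explicit formula is supplied here for clarity and is not quoted from the text.

```latex
% Unipolar sigmoid of neuron j with net input u_j, and its derivative
y_j = \frac{1}{1 + e^{-u_j}},
\qquad
\frac{\partial y_j}{\partial u_j} = y_j \, (1 - y_j).
% The product y_j (1 - y_j) is a downward parabola in y_j on [0, 1]:
% it attains its maximum value 1/4 at y_j = 0.5 and tends to 0
% as y_j approaches 0 or 1.
```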
The synaptic weights are usually changed incrementally and the neuron
gradually converges to a set of weights which solve the specific problem.
Therefore, the implementation of the backpropagation algorithm requires an
accurate realization of the sigmoid activation function and of its derivative.
The backpropagation algorithm described can also be extended to train
multilayer perceptron networks.
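One common way to realize the sigmoid and its derivative for use in the backward pass is sketched below in Python with NumPy; the function names and the reuse of the stored neuron output are illustrative assumptions rather than a prescription from the text.

```python
import numpy as np

def sigmoid(u):
    """Unipolar sigmoid activation y = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + np.exp(-u))

def sigmoid_derivative(y):
    """Derivative expressed through the output itself: dy/du = y * (1 - y).

    Reusing the already computed output y avoids evaluating the
    exponential a second time during the backward pass.
    """
    return y * (1.0 - y)

# Backward-pass usage for an output neuron j with target d_j = 1:
#   delta_j = (d_j - y_j) * dy_j/du_j
y = sigmoid(np.array([-2.0, 0.0, 2.0]))
delta = (1.0 - y) * sigmoid_derivative(y)
```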
3.4.1 Accelerated Backpropagation Algorithm
The backpropagation algorithm generally suffers from relatively slow convergence and from the possibility of being trapped in a local minimum. It can also be accompanied by oscillation around the located minimum. This may restrict its practical application in many cases. Therefore, such unwanted drawbacks of the algorithm have to be removed, or at least reduced. For instance, the convergence of the algorithm can be accelerated:
• by selecting good initial weights instead of the ones generated at random
• through adequate preprocessing of the training data, e.g. by employing feature extraction algorithms or some data projection methods
• by improving the optimization algorithm to be used.
Numerous heuristic optimization algorithms have been proposed for such acceleration; unfortunately, they are generally computationally involved and time-consuming. In the following, only two of the most efficient are briefly reviewed:
• adaptation of the learning rate
• use of a momentum term.
It is usually assumed that the learning rate of the algorithm is fixed and uniform for all weights during the training iterations. In order to prevent parasitic oscillations and to ensure convergence to the global minimum, the learning rate must be kept as small as possible. However, a very small learning rate slows down the convergence of the algorithm considerably, while a large learning rate results in an unstable learning process. Therefore, the learning rate has to be set optimally between these two extremes, e.g. by using an adaptive learning rate, and in this way the training time can be considerably reduced. Similarly, a speed-up of convergence can be achieved by extending the training algorithm with a momentum term (Kröse and Smagt, 1996). In this case the learning rate can be kept as large as admissible at each iteration step, while keeping the learning process stable.
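The following minimal sketch shows a weight update extended with a momentum term, written in Python with NumPy; the symbols eta for the learning rate, alpha for the momentum coefficient, and grad for the gradient of the error function S are illustrative assumptions.

```python
import numpy as np

def momentum_update(w, grad, prev_delta_w, eta=0.1, alpha=0.9):
    """One gradient-descent step extended with a momentum term:

        delta_w(t) = -eta * dS/dw + alpha * delta_w(t-1)

    The term alpha * delta_w(t-1) keeps the update moving along the
    previously successful direction, which damps oscillations and lets a
    comparatively large learning rate eta be used at every iteration.
    """
    delta_w = -eta * grad + alpha * prev_delta_w
    return w + delta_w, delta_w

# Example: two consecutive updates of a small weight vector
w, delta_w = np.zeros(3), np.zeros(3)
w, delta_w = momentum_update(w, np.array([0.20, -0.10, 0.05]), delta_w)
w, delta_w = momentum_update(w, np.array([0.15, -0.12, 0.04]), delta_w)
```

With alpha set to zero the update reduces to the plain gradient-descent step of the basic backpropagation algorithm.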
One of the simplest heuristic approaches to learning-rate tuning is to increase the learning rate slightly (typically by 5%) in an iteration step if the new value of the output error (sum-of-squared-errors) function S is smaller than the previous