learning rate for gradient descent that allows for both stable learning and fast con-
vergence.
Since the RPROP algorithm uses only the sign of the gradient, not its magnitude, it is not affected by the very small or very large gradients that backpropagation through time tends to produce. Hence, it is advisable to combine this algorithm with BPTT. In experiments, this combination avoided the stability problems of fixed-rate gradient descent while remaining one of the most efficient optimization methods.
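To make the update rule concrete, the following is a minimal sketch of an RPROP-style weight update in Python. The function and parameter names (rprop_update, eta_plus, step_max, and so on) are illustrative, not taken from the thesis. Each weight keeps its own step size, which grows while the gradient sign is stable and shrinks when it flips; the gradient computed by BPTT enters only through its sign.

```python
import numpy as np

def rprop_update(w, grad, prev_grad, step,
                 eta_plus=1.2, eta_minus=0.5,
                 step_min=1e-6, step_max=50.0):
    """One RPROP-style update for a weight array w (illustrative sketch)."""
    sign_change = grad * prev_grad
    # Same gradient sign as before: grow the per-weight step size (capped).
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    # Sign flipped: shrink the step size and suppress this update.
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    grad = np.where(sign_change < 0, 0.0, grad)
    # Move each weight against the sign of its gradient by its own step size;
    # the gradient magnitude itself is never used.
    w = w - np.sign(grad) * step
    return w, grad, step
```

In a training loop, grad would be the gradient accumulated by BPTT over all unfolded time steps, and the returned grad and step are carried over to the next update.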
Learning Attractors. When analyzing static input with a Neural Abstraction Pyramid, the desired network output is usually static as well. The cost function (6.14) must combine two goals. First, after T iterations, the network's output should approximate the desired output as closely as possible. Second, the output should converge to the desired output as quickly as possible.
Hence, it is not sufficient to include only the final approximation error in the cost function. The error weights γ_t for intermediate time steps t < T must be non-zero as well, and they must be chosen according to the relative importance of the above two goals.
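As an illustration, a time-weighted cost of this kind can be written as a sum of squared output errors scaled by the γ_t. This is a sketch only, not the exact form of (6.14); the names outputs, target, and gamma are illustrative.

```python
import numpy as np

def time_weighted_cost(outputs, target, gamma):
    """Sketch of a time-weighted output cost (not the exact form of Eq. (6.14)).

    outputs[t] is the network output after iteration t+1, target is the static
    desired output, and gamma[t] is the error weight for that time step.
    """
    return sum(g * np.sum((o - target) ** 2) for g, o in zip(gamma, outputs))
```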
Constant weighting of the error components, e.g. γ_t = 1, does not pay much attention to the final approximation: the learning algorithm may well prefer a coarser approximation if it can be produced faster.
Experiments showed that increasing the error weights linearly, e.g. γ_t = t, gives the later error components a large enough advantage over the earlier ones that the network accepts a longer approximation phase if this brings the final output closer to the desired output.
This effect is even stronger with quadratic weighting, e.g. γ_t = t², but in this case the network may produce a solution that minimizes the output distance at the last trained iteration T at the cost of increasing this distance for later iterations, which are not trained.
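To see how strongly each schedule emphasizes the final output, the following illustrative snippet (T = 10 is an arbitrary example value) computes the share of the total error weight that falls on the last iteration under constant, linear, and quadratic weighting.

```python
import numpy as np

T = 10                                    # number of unfolded iterations (example value)
t = np.arange(1, T + 1, dtype=float)

schedules = {
    "constant  (gamma_t = 1)":   np.ones(T),
    "linear    (gamma_t = t)":   t,
    "quadratic (gamma_t = t^2)": t ** 2,
}

# Fraction of the total error weight placed on the final iteration T:
# the larger this fraction, the more the last output dominates the cost.
for name, gamma in schedules.items():
    print(f"{name}: {gamma[-1] / gamma.sum():.2f}")
```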
6.4 Conclusions
This chapter discussed gradient-based techniques for supervised training of feed-
forward and recurrent neural networks. Several improvements to the basic gradient
descent method were covered. Some of these will be used in the remainder of the
thesis for supervised training of Neural Abstraction Pyramids.
The RPROP algorithm is used in combination with mini batches to speed up the
training. Low-activity priors are employed to enforce sparse representations.
For the case of recurrent pyramids, the BPTT method for computing the gradient is combined with RPROP to ensure stable and fast training despite the large variance in gradient magnitudes. If the desired output is constant, the weighting of the
output error is increased linearly to quickly achieve a good approximation. In this
case, attractors are trained to coincide with the desired outputs.