learning rate for gradient descent that allows for both stable learning and fast con-
vergence.
Since the RPROP algorithm uses only the sign of the gradient, not its magnitude, it is not affected by the very small or very large gradients that backpropagation through time tends to produce. Hence, it is advisable to combine this algorithm with BPTT. In experiments, this combination avoided the stability problems of fixed-rate gradient descent while remaining one of the most efficient optimization methods.
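To make the update rule concrete, the following is a minimal sketch of an RPROP-style weight update in Python. The function and parameter names (rprop_update, eta_plus, step_max, and so on) are illustrative, not taken from the thesis. Each weight keeps its own step size, which grows while the gradient sign is stable and shrinks when it flips; the gradient computed by BPTT enters only through its sign.

```python
import numpy as np

def rprop_update(w, grad, prev_grad, step,
                 eta_plus=1.2, eta_minus=0.5,
                 step_min=1e-6, step_max=50.0):
    """One RPROP-style update for a weight array w (illustrative sketch)."""
    sign_change = grad * prev_grad
    # Same gradient sign as before: grow the per-weight step size (capped).
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    # Sign flipped: shrink the step size and suppress this update.
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    grad = np.where(sign_change < 0, 0.0, grad)
    # Move each weight against the sign of its gradient by its own step size;
    # the gradient magnitude itself is never used.
    w = w - np.sign(grad) * step
    return w, grad, step
```

In a training loop, grad would be the gradient accumulated by BPTT over all unfolded time steps, and the returned grad and step are carried over to the next update.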
Learning Attractors. When analyzing static input with a Neural Abstraction Pyramid, the desired network output is usually static as well. The cost function (6.14) must combine two goals. First, after T iterations, the network's output should approximate the desired output as closely as possible. Second, the output should converge to the desired output as quickly as possible.
Hence, it is not sufficient to include only the final approximation error in the cost function. The error weights γ_t for intermediate time steps t < T must be non-zero as well, and they must be chosen according to the relative importance of the above two goals.
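As an illustration, a time-weighted cost of this kind can be written as a sum of squared output errors scaled by the γ_t. This is a sketch only, not the exact form of (6.14); the names outputs, target, and gamma are illustrative.

```python
import numpy as np

def time_weighted_cost(outputs, target, gamma):
    """Sketch of a time-weighted output cost (not the exact form of Eq. (6.14)).

    outputs[t] is the network output after iteration t+1, target is the static
    desired output, and gamma[t] is the error weight for that time step.
    """
    return sum(g * np.sum((o - target) ** 2) for g, o in zip(gamma, outputs))
```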
Constant weighting of the error components, e.g. γ_t = 1, does not pay much attention to the final approximation: the learning algorithm may well prefer a coarser approximation if it can be produced faster.
Experiments showed that increasing the error weights linearly, e.g. γ_t = t, gives the later error components a large enough advantage over the earlier ones that the network accepts a longer approximation phase if this brings the final output closer to the desired output.
This effect is even stronger with quadratic weighting, e.g. γ_t = t², but in this case the network may produce a solution that minimizes the output distance at the last trained iteration T at the cost of increasing this distance for later iterations, which are not trained.
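To see how strongly each schedule emphasizes the final output, the following illustrative snippet (T = 10 is an arbitrary example value) computes the share of the total error weight that falls on the last iteration under constant, linear, and quadratic weighting.

```python
import numpy as np

T = 10                                    # number of unfolded iterations (example value)
t = np.arange(1, T + 1, dtype=float)

schedules = {
    "constant  (gamma_t = 1)":   np.ones(T),
    "linear    (gamma_t = t)":   t,
    "quadratic (gamma_t = t^2)": t ** 2,
}

# Fraction of the total error weight placed on the final iteration T:
# the larger this fraction, the more the last output dominates the cost.
for name, gamma in schedules.items():
    print(f"{name}: {gamma[-1] / gamma.sum():.2f}")
```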
6.4 Conclusions
This chapter discussed gradient-based techniques for supervised training of feed-
forward and recurrent neural networks. Several improvements to the basic gradient
descent method were covered. Some of these will be used in the remainder of the
thesis for supervised training of Neural Abstraction Pyramids.
The RPROP algorithm is used in combination with mini batches to speed up the
training. Low-activity priors are employed to enforce sparse representations.
For the case of recurrent pyramids, the BPTT method for computing the gradient is combined with RPROP to ensure stable and fast training despite the large variance in gradient magnitudes. If the desired output is constant, the weighting of the
output error is increased linearly to quickly achieve a good approximation. In this
case, attractors are trained to coincide with the desired outputs.