information about current and past inputs that may be needed for diverse tasks. Stable internal states are not required to produce a stable output, since transient internal states can be transformed by readout neurons into stable target outputs, owing to the high dimensionality of the dynamical system.
Again, it is assumed that the features needed for the computation of the desired
output are already present in the pool of randomly generated features. This is com-
parable to two existing approaches: first, the liquid could be replaced by exponential
delay lines that represent the input history. Second, the use of random connectivity
for hidden units is analogous to the classical perceptron [194], where random fea-
tures were extracted from a retina and only the weights of linear threshold output
units were trained with the perceptron learning algorithm to match desired outputs.
While such an approach is effective if enough random features are used, the advent of the backpropagation algorithm showed that adapting the hidden weights allows the same tasks to be solved more efficiently with much smaller networks that learn task-specific hidden representations.
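The readout idea can be illustrated with a toy reservoir: a fixed, randomly connected recurrent network is driven by an input sequence, and only a linear readout is fit to the recorded transient states. The sizes, constants, and the delayed-copy target below are illustrative assumptions, not the original setups of either approach:

```python
# Minimal sketch: a fixed random "liquid" is driven by an input sequence,
# and only a linear readout is trained to map transient states to targets.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res, n_steps = 1, 100, 500

W_in = rng.normal(scale=0.5, size=(n_res, n_in))          # fixed input weights
W_res = rng.normal(size=(n_res, n_res))
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))   # keep the dynamics stable

u = rng.uniform(-1, 1, size=(n_steps, n_in))               # input sequence
y = np.roll(u[:, 0], 3)                                    # hypothetical target: input delayed by 3 steps

# Run the fixed random dynamics and record the transient internal states.
x = np.zeros(n_res)
states = np.zeros((n_steps, n_res))
for t in range(n_steps):
    x = np.tanh(W_in @ u[t] + W_res @ x)
    states[t] = x

# Only the readout weights are trained, here by regularized linear regression.
ridge = 1e-4
W_out = np.linalg.solve(states.T @ states + ridge * np.eye(n_res), states.T @ y)
print("readout training error:", np.mean((states @ W_out - y) ** 2))
```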
6.3.5 Robust Gradient Descent
It was discussed above why supervised training of RNNs is difficult. Fortunately, in
the Neural Abstraction Pyramid approach to image interpretation, not all the prob-
lems occur at their full scale.
For instance, long-term dependencies are not needed for the interpretation of
static images since this task can usually be completed within a few iterations of the
network. Hence, the BPTT algorithm can be applied to compute the exact gradient
of the cost function, without the need to truncate the history.
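Because the number of iterations is small and fixed, the complete history can be stored and the exact gradient obtained by unrolling it. A minimal NumPy sketch under assumed shapes and names (illustrative, not the pyramid's actual update rule):

```python
# Minimal BPTT sketch: a recurrent layer is iterated T times on a static
# input x, and the exact gradient of a cost at the final step is computed
# by backpropagating through the full, untruncated history.
import numpy as np

rng = np.random.default_rng(1)
n, T = 8, 5
W = rng.normal(scale=0.3, size=(n, n))   # recurrent weights, shared over time
x = rng.normal(size=n)                   # static input, fed at every iteration
target = rng.normal(size=n)

# Forward pass: keep all activations (the history is short, so no truncation).
h = [np.zeros(n)]
for t in range(T):
    h.append(np.tanh(x + W @ h[-1]))

# Backward pass: exact gradient of the squared error at the final iteration.
grad_W = np.zeros_like(W)
delta = 2 * (h[-1] - target)             # dE/dh_T
for t in range(T, 0, -1):
    delta = delta * (1 - h[t] ** 2)      # through the tanh nonlinearity
    grad_W += np.outer(delta, h[t - 1])  # weight gradient, summed over time steps
    delta = W.T @ delta                  # error flowing back to the previous step
print("gradient norm:", np.linalg.norm(grad_W))
```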
Furthermore, the hierarchical network structure facilitates the hierarchical rep-
resentation of time through the extraction of invariant features. While low-level fea-
tures change quickly as the input undergoes a transformation, the outputs of higher-
level feature cells change more slowly.
Balancing Excitation and Inhibition. The decay/explosion of error flow has been identified as the main problem in training RNNs. If the network is designed such that excitatory and inhibitory effects balance and cancel, and consequently the network's activity changes only slowly, then the decay/explosion of the error flow has a long time constant as well. Hence, it is less harmful.
Balanced effects of excitation and inhibition can be achieved by using transfer
functions for inhibitory feature cells that grow faster than the ones of excitatory
features. For instance, inhibition could be linear, while excitation saturates for high
activities. Such an arrangement stabilizes activities where excitation and inhibition
cancel. If the network is too active, inhibition will be stronger than excitation and
will lower its activity. If, on the other hand, the network is too inactive, excitation is far from saturation and leads to an increase in activity.
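This stabilizing effect can be illustrated with a single unit whose excitation saturates while its inhibition grows linearly. The constants below are toy assumptions, not the network's actual transfer functions:

```python
# Illustrative sketch: one unit receives saturating excitation and linear
# inhibition driven by the same activity a. Where the two cancel, the
# activity stabilizes: too much activity -> inhibition dominates,
# too little -> excitation dominates.
import numpy as np

def step(a, w_exc=2.0, w_inh=1.0):
    excitation = w_exc * np.tanh(a)   # saturates for high activities
    inhibition = w_inh * a            # grows linearly, eventually overtakes excitation
    return max(0.0, a + 0.1 * (excitation - inhibition))

for a0 in (0.05, 3.0):                # start too inactive / too active
    a = a0
    for _ in range(200):
        a = step(a)
    print(f"start {a0:.2f} -> settles near {a:.2f}")
```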
Combining BPTT and RPROP. Still, the magnitudes of the backpropagated errors may vary greatly. For this reason, it is very difficult to determine a constant learning rate that is appropriate for all weights and all stages of training.
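RPROP, as commonly formulated by Riedmiller and Braun, sidesteps this problem by using only the sign of each weight's gradient together with an individually adapted step size. A minimal sketch of one common variant of the per-weight update (illustrative constants, not necessarily the exact variant used here):

```python
# RPROP-style update: each weight keeps its own step size, which grows while
# the gradient sign is stable and shrinks when the sign flips; the magnitude
# of the backpropagated error is irrelevant.
import numpy as np

def rprop_step(w, grad, prev_grad, step,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=1.0):
    sign_change = grad * prev_grad
    # Same sign as before: accelerate. Sign flipped: slow down.
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    grad = np.where(sign_change < 0, 0.0, grad)   # skip the update after a sign flip
    w = w - np.sign(grad) * step                  # only the sign of the gradient is used
    return w, grad, step

# Usage with gradients of wildly different magnitude: only their signs matter.
w, prev, step = np.zeros(3), np.zeros(3), np.full(3, 0.1)
for g in ([1e-8, -5.0, 300.0], [2e-8, -4.0, 250.0]):
    w, prev, step = rprop_step(w, np.array(g), prev, step)
print(w, step)
```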