information about current and past inputs that may be needed for diverse tasks. Stable internal states are not required to produce a stable output, since transient internal states can be transformed by readout neurons into stable target outputs, owing to the high dimensionality of the dynamical system.
Again, it is assumed that the features needed for the computation of the desired
output are already present in the pool of randomly generated features. This is com-
parable to two existing approaches: first, the liquid could be replaced by exponential
delay lines that represent the input history. Second, the use of random connectivity
for hidden units is analogous to the classical perceptron [194], where random fea-
tures were extracted from a retina and only the weights of linear threshold output
units were trained with the perceptron learning algorithm to match desired outputs.
While such an approach is effective if enough random features are used, the advent of the backpropagation algorithm showed that adapting the hidden weights allows the same tasks to be solved more efficiently with much smaller networks that learn task-specific hidden representations.
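The readout idea can be illustrated with a toy reservoir: a fixed, randomly connected recurrent network is driven by an input sequence, and only a linear readout is fit to the recorded transient states. The sizes, constants, and the delayed-copy target below are illustrative assumptions, not the original setups of either approach:

```python
# Minimal sketch: a fixed random "liquid" is driven by an input sequence,
# and only a linear readout is trained to map transient states to targets.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res, n_steps = 1, 100, 500

W_in = rng.normal(scale=0.5, size=(n_res, n_in))          # fixed input weights
W_res = rng.normal(size=(n_res, n_res))
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))   # keep the dynamics stable

u = rng.uniform(-1, 1, size=(n_steps, n_in))               # input sequence
y = np.roll(u[:, 0], 3)                                    # hypothetical target: input delayed by 3 steps

# Run the fixed random dynamics and record the transient internal states.
x = np.zeros(n_res)
states = np.zeros((n_steps, n_res))
for t in range(n_steps):
    x = np.tanh(W_in @ u[t] + W_res @ x)
    states[t] = x

# Only the readout weights are trained, here by regularized linear regression.
ridge = 1e-4
W_out = np.linalg.solve(states.T @ states + ridge * np.eye(n_res), states.T @ y)
print("readout training error:", np.mean((states @ W_out - y) ** 2))
```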
6.3.5 Robust Gradient Descent
It was discussed above why supervised training of RNNs is difficult. Fortunately, in
the Neural Abstraction Pyramid approach to image interpretation, not all the prob-
lems occur at their full scale.
For instance, long-term dependencies are not needed for the interpretation of
static images since this task can usually be completed within a few iterations of the
network. Hence, the BPTT algorithm can be applied to compute the exact gradient
of the cost function, without the need to truncate the history.
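Because the number of iterations is small and fixed, the complete history can be stored and the exact gradient obtained by unrolling it. A minimal NumPy sketch under assumed shapes and names (illustrative, not the pyramid's actual update rule):

```python
# Minimal BPTT sketch: a recurrent layer is iterated T times on a static
# input x, and the exact gradient of a cost at the final step is computed
# by backpropagating through the full, untruncated history.
import numpy as np

rng = np.random.default_rng(1)
n, T = 8, 5
W = rng.normal(scale=0.3, size=(n, n))   # recurrent weights, shared over time
x = rng.normal(size=n)                   # static input, fed at every iteration
target = rng.normal(size=n)

# Forward pass: keep all activations (the history is short, so no truncation).
h = [np.zeros(n)]
for t in range(T):
    h.append(np.tanh(x + W @ h[-1]))

# Backward pass: exact gradient of the squared error at the final iteration.
grad_W = np.zeros_like(W)
delta = 2 * (h[-1] - target)             # dE/dh_T
for t in range(T, 0, -1):
    delta = delta * (1 - h[t] ** 2)      # through the tanh nonlinearity
    grad_W += np.outer(delta, h[t - 1])  # weight gradient, summed over time steps
    delta = W.T @ delta                  # error flowing back to the previous step
print("gradient norm:", np.linalg.norm(grad_W))
```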
Furthermore, the hierarchical network structure facilitates the hierarchical rep-
resentation of time through the extraction of invariant features. While low-level fea-
tures change quickly as the input undergoes a transformation, the outputs of higher-
level feature cells change more slowly.
Balancing Excitation and Inhibition. The decay/explosion of error flow has been identified as the main problem in training RNNs. If the network is designed such that excitatory and inhibitory effects balance and cancel, and consequently the network's activity changes only slowly, then the decay/explosion of the error flow has a long time constant as well. Hence, it is less harmful.
Balanced effects of excitation and inhibition can be achieved by using transfer
functions for inhibitory feature cells that grow faster than the ones of excitatory
features. For instance, inhibition could be linear, while excitation saturates for high
activities. Such an arrangement stabilizes activities where excitation and inhibition
cancel. If the network is too active, inhibition will be stronger than excitation and
will lower its activity. If, on the other hand, the network is too inactive, excitation is far from saturation and leads to an increase in activity.
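This stabilizing effect can be illustrated with a single unit whose excitation saturates while its inhibition grows linearly. The constants below are toy assumptions, not the network's actual transfer functions:

```python
# Illustrative sketch: one unit receives saturating excitation and linear
# inhibition driven by the same activity a. Where the two cancel, the
# activity stabilizes: too much activity -> inhibition dominates,
# too little -> excitation dominates.
import numpy as np

def step(a, w_exc=2.0, w_inh=1.0):
    excitation = w_exc * np.tanh(a)   # saturates for high activities
    inhibition = w_inh * a            # grows linearly, eventually overtakes excitation
    return max(0.0, a + 0.1 * (excitation - inhibition))

for a0 in (0.05, 3.0):                # start too inactive / too active
    a = a0
    for _ in range(200):
        a = step(a)
    print(f"start {a0:.2f} -> settles near {a:.2f}")
```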
Combining BPTT and RPROP. Still, the magnitudes of the backpropagated errors may vary greatly. For this reason, it is very difficult to determine a constant learning rate that is appropriate for all weights and all stages of training.
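RPROP, as commonly formulated by Riedmiller and Braun, sidesteps this problem by using only the sign of each weight's gradient together with an individually adapted step size. A minimal sketch of one common variant of the per-weight update (illustrative constants, not necessarily the exact variant used here):

```python
# RPROP-style update: each weight keeps its own step size, which grows while
# the gradient sign is stable and shrinks when the sign flips; the magnitude
# of the backpropagated error is irrelevant.
import numpy as np

def rprop_step(w, grad, prev_grad, step,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=1.0):
    sign_change = grad * prev_grad
    # Same sign as before: accelerate. Sign flipped: slow down.
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    grad = np.where(sign_change < 0, 0.0, grad)   # skip the update after a sign flip
    w = w - np.sign(grad) * step                  # only the sign of the gradient is used
    return w, grad, step

# Usage with gradients of wildly different magnitude: only their signs matter.
w, prev, step = np.zeros(3), np.zeros(3), np.full(3, 0.1)
for g in ([1e-8, -5.0, 300.0], [2e-8, -4.0, 250.0]):
    w, prev, step = rprop_step(w, np.array(g), prev, step)
print(w, step)
```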