memory cells are linear units that have a fixed self-connection. They enforce a con-
stant, non-exploding, non-vanishing error flow.
Access to memory cells is controlled by multiplicative gate units. Input gate
units learn to protect the constant error flow within a memory cell from perturbation
by irrelevant inputs. Likewise, output gate units learn to protect other units from
perturbation by currently irrelevant memory contents.
Learning is done by a gradient method that is a combination of BPTT and mod-
ified RTRL. The LSTM algorithm has been applied to several non-trivial problems.
For instance, it has been used to learn the structure of music [58]. Another applica-
tion was classification of natural language sentences as grammatical or ungrammat-
ical [131].
Hierarchical Recurrent Networks. Another possible approach for learning long-
term dependencies was proposed by El Hihi and Bengio [96]. They observed that
the problem of vanishing gradients only occurs because long-term dependencies are
separated by many time steps. RNNs already utilize the sequential nature of time by
using the activities of one time step as input for the next time step.
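For reference, a flat recurrent network simply feeds its previous hidden activities back as an additional input at the next step. A minimal sketch, with assumed weight names, is:

```python
import numpy as np

def rnn_step(x, h_prev, W_xh, W_hh, b):
    """One step of a plain (flat) recurrent network: the previous hidden
    activities h_prev are fed back as input to the current step."""
    return np.tanh(W_xh @ x + W_hh @ h_prev + b)
```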
Hierarchical RNNs are based on the additional assumption that long-term de-
pendencies are robust to small local changes in the timing of events, whereas de-
pendencies spanning short intervals are allowed to be more sensitive to the precise
timing of events. This motivates the use of multiresolutional representations of the
state information. Long-term context is represented by hidden state variables which
are allowed to change very slowly, whereas short-term context is represented by
hidden state variables that change faster.
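One simple way to realize such a multiresolutional state is to update a slow higher-level state only every few time steps, while a fast lower-level state is updated at every step. The sketch below is an illustration under assumed names, not the network used by El Hihi and Bengio.

```python
import numpy as np

def hierarchical_step(x, h_fast, h_slow, t, params, slow_period=10):
    """One update of a two-level hierarchical recurrent state (sketch).

    The fast state changes at every step and captures short-term context;
    the slow state is updated only every `slow_period` steps and therefore
    represents long-term context. All weight names are hypothetical.
    """
    # Fast (lower-level) state sees the input and the current slow context.
    h_fast = np.tanh(params["W_fx"] @ x
                     + params["W_ff"] @ h_fast
                     + params["W_fs"] @ h_slow)
    # Slow (higher-level) state is updated less frequently.
    if t % slow_period == 0:
        h_slow = np.tanh(params["W_sf"] @ h_fast + params["W_ss"] @ h_slow)
    return h_fast, h_slow
```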
The authors compared the performance of hierarchical and flat recurrent net-
works for learning tasks involving long-term dependencies. A series of experiments
confirmed the advantages of imposing a hierarchical network structure.
The concept of representing time dependencies at appropriate levels can be applied to the Neural Abstraction Pyramid architecture. This is very similar to the distributed representation of spatial dependencies, where short-range dependencies are represented at the lower layers and long-range dependencies at the higher layers of the network. If the higher layers of the pyramid operate on slower time scales than the lower layers, they can learn to represent longer-term dependencies. Slowing down higher layers can be done either by less frequent updates or by the use of larger time constants for fading memories.
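The second option, larger time constants, amounts to leaky integration of each layer's activity, so that higher layers forget more slowly. The exponential-decay update and the example constants below are assumptions for illustration only.

```python
import numpy as np

def pyramid_layer_update(a_prev, drive, time_constant):
    """Leaky-integrator update for one layer of a pyramid-like hierarchy (sketch).

    Larger time constants make the layer's activity fade more slowly, so
    higher layers can retain context over longer intervals. `drive` stands
    for the layer's net input at the current step.
    """
    decay = np.exp(-1.0 / time_constant)   # fraction of old activity retained
    return decay * a_prev + (1.0 - decay) * np.tanh(drive)

# Example: higher layers use larger time constants (assumed values).
time_constants = [2.0, 8.0, 32.0]           # lower -> higher layers
```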
The usefulness of such a time hierarchy has also been confirmed in the field of
reactive control of mobile robots [25]. While flat reactive systems face difficulties
when required to consider long-term context, a hierarchy of reactive behaviors can
provide longer temporal context for lower-level behaviors without large computa-
tional costs. Such a hierarchy can handle a high degree of complexity. It was suc-
cessfully applied to the problem of controlling a team of soccer-playing robots [20].
6.3.4 Random Recurrent Networks with Fading Memories
To avoid the difficulties involved in training recurrent neural networks, the use of random recurrent neural networks was recently proposed independently by two