memory cells are linear units that have a fixed self-connection. They enforce a con-
stant, non-exploding, non-vanishing error flow.
Access to memory cells is controlled by multiplicative gate units. Input gate
units learn to protect the constant error flow within a memory cell from perturbation
by irrelevant inputs. Likewise, output gate units learn to protect other units from
perturbation by currently irrelevant memory contents.
Learning is done by a gradient method that is a combination of BPTT and mod-
ified RTRL. The LSTM algorithm has been applied to several non-trivial problems.
For instance, it has been used to learn the structure of music [58]. Another applica-
tion was classification of natural language sentences as grammatical or ungrammat-
ical [131].
Hierarchical Recurrent Networks. Another possible approach for learning long-
term dependencies was proposed by El Hihi and Bengio [96]. They observed that
the problem of vanishing gradients only occurs because long-term dependencies are
separated by many time steps. RNNs already utilize the sequential nature of time by
using the activities of one time step as input for the next time step.
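For reference, a flat recurrent network simply feeds its previous hidden activities back as an additional input at the next step. A minimal sketch, with assumed weight names, is:

```python
import numpy as np

def rnn_step(x, h_prev, W_xh, W_hh, b):
    """One step of a plain (flat) recurrent network: the previous hidden
    activities h_prev are fed back as input to the current step."""
    return np.tanh(W_xh @ x + W_hh @ h_prev + b)
```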
Hierarchical RNNs are based on the additional assumption that long-term de-
pendencies are robust to small local changes in the timing of events, whereas de-
pendencies spanning short intervals are allowed to be more sensitive to the precise
timing of events. This motivates the use of multiresolutional representations of the
state information. Long-term context is represented by hidden state variables which
are allowed to change very slowly, whereas short-term context is represented by
hidden state variables that change faster.
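One simple way to realize such a multiresolutional state is to update a slow higher-level state only every few time steps, while a fast lower-level state is updated at every step. The sketch below is an illustration under assumed names, not the network used by El Hihi and Bengio.

```python
import numpy as np

def hierarchical_step(x, h_fast, h_slow, t, params, slow_period=10):
    """One update of a two-level hierarchical recurrent state (sketch).

    The fast state changes at every step and captures short-term context;
    the slow state is updated only every `slow_period` steps and therefore
    represents long-term context. All weight names are hypothetical.
    """
    # Fast (lower-level) state sees the input and the current slow context.
    h_fast = np.tanh(params["W_fx"] @ x
                     + params["W_ff"] @ h_fast
                     + params["W_fs"] @ h_slow)
    # Slow (higher-level) state is updated less frequently.
    if t % slow_period == 0:
        h_slow = np.tanh(params["W_sf"] @ h_fast + params["W_ss"] @ h_slow)
    return h_fast, h_slow
```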
The authors compared the performance of hierarchical and flat recurrent net-
works for learning tasks involving long-term dependencies. A series of experiments
confirmed the advantages of imposing a hierarchical network structure.
The concept of representing time dependencies at appropriate levels can be applied to the Neural Abstraction Pyramid architecture. This is very similar to the distributed representation of spatial dependencies, where short-range dependencies are represented at the lower layers and long-range dependencies at the higher layers of the network. If the higher layers of the pyramid operate on slower time scales than the lower layers, they can learn to represent longer-term dependencies. Slowing down higher layers can be done either by less frequent updates or by the use of larger time constants for fading memories.
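The second option, larger time constants, amounts to leaky integration of each layer's activity, so that higher layers forget more slowly. The exponential-decay update and the example constants below are assumptions for illustration only.

```python
import numpy as np

def pyramid_layer_update(a_prev, drive, time_constant):
    """Leaky-integrator update for one layer of a pyramid-like hierarchy (sketch).

    Larger time constants make the layer's activity fade more slowly, so
    higher layers can retain context over longer intervals. `drive` stands
    for the layer's net input at the current step.
    """
    decay = np.exp(-1.0 / time_constant)   # fraction of old activity retained
    return decay * a_prev + (1.0 - decay) * np.tanh(drive)

# Example: higher layers use larger time constants (assumed values).
time_constants = [2.0, 8.0, 32.0]           # lower -> higher layers
```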
The usefulness of such a time hierarchy has also been confirmed in the field of
reactive control of mobile robots [25]. While flat reactive systems face difficulties
when required to consider long-term context, a hierarchy of reactive behaviors can
provide longer temporal context for lower-level behaviors without large computa-
tional costs. Such a hierarchy can handle a high degree of complexity. It was suc-
cessfully applied to the problem of controlling a team of soccer-playing robots [20].
6.3.4 Random Recurrent Networks with Fading Memories
To avoid the difficulties involved in training recurrent neural networks, the use of random recurrent neural networks was recently proposed independently by two