ter's intensity and its prediction from the surroundings. Phasic responses of cells indicate a difference between the actual input to a cell and its prediction from past values. Color-opponent channels might reflect predictive coding in the chromatic domain, since the wavelength-response profiles of the three cone types overlap. Evidence for predictive coding in higher visual areas, like MT and IT, also exists. While some of these predictions can be computed locally, e.g. using lateral connections, it might well be that a hierarchy of PEs explains the functional role of reciprocal feed-forward/feedback connections in the visual system.
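As a toy illustration of this idea (not from the original text), the following sketch transmits only a temporal prediction error, with the prediction formed as a leaky running average of past inputs; the averaging constant is an arbitrary choice:

```python
import numpy as np

def phasic_response(signal, alpha=0.9):
    """Respond with the prediction error: the actual input minus a prediction
    formed from past values (here a leaky running average, an illustrative choice)."""
    prediction = 0.0
    errors = []
    for x in signal:
        errors.append(x - prediction)              # phasic response: actual minus predicted
        prediction = alpha * prediction + (1.0 - alpha) * x
    return np.array(errors)

# A step input yields a strong transient that decays as the prediction catches up.
print(np.round(phasic_response(np.concatenate([np.zeros(3), np.ones(5)])), 2))
```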
3.2 Recurrent Models
Although it was not the focus of the previous section, the hierarchical Kalman filter already used the concept of recurrent computation to infer hidden causes from observations. While feed-forward networks transform an input x into an output y = f(x), recurrent networks respond both to the input and to their own state. In the discrete-time case this can be described by y_{t+1} = f(y_t, x).
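A minimal sketch of such a discrete-time recurrent update, with tanh as an illustrative choice for f and arbitrary random weights (none of these choices come from the text):

```python
import numpy as np

def recurrent_step(y, x, W, V):
    """One discrete-time update y_{t+1} = f(y_t, x), here f = tanh of a weighted sum."""
    return np.tanh(W @ y + V @ x)

rng = np.random.default_rng(0)
W = 0.5 * rng.standard_normal((3, 3))   # recurrent weights (state -> state)
V = rng.standard_normal((3, 2))         # input weights (input -> state)
y, x = np.zeros(3), np.array([1.0, -1.0])
for t in range(10):                     # the network responds to x and to its own state
    y = recurrent_step(y, x, W, V)
print(y)
```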
Such iterative computation is common in mathematics and scientific computing for problems where closed-form solutions cannot be found or are too expensive to compute. One of the best known iterative algorithms is Newton's method [167] for computing the root of a function. The general idea is to start from an initial guess for the root and to repeatedly apply a simple improvement step until the approximation is good enough.
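A minimal sketch of Newton's method in this spirit (the tolerance and iteration cap are illustrative choices, not part of the original text):

```python
def newton_root(f, df, x0, tol=1e-10, max_iter=100):
    """Repeatedly improve an initial guess x0 for a root of f, using its derivative df."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:   # stop once the correction is small enough
            break
    return x

# Example: the positive root of x^2 - 2, i.e. sqrt(2), starting from the guess 1.0
print(newton_root(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0))
```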
Recurrent computation is much more powerful than feed-forward computation. While feed-forward neural networks with a single hidden layer can approximate any continuous function over a compact domain, they may need exponentially many hidden units to do so. In contrast, recurrent neural networks of finite size can emulate a Turing machine in linear time [211]. One striking example that demonstrates the advantage of recurrence is the parity function with many inputs. Feed-forward networks with a single hidden layer have difficulties learning the parity problem for two inputs and need Θ(2^n) hidden units for n inputs. Recurrent networks that process the inputs in a serial fashion need to store only a single bit, representing the current sum of the input bits modulo two. Similar recurrent circuits are widely used in VLSI designs.
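The serial parity computation described above can be sketched as follows; the single bit of recurrent state holds the running sum modulo two (the function name is illustrative):

```python
def serial_parity(bits):
    """Process the input bits one at a time; the recurrent state is a single bit."""
    state = 0
    for b in bits:
        state ^= b            # state_{t+1} = state_t XOR input_t  (sum modulo two)
    return state

print(serial_parity([1, 0, 1, 1]))   # -> 1, since the number of ones is odd
```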
On the other hand, the increase in computational power comes at a cost. First, each processing element must be computed not just once, but at every time step. This may slow down the simulation of recurrent networks on a serial machine. Second, the non-linear dynamics described by a recurrent network can produce rich behaviors that do not necessarily reflect the intentions of the network designer. Care must be taken to avoid runaway activity, convergence to uninteresting attractors, oscillations, and chaotic behavior where these are not desired.
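A toy scalar example (not from the original text) shows how even the simplest recurrent gain already separates several of these regimes:

```python
def iterate(w, y0=0.5, steps=20):
    """Scalar linear recurrence y_{t+1} = w * y_t; the recurrent gain w decides the behavior."""
    ys = [y0]
    for _ in range(steps):
        ys.append(w * ys[-1])
    return ys

print(iterate(0.5)[-1])    # |w| < 1: activity decays towards the trivial attractor 0
print(iterate(1.5)[-1])    # w > 1: runaway activity, values grow without bound
print(iterate(-1.0)[:4])   # w = -1: sustained oscillation between +y0 and -y0
```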
Despite these difficulties, recurrent neural networks have been used for a wide range of applications. Associative memories store patterns and allow content-addressable information retrieval with noisy and incomplete data [172]. Recurrent networks have also been used for spatio-temporal pattern analysis, e.g. for speech