ter's intensity and its prediction from the surroundings. Phasic responses of cells indicate a difference between the actual input to a cell and its prediction from past values. Color-opponent channels might reflect predictive coding in the chromatic domain, since the wavelength-response profiles of the three cone types overlap. Evidence for predictive coding in higher visual areas, like MT and IT, also exists. While some of these predictions can be computed locally, e.g. using lateral connections, it might well be that a hierarchy of PEs explains the functional role of reciprocal feed-forward/feedback connections in the visual system.
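As a toy illustration of this idea (not from the original text), the following sketch transmits only a temporal prediction error, with the prediction formed as a leaky running average of past inputs; the averaging constant is an arbitrary choice:

```python
import numpy as np

def phasic_response(signal, alpha=0.9):
    """Respond with the prediction error: the actual input minus a prediction
    formed from past values (here a leaky running average, an illustrative choice)."""
    prediction = 0.0
    errors = []
    for x in signal:
        errors.append(x - prediction)              # phasic response: actual minus predicted
        prediction = alpha * prediction + (1.0 - alpha) * x
    return np.array(errors)

# A step input yields a strong transient that decays as the prediction catches up.
print(np.round(phasic_response(np.concatenate([np.zeros(3), np.ones(5)])), 2))
```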
3.2 Recurrent Models
Although it was not the focus of the previous section, the hierarchical Kalman filter already used the concept of recurrent computation to infer hidden causes from observations. While feed-forward networks transform an input x into an output y = f(x), recurrent networks respond both to the input and to their own state. In the discrete-time case this can be described by y_{t+1} = f(y_t, x).
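A minimal sketch of such a discrete-time recurrent update, with tanh as an illustrative choice for f and arbitrary random weights (none of these choices come from the text):

```python
import numpy as np

def recurrent_step(y, x, W, V):
    """One discrete-time update y_{t+1} = f(y_t, x), here f = tanh of a weighted sum."""
    return np.tanh(W @ y + V @ x)

rng = np.random.default_rng(0)
W = 0.5 * rng.standard_normal((3, 3))   # recurrent weights (state -> state)
V = rng.standard_normal((3, 2))         # input weights (input -> state)
y, x = np.zeros(3), np.array([1.0, -1.0])
for t in range(10):                     # the network responds to x and to its own state
    y = recurrent_step(y, x, W, V)
print(y)
```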
Such iterative computation is common in mathematics and scientific computing for problems where closed-form solutions cannot be found or are too expensive to compute. One of the best known iterative algorithms is Newton's method [167] for computing the root of a function. The general idea is to start from an initial guess for the root and to repeatedly apply a simple improvement step until the approximation is good enough.
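A minimal sketch of Newton's method in this spirit (the tolerance and iteration cap are illustrative choices, not part of the original text):

```python
def newton_root(f, df, x0, tol=1e-10, max_iter=100):
    """Repeatedly improve an initial guess x0 for a root of f, using its derivative df."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:   # stop once the correction is small enough
            break
    return x

# Example: the positive root of x^2 - 2, i.e. sqrt(2), starting from the guess 1.0
print(newton_root(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0))
```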
Recurrent computation is much more powerful than feed-forward computation. While feed-forward neural networks with a single hidden layer can approximate any continuous function over a compact domain, they may need exponentially many hidden units to do so. In contrast, recurrent neural networks of finite size can emulate a Turing machine in linear time [211]. One striking example that demonstrates the advantage of recurrence is the parity function with many inputs. Feed-forward networks with a single hidden layer have difficulties learning the parity problem for two inputs and need Θ(2^n) hidden units for n inputs. Recurrent networks that process the inputs in a serial fashion need to store only a single bit, representing the current sum of the input bits modulo two. Similar recurrent circuits are widely used in VLSI designs.
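The serial parity computation described above can be sketched as follows; the single bit of recurrent state holds the running sum modulo two (the function name is illustrative):

```python
def serial_parity(bits):
    """Process the input bits one at a time; the recurrent state is a single bit."""
    state = 0
    for b in bits:
        state ^= b            # state_{t+1} = state_t XOR input_t  (sum modulo two)
    return state

print(serial_parity([1, 0, 1, 1]))   # -> 1, since the number of ones is odd
```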
On the other hand, the increase in computational power comes at a cost. First, each processing element must be computed not just once, but at every time step. This may slow down the simulation of recurrent networks on a serial machine. Second, the non-linear dynamics described by a recurrent network can produce rich behaviors that do not necessarily reflect the intentions of the network designer. Care must be taken to avoid runaway activity, convergence to uninteresting attractors, oscillations, and chaotic behavior where these are not desired.
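A toy scalar example (not from the original text) shows how even the simplest recurrent gain already separates several of these regimes:

```python
def iterate(w, y0=0.5, steps=20):
    """Scalar linear recurrence y_{t+1} = w * y_t; the recurrent gain w decides the behavior."""
    ys = [y0]
    for _ in range(steps):
        ys.append(w * ys[-1])
    return ys

print(iterate(0.5)[-1])    # |w| < 1: activity decays towards the trivial attractor 0
print(iterate(1.5)[-1])    # w > 1: runaway activity, values grow without bound
print(iterate(-1.0)[:4])   # w = -1: sustained oscillation between +y0 and -y0
```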
Despite these difficulties, recurrent neural networks have been used for a wide range of applications. Associative memories store patterns and allow content-addressable information retrieval with noisy and incomplete data [172]. Recurrent networks have also been used for spatio-temporal pattern analysis, e.g. for speech