Sequence and Temporally Delayed Learning
At this point, feel free to explore the many parameters available, and see how the network responds. After you change any of the parameters, be sure to press the MakeEnv button to make a new environment based on these new parameters.
Finally, we can present some of the limitations of the CSC representation. One obvious problem is capacity: each stimulus requires a different set of units for all possible time intervals that can be represented. Also, the CSC leaves open the question of how time is initialized to zero at the right point so that every trial is properly synchronized. Finally, the CSC requires that the stimulus stay on (or some trace of it, which you can manipulate using the tr parameter) up to the point of reward, which is unrealistic. This last problem points to an important issue with the TD algorithm: although it can learn to bridge temporal gaps, it requires some suitable representation to support this bridging. We will see in chapters 9 and 11 that this and the other problems can be resolved by allowing the TD system to control the updating of context-like representations.
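To make the capacity problem concrete, the following is a minimal sketch (in Python, not the simulator's own code) of a CSC input: each stimulus needs a separate unit for every representable delay since its onset, so the number of units grows as the product of the number of stimuli and the number of time steps. The layer sizes and the tr-like trace parameter here are illustrative assumptions.

    import numpy as np

    n_stimuli, n_timesteps = 3, 20   # assumed sizes for illustration

    def csc_input(stimulus, onset, t, tr=1.0):
        # One unit per (stimulus, delay-since-onset) pair -- the capacity problem.
        # tr plays the role of the trace parameter: 1.0 keeps the stimulus
        # fully "on", smaller values let its trace decay with the delay.
        x = np.zeros(n_stimuli * n_timesteps)
        if onset <= t < onset + n_timesteps:
            unit = stimulus * n_timesteps + (t - onset)
            x[unit] = tr ** (t - onset)
        return x

    # A different unit carries the (decaying) trace at each step since onset.
    for t in range(4):
        print(np.nonzero(csc_input(stimulus=0, onset=0, t=t, tr=0.8))[0])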
Learning to solve tasks having temporally extended sequential contingencies requires the proper development, maintenance, and updating of context representations that specify a location within the sequence. A simple recurrent network (SRN) enables sequential learning tasks to be solved by copying the hidden layer activations from the previous time step into a context layer. The specialized context maintenance abilities of the prefrontal cortex may play the role of the context layer in an SRN. An SRN can learn a finite state automaton task by developing an internal representation of the underlying node states.
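A minimal sketch of a single SRN step may help make this mechanism concrete; the layer sizes, random weights, and sigmoid units below are illustrative assumptions, and the essential point is only that the context layer is a copy of the previous time step's hidden activations.

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hid, n_out = 4, 6, 3                  # assumed layer sizes
    W_in  = rng.normal(0.0, 0.5, (n_hid, n_in))   # input -> hidden
    W_ctx = rng.normal(0.0, 0.5, (n_hid, n_hid))  # context -> hidden
    W_out = rng.normal(0.0, 0.5, (n_out, n_hid))  # hidden -> output

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def srn_step(x, context):
        hidden = sigmoid(W_in @ x + W_ctx @ context)
        output = sigmoid(W_out @ hidden)
        return output, hidden                     # hidden becomes the next context

    context = np.zeros(n_hid)
    for x in np.eye(n_in):                        # a short input sequence
        y, context = srn_step(x, context)         # copy hidden -> context each step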
The mathematical framework of reinforcement learning can be used for learning with temporally delayed contingency information. The temporal differences (TD) reinforcement learning algorithm provides a good fit to the neural firing properties of neurons in the VTA. These neurons secrete the neuromodulator dopamine to the frontal cortex, and dopamine has been shown to modulate learning. The TD algorithm is based on minimizing differences in expectations of future reward values, and can be implemented using the same phases as in the GeneRec algorithm. Various conditioning phenomena can be modeled using the TD algorithm, including acquisition, extinction, and second-order conditioning.
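The core of the TD algorithm is the temporal-differences error, delta(t) = r(t) + gamma * V(t+1) - V(t), which measures the difference between successive expectations of future reward. The following is a minimal tabular sketch over a single conditioning trial; the trial length, learning rate, and reward timing are illustrative assumptions rather than values from the simulation.

    import numpy as np

    n_steps, gamma, lrate = 16, 1.0, 0.1     # assumed trial length and parameters
    V = np.zeros(n_steps + 1)                # value estimate for each time step
    reward = np.zeros(n_steps)
    reward[n_steps - 1] = 1.0                # reward (US) at the end of the trial

    for trial in range(200):                 # repeated trials produce acquisition
        for t in range(n_steps):
            # TD error: mismatch between successive predictions of future reward
            delta = reward[t] + gamma * V[t + 1] - V[t]
            V[t] += lrate * delta            # move V(t) toward r(t) + gamma*V(t+1)

    print(np.round(V[:n_steps], 2))          # values come to anticipate the reward

Setting the reward back to zero and continuing to run trials drives the learned values back down, which corresponds to extinction; second-order conditioning follows the same logic, with a learned value standing in for the reward itself.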
To stop now, quit by selecting Object/Quit in the
PDP++Root window.
6.8
Summary
Combined Model and Task Learning
There are sound functional reasons to believe that both Hebbian model learning and error-driven task learning are taking place in the cortex. As we will see in later chapters, both types of learning are required to account for the full range of cognitive phenomena considered. Computationally, Hebbian learning acts locally, and is autonomous and reliable, but also myopic and greedy. Error-driven learning is driven by remote error signals, and the units cooperate to solve tasks. However, it can suffer from codependency and laziness. The result of combining both types of learning is representations that encode important statistical features of the activity patterns they are exposed to, and also play a role in solving the particular tasks the network must perform. Specific advantages of the combined learning algorithm can be seen in generalization tasks, and tasks that use a deep network with many hidden layers.
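One simple way to realize this combination is a weighted mixture of the two weight changes, sketched below assuming a CPCA-style Hebbian term and a CHL/GeneRec-style error-driven term; the particular mixing proportion, learning rate, and activation values are illustrative assumptions.

    import numpy as np

    k_hebb, lrate = 0.01, 0.1   # assumed mixing proportion and learning rate

    def combined_dwt(x_minus, y_minus, x_plus, y_plus, w):
        # x/y are sending/receiving activations in the minus (expectation)
        # and plus (outcome) phases; w is the weight matrix (receivers x senders).
        hebb = np.outer(y_plus, x_plus) - y_plus[:, None] * w          # CPCA: y * (x - w)
        err  = np.outer(y_plus, x_plus) - np.outer(y_minus, x_minus)   # CHL: x+y+ - x-y-
        return lrate * (k_hebb * hebb + (1.0 - k_hebb) * err)

    w = np.full((3, 4), 0.5)
    x_m = x_p = np.array([1.0, 0.0, 1.0, 0.0])
    y_m, y_p = np.array([0.2, 0.7, 0.1]), np.array([0.0, 1.0, 0.0])
    w += combined_dwt(x_m, y_m, x_p, y_p, w)

With a small k_hebb, the error-driven term dominates task learning while the Hebbian term continually biases the weights toward the statistical structure of the activity patterns.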
6.9
Further Reading
The Sutton and Barto (1998) Reinforcement Learning book is an excellent reference for reinforcement learning.
Mozer (1993) provides a nice overview of a variety of different approaches toward temporal sequence processing.
The journal Neural Computation and the NIPS conference proceedings (Advances in Neural Information Processing Systems) always have a large number of high-quality articles on computational and biological approaches to learning.
For more detailed coverage of the combination of
error-driven and Hebbian learning, see O'Reilly (1998)
and O'Reilly (in press).