letter will come next, and so it makes lots of errors. Thus, one could use this network as a grammaticality detector, to determine if a given string fits the grammar. In this sense, the network has incorporated the FSA structure itself into its own representations.
Go to the PDP++Root window. To continue on to the next simulation, close this project first by selecting .projects/Remove/Project_0. Or, if you wish to stop now, quit by selecting Object/Quit.

6.6.4 Summary

We have seen that the context layer in an SRN can enable a network to learn temporally extended sequential tasks. Later, in chapters 9 and 11, we will also augment the simple and somewhat limited Markovian context of the SRN by introducing two additional mechanisms that allow greater flexibility in deciding when and what to represent in the context.
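To make the context-layer mechanism concrete, here is a minimal sketch of one SRN time step (the layer sizes, one-hot letter coding, and random weights are illustrative assumptions, not the simulations used in this book): the context layer is simply a copy of the previous hidden activations, fed back to the hidden layer as additional input.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative sizes: 26 "letters" in and out, 10 hidden units.
    n_in, n_hid, n_out = 26, 10, 26
    W_in  = rng.normal(0, 0.1, (n_hid, n_in))
    W_ctx = rng.normal(0, 0.1, (n_hid, n_hid))
    W_out = rng.normal(0, 0.1, (n_out, n_hid))

    def srn_step(x, context):
        """One time step of a simple recurrent network.

        The hidden layer receives the current input plus the context
        layer, which is a verbatim copy of the previous hidden
        activations; this copy is what carries one step of history.
        """
        hidden = np.tanh(W_in @ x + W_ctx @ context)  # input + prior context
        output = W_out @ hidden                       # e.g., next-letter prediction
        return output, hidden.copy()                  # new context <- hidden

    context = np.zeros(n_hid)
    x = np.zeros(n_in); x[0] = 1.0                    # one-hot "letter"
    output, context = srn_step(x, context)

Because the context holds only the previous step's hidden state, the network's memory is Markovian in exactly the sense noted above; the mechanisms introduced in chapters 9 and 11 relax this limit.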
6.7 Reinforcement Learning for Temporally Delayed Outcomes
The context layer in an SRN provides a means of retaining the immediately preceding context information. However, in many cases we need to learn about temporal contingencies that span many time steps. More specifically, we need to be able to solve the temporal credit assignment problem. Recall from the discussion of error-driven learning that it solves the credit (blame) assignment problem by figuring out which units are most responsible for the current error signal. The temporal credit assignment problem is similar, but it is about figuring out which events in the past are most responsible for a subsequent outcome. We will see that this temporal credit assignment problem can be solved in much the same way as the earlier structural form of credit assignment: by using a time-based form of error-driven learning.

One of the primary means of solving the temporal credit assignment problem is the temporal differences (TD) learning algorithm developed by Sutton (1988), based on similar earlier ideas used to model the phenomenon of reinforcement learning (Sutton & Barto, 1981). Reinforcement learning (RL) is so named because it is based on the idea that relatively global reinforcement signals (i.e., reward and punishment) can drive learning that seeks to enhance reward and avoid punishment. This is the kind of learning that goes on in classical and operant conditioning. Thus, not only does this form of learning solve the temporal credit assignment problem, it is also closely related to relevant psychological and biological phenomena. In fact, it has recently been shown that the detailed properties of the TD algorithm have a close relationship to properties of various subcortical brain areas (Montague et al., 1996; Schultz, Dayan, & Montague, 1997), as we will review later.

We will start with a discussion of the behavior and biology of reinforcement learning, then review the standard formalization of the TD algorithm, and then show how the notion of activation phases used in the GeneRec algorithm can be used to implement the version of TD that we will use in the Leabra algorithm. This makes the relationship between TD and standard error-driven learning very apparent. We will then go on to explore a simulation of TD learning in action.
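Before the formal treatment, a minimal sketch of tabular TD(0) value learning illustrates the basic idea (the chain of states, reward placement, and parameter values are arbitrary assumptions): each value estimate V(s) is nudged toward the bootstrapped target r + gamma * V(s_next), so information about a delayed reward propagates backward to earlier predictive states.

    import numpy as np

    def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
        """Tabular TD(0): nudge V[s] toward the target r + gamma * V[s_next]."""
        delta = r + gamma * V[s_next] - V[s]  # the temporal-difference error
        V[s] += alpha * delta
        return delta

    # Illustrative chain of states 0 -> 1 -> 2 -> 3, with reward only
    # delivered on reaching the final state.
    V = np.zeros(4)
    for _ in range(200):
        for s in range(3):
            reward = 1.0 if s + 1 == 3 else 0.0
            td0_update(V, s, reward, s + 1)

    # V[0] < V[1] < V[2]: credit for the delayed reward has propagated
    # backward in time to the earlier predictive states.

The error term delta in this sketch is the quantity whose detailed properties have been related to the firing of midbrain dopamine neurons in the work cited above.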
6.7.1 Behavior and Biology of Reinforcement Learning
As most students of psychology know, classical conditioning is the form of learning where the conditioned animal learns that stimuli (e.g., a light or a tone) are predictive of rewards or punishments (e.g., delivery of food or water, or of a shock). The stimulus is called the conditioned stimulus, or CS, and the reward/punishment is called the unconditioned stimulus, or US. In operant conditioning, a behavior performed by the animal serves as the CS. We explore the basic acquisition and extinction (unlearning) of conditioned associations in this section.
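As a concrete illustration of acquisition and extinction, here is a minimal sketch using a simple delta rule in the spirit of the classic Rescorla-Wagner model (a deliberate simplification of the TD treatment developed in this section; the learning rate and trial counts are arbitrary assumptions): the associative weight of the CS grows while the CS is paired with the US, and decays back when the US is withheld.

    def trial(w, us, lr=0.2):
        """One conditioning trial: the prediction error (us - w) drives learning."""
        return w + lr * (us - w)

    w = 0.0                      # associative strength of the CS
    for _ in range(30):          # acquisition: CS paired with US
        w = trial(w, us=1.0)
    print(f"after acquisition: w = {w:.3f}")  # approaches 1.0

    for _ in range(30):          # extinction: CS presented without the US
        w = trial(w, us=0.0)
    print(f"after extinction:  w = {w:.3f}")  # decays back toward 0.0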
Some of the brain areas that appear to be specialized for reinforcement learning are the midbrain nuclei (well-defined groups of neurons) such as the ventral tegmental area (VTA) and the substantia nigra (SN), and the cortical and subcortical areas that control the firing of these neurons. Neurons in these midbrain areas project the neurotransmitter dopamine (DA) widely to the frontal cortex (VTA) and basal ganglia (SN), and