letter will come next, and so it makes lots of errors. Thus, one could use this network as a grammaticality detector, to determine if a given string fits the grammar. In this sense, the network has incorporated the FSA structure itself into its own representations.
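The idea of checking whether a string fits a finite-state grammar can be sketched directly. The transition table below is a hypothetical stand-in (the actual grammar used in the simulation is not shown in this excerpt); the point is only that grammaticality reduces to walking the FSA's transitions:

```python
# Minimal sketch of FSA-based grammaticality checking.
# The transition table is a hypothetical stand-in, not the grammar
# from the simulation itself.

FSA = {
    0: {"B": 1},              # strings must begin with B
    1: {"T": 2, "P": 3},
    2: {"S": 2, "X": 4},      # self-loop on S
    3: {"T": 3, "V": 4},      # self-loop on T
    4: {"E": 5},              # strings must end with E
}
ACCEPT = 5

def grammatical(s):
    """Return True if s can be generated by the FSA."""
    state = 0
    for letter in s:
        if letter not in FSA.get(state, {}):
            return False           # no legal transition: ungrammatical
        state = FSA[state][letter]
    return state == ACCEPT

print(grammatical("BTSSXE"))  # True: every transition is legal
print(grammatical("BTXSE"))   # False: no S transition from state 4
```

A trained SRN approximates this check implicitly: because it learns to predict which letters can come next, a string containing a transition the network assigns very low probability can be flagged as ungrammatical.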
Go to the PDP++Root window. To continue on to the next simulation, close this project first by selecting .projects/Remove/Project_0. Or, if you wish to stop now, quit by selecting Object/Quit.

6.6.4 Summary

We have seen that the context layer in an SRN can enable a network to learn temporally extended sequential tasks. Later, in chapters 9 and 11, we will also augment the simple and somewhat limited Markovian context of the SRN by introducing two additional mechanisms that provide greater flexibility in deciding when and what to represent in the context.

6.7 Reinforcement Learning for Temporally Delayed Outcomes

The context layer in an SRN provides a means of retaining the immediately preceding context information. However, in many cases we need to learn about temporal contingencies that span many time steps. More specifically, we need to be able to solve the temporal credit assignment problem. Recall from the discussion of error-driven learning that it solves the credit (blame) assignment problem by figuring out which units are most responsible for the current error signal. The temporal credit assignment problem is similar, but it is about figuring out which events in the past are most responsible for a subsequent outcome. We will see that this temporal credit assignment problem can be solved in much the same way as the earlier structural form of credit assignment: by using a time-based form of error-driven learning.

One of the primary means of solving the temporal credit assignment problem is the temporal differences (TD) learning algorithm developed by Sutton (1988), based on similar earlier ideas used to model the phenomenon of reinforcement learning (Sutton & Barto, 1981). Reinforcement learning (RL) is so named because it is based on the idea that relatively global reinforcement signals (i.e., reward and punishment) can drive learning that seeks to enhance reward and avoid punishment. This is the kind of learning that goes on in classical and operant conditioning. Thus, not only does this form of learning solve the temporal credit assignment problem, it is also closely related to relevant psychological and biological phenomena. In fact, it has recently been shown that the detailed properties of the TD algorithm have a close relationship to properties of various subcortical brain areas (Montague et al., 1996; Schultz, Dayan, & Montague, 1997), as we will review later.

We will start with a discussion of the behavior and biology of reinforcement learning, then review the standard formalization of the TD algorithm, and then show how the notion of activation phases used in the GeneRec algorithm can be used to implement the version of TD that we will use in the Leabra algorithm. This makes the relationship between TD and standard error-driven learning very apparent. We will then go on to explore a simulation of TD learning in action.
6.7.1 Behavior and Biology of Reinforcement Learning
As most students of psychology know, classical conditioning is the form of learning where the conditioned animal learns that stimuli (e.g., a light or a tone) are predictive of rewards or punishments (e.g., delivery of food or water, or of a shock). The stimulus is called the conditioned stimulus, or CS, and the reward/punishment is called the unconditioned stimulus, or US. In operant conditioning, a behavior performed by the animal serves as the CS. We explore the basic acquisition and extinction (unlearning) of conditioned associations in this section.
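The acquisition and extinction dynamics just described can be sketched with a simple delta-rule update that moves the CS's predictive weight toward the US value on each trial. This is a Rescorla-Wagner-style simplification, not the exact equations of the simulation; the learning rate and trial counts below are illustrative assumptions:

```python
# Sketch of acquisition and extinction of a conditioned association.
# The CS's weight w learns to predict the US via a delta-rule update:
#   w += lrate * (us - w)
# Learning rate and trial counts are illustrative assumptions.

lrate = 0.2
w = 0.0  # predictive weight from CS to US

# Acquisition: CS is repeatedly paired with the US (us = 1).
for trial in range(20):
    w += lrate * (1.0 - w)
print(f"after acquisition: w = {w:.3f}")  # approaches 1.0

# Extinction: CS is presented without the US (us = 0).
for trial in range(20):
    w += lrate * (0.0 - w)
print(f"after extinction:  w = {w:.3f}")  # decays back toward 0.0
```

Note that the same error term (US minus prediction) drives both phases: the weight grows while the US is surprising and decays once the CS's prediction goes unconfirmed, which is the sense in which this kind of learning is a form of error-driven learning.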
Some of the brain areas that appear to be specialized for reinforcement learning are the midbrain nuclei (well-defined groups of neurons) such as the ventral tegmental area (VTA) and the substantia nigra (SN), and the cortical and subcortical areas that control the firing of these neurons. Neurons in these midbrain areas project the neurotransmitter dopamine (DA) widely to the frontal cortex (VTA) and basal ganglia (SN), and