letter will come next, and so it makes lots of errors. Thus, one could use this network as a grammaticality detector, to determine if a given string fits the grammar. In this sense, the network has incorporated the FSA structure itself into its own representations.
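The idea of checking whether a string fits a finite-state grammar can be sketched directly. The transition table below is a hypothetical stand-in (the actual grammar used in the simulation is not shown in this excerpt); the point is only that grammaticality reduces to walking the FSA's transitions:

```python
# Minimal sketch of FSA-based grammaticality checking.
# The transition table is a hypothetical stand-in, not the grammar
# from the simulation itself.

FSA = {
    0: {"B": 1},              # strings must begin with B
    1: {"T": 2, "P": 3},
    2: {"S": 2, "X": 4},      # self-loop on S
    3: {"T": 3, "V": 4},      # self-loop on T
    4: {"E": 5},              # strings must end with E
}
ACCEPT = 5

def grammatical(s):
    """Return True if s can be generated by the FSA."""
    state = 0
    for letter in s:
        if letter not in FSA.get(state, {}):
            return False           # no legal transition: ungrammatical
        state = FSA[state][letter]
    return state == ACCEPT

print(grammatical("BTSSXE"))  # True: every transition is legal
print(grammatical("BTXSE"))   # False: no S transition from state 4
```

A trained SRN approximates this check implicitly: because it learns to predict which letters can come next, a string containing a transition the network assigns very low probability can be flagged as ungrammatical.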
Go to the PDP++Root window. To continue on to the next simulation, close this project first by selecting .projects/Remove/Project_0. Or, if you wish to stop now, quit by selecting Object/Quit.

6.6.4 Summary

We have seen that the context layer in an SRN can enable a network to learn temporally extended sequential tasks. Later, in chapters 9 and 11, we will also augment the simple and somewhat limited Markovian context of the SRN by introducing two additional mechanisms that provide greater flexibility in deciding when and what to represent in the context.

6.7 Reinforcement Learning for Temporally Delayed Outcomes

The context layer in an SRN provides a means of retaining the immediately preceding context information. However, in many cases we need to learn about temporal contingencies that span many time steps. More specifically, we need to be able to solve the temporal credit assignment problem. Recall from the discussion of error-driven learning that it solves the credit (blame) assignment problem by figuring out which units are most responsible for the current error signal. The temporal credit assignment problem is similar, but it is about figuring out which events in the past are most responsible for a subsequent outcome. We will see that this temporal credit assignment problem can be solved in much the same way as the earlier structural form of credit assignment: by using a time-based form of error-driven learning.

One of the primary means of solving the temporal credit assignment problem is the temporal differences (TD) learning algorithm developed by Sutton (1988), based on similar earlier ideas used to model the phenomenon of reinforcement learning (Sutton & Barto, 1981). Reinforcement learning (RL) is so named because it is based on the idea that relatively global reinforcement signals (i.e., reward and punishment) can drive learning that seeks to enhance reward and avoid punishment. This is the kind of learning that goes on in classical and operant conditioning. Thus, not only does this form of learning solve the temporal credit assignment problem, it is also closely related to relevant psychological and biological phenomena. In fact, it has recently been shown that the detailed properties of the TD algorithm have a close relationship to properties of various subcortical brain areas (Montague et al., 1996; Schultz, Dayan, & Montague, 1997), as we will review later.

We will start with a discussion of the behavior and biology of reinforcement learning, then review the standard formalization of the TD algorithm, and then show how the notion of activation phases used in the GeneRec algorithm can be used to implement the version of TD that we will use in the Leabra algorithm. This makes the relationship between TD and standard error-driven learning very apparent. We will then go on to explore a simulation of TD learning in action.
6.7.1 Behavior and Biology of Reinforcement Learning
As most students of psychology know, classical conditioning is the form of learning where the conditioned animal learns that stimuli (e.g., a light or a tone) are predictive of rewards or punishments (e.g., delivery of food or water, or of a shock). The stimulus is called the conditioned stimulus, or CS, and the reward/punishment is called the unconditioned stimulus, or US. In operant conditioning, a behavior performed by the animal serves as the CS. We explore the basic acquisition and extinction (unlearning) of conditioned associations in this section.
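The acquisition and extinction dynamics just described can be sketched with a simple delta-rule update that moves the CS's predictive weight toward the US value on each trial. This is a Rescorla-Wagner-style simplification, not the exact equations of the simulation; the learning rate and trial counts below are illustrative assumptions:

```python
# Sketch of acquisition and extinction of a conditioned association.
# The CS's weight w learns to predict the US via a delta-rule update:
#   w += lrate * (us - w)
# Learning rate and trial counts are illustrative assumptions.

lrate = 0.2
w = 0.0  # predictive weight from CS to US

# Acquisition: CS is repeatedly paired with the US (us = 1).
for trial in range(20):
    w += lrate * (1.0 - w)
print(f"after acquisition: w = {w:.3f}")  # approaches 1.0

# Extinction: CS is presented without the US (us = 0).
for trial in range(20):
    w += lrate * (0.0 - w)
print(f"after extinction:  w = {w:.3f}")  # decays back toward 0.0
```

Note that the same error term (US minus prediction) drives both phases: the weight grows while the US is surprising and decays once the CS's prediction goes unconfirmed, which is the sense in which this kind of learning is a form of error-driven learning.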
Some of the brain areas that appear to be specialized for reinforcement learning are the midbrain nuclei (well-defined groups of neurons) such as the ventral tegmental area (VTA) and the substantia nigra (SN), and the cortical and subcortical areas that control the firing of these neurons. Neurons in these midbrain areas project the neurotransmitter dopamine (DA) widely to the frontal cortex (VTA) and basal ganglia (SN), and