words one has heard so far. Second, it makes it possible
for the network to actually achieve correct performance
most of the time, which is not possible with the original
method. This enables us to monitor its learning perfor-
mance over training more easily.
Third, with Leabra, there is no way to present ques-
tions about future information during training without
effectively exposing the network to that information.
Specifically, in Leabra errors are equivalent to activa-
tion states, and the entire activation state of the network
must be updated for proper error signals to be propa-
gated for each question. Thus, if all of the questions
were asked after each input, information about the en-
tire sentence would be propagated into the network (and
thus into the gestalt representation) via the plus phases
of the questions. To preserve the idea that the gestalt
is updated primarily from the inputs, we ask questions
only about current or previous inputs. The original SG
model instead used a somewhat complicated error prop-
agation mechanism, where all the errors for the ques-
tions were accumulated separately in the output end of
the network, and then passed back to the encoding por-
tion of the network after the entire sentence had been
processed. Thus, there was a dissociation between acti-
vation states based on the inputs received so far, and the
subsequent error signals based on all the questions, al-
lowing all of the questions to be asked after each input.
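To make the schedule used here concrete, the following minimal Python sketch (not the simulator's own code; the sentence encoding and role labels are invented for illustration) generates the questions asked after each word, covering only the current and previous roles:

def question_schedule(sentence):
    """Yield (word, questions) pairs, where questions covers only the
    roles presented so far, in order of presentation."""
    roles_seen = []
    for word, role in sentence:
        roles_seen.append(role)
        yield word, list(roles_seen)

sentence = [("boy", "agent"), ("ate", "action"), ("soup", "patient")]
for word, questions in question_schedule(sentence):
    print(word, "->", questions)
# boy -> ['agent']
# ate -> ['agent', 'action']
# soup -> ['agent', 'action', 'patient']

Under the original SG scheme, by contrast, the inner list would cover every role in the sentence on every step, and the resulting errors would be accumulated and passed back only after the final word.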
Training the network by asking it explicitly about
roles is only one of many different possible ways that
this information could be learned. For example, visual
processing of an event simultaneous with a verbal de-
scription of it would provide a means of training the
sentence-based encoding to reflect the actual state of
affairs in the environment. However, the explicit role-
filler training used here is simple and provides a clear
picture of how well the gestalt representation has en-
coded the appropriate information.
The network parameters were fairly standard for a
larger sized network, with 25 percent activity in the
encoding and decoding hidden layers, and 15 percent
activity in the gestalt hidden layer. The proportion of
Hebbian learning was .001, and the learning rate was
reduced from .01 to .001 after 200 epochs of train-
ing. The fm hid and fm prv parameters for updating
the context layer were set to the standard values of .7
and .3, which allows for some retention of prior states
but mostly copies the current hidden state.
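As a rough sketch of what these parameters control, the context layer can be thought of as blending the current hidden-layer activations with its own previous state. The following Python snippet assumes that simple blending rule; it is not the simulator's actual implementation:

import numpy as np

FM_HID = 0.7  # weight on the current hidden-layer activations
FM_PRV = 0.3  # weight on the context layer's previous state

def update_context(prev_context, hidden):
    # Mostly copy the current hidden state, but retain some of the
    # prior context (hysteresis), per the assumed blending rule.
    return FM_HID * hidden + FM_PRV * prev_context

context = np.zeros(3)
for hidden in (np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])):
    context = update_context(context, hidden)
    print(context)
# first step: [0.7, 0.0, 0.0]; second step: [0.21, 0.7, 0.0]

With .7 on the hidden state and .3 on the prior context, the context layer tracks the hidden layer closely while keeping a fading trace of earlier states.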
10.7.2 Exploring the Model
Open the project sg.proj.gz in chapter_10 to
begin.
As usual, the network is in skeleton form and must
be built.
Do BuildNet on the sg_ctrl overall control panel
to build it.
Then, you can poke around the network and explore
the connectivity using the r.wt button, and then return
to viewing act.
Note that the input/output units are all labeled accord-
ing to the first two letters of the word, role, or concept
that they represent.
Training
First, let's see exactly how the network is trained by
stepping through some training trials.
Open up a training log by doing View, TRAIN_LOG,
and then open up a process control panel for training
by doing View, TRAIN_PROCESS_CTRL. Do ReInit.
There will be a delay while an entire epoch's worth
of sentences (100 sentences) are randomly generated.
Then press Step (a similar delay will ensue, due to the
need to recreate these sentences at the start of every
epoch — because these happen at different levels of
processing, the redundancy is difficult to avoid).
You should see the first word of the first sentence pre-
sented to the network. Recall that as each word is pre-
sented, questions are asked about all current and previ-
ous information presented to the network. Because this
is the first word, the network has just performed a mi-
nus and plus phase update with this word as input, and
it tried to answer what the agent of the sentence is.
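To connect this to the learning mechanics: in the minus phase the network produces its own answer to the question, in the plus phase the correct answer is clamped on the output, and the weights change in proportion to the difference in coproducts between the two phases. The snippet below sketches only that generic contrastive (CHL/GeneRec-style) error-driven update; it omits the Hebbian component and the kWTA inhibition of the full Leabra algorithm, and the activation values are invented for illustration:

import numpy as np

def chl_update(w, x_minus, y_minus, x_plus, y_plus, lrate=0.01):
    # Plus-phase coproducts (correct answer clamped) minus minus-phase
    # coproducts (network's own answer).
    return w + lrate * (np.outer(y_plus, x_plus) - np.outer(y_minus, x_minus))

w = np.zeros((1, 1))
x_minus, y_minus = np.array([0.9]), np.array([0.2])  # network's guess at the agent
x_plus,  y_plus  = np.array([0.9]), np.array([1.0])  # correct agent unit clamped on
w = chl_update(w, x_minus, y_minus, x_plus, y_plus)
print(w)  # the weight increases, pushing future answers toward the target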
To understand how the words are presented, let's first
look at the training log (Trial_0_TextLog, see fig-
ure 10.28 for an example from the trained network).
The trial and EventGp columns change with each
different word of the sentence, with EventGp showing
the word that is currently being processed. Within the
presentation of each word, there are one or more events