training. The 15 epochs amount to only 175 different
sequences, and the 54 epochs amount to 650 sequences
(each set of 25 sequences lasts for 2 epochs).
In either case, the Leabra network is much faster
than the backpropagation network used by Cleeremans
et al. (1989), which took 60,000 sequences (i.e., 4,800
epochs under our scheme). However, we were able to
train backpropagation networks with larger hidden lay-
ers (30 units instead of 3) to learn in between 136 and
406 epochs. Thus, there is some evidence of an advan-
tage for the additional constraints of model learning and
inhibitory competition in this task, given that the Leabra
networks generally learned much faster (and backprop-
agation required a much larger learning rate).
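The conversion between sequences and epochs used above can be sketched as follows. This is only an illustration of the stated rule (25 sequences per set, 2 epochs per set); the function name and constants are hypothetical and not part of the simulator. Applied to the Leabra counts, the same rule gives 14 and 52 epochs for 175 and 650 sequences, slightly below the 15 and 54 epochs quoted; the difference presumably reflects how a partly used set is counted, which this sketch does not model.

```python
SEQUENCES_PER_SET = 25  # from the text: each set holds 25 sequences
EPOCHS_PER_SET = 2      # from the text: each set lasts for 2 epochs


def sequences_to_epochs(n_sequences: int) -> int:
    """Convert a sequence count to epochs, for whole sets only (hypothetical helper)."""
    if n_sequences % SEQUENCES_PER_SET != 0:
        raise ValueError("expected a whole number of 25-sequence sets")
    return n_sequences // SEQUENCES_PER_SET * EPOCHS_PER_SET


# The Cleeremans et al. (1989) figure quoted in the text.
print(sequences_to_epochs(60_000))  # 4800
```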
Now we can test the trained network to see how it has
solved the problem, and also to see how well it distin-
guishes grammatical from ungrammatical letter strings.
[Figure 6.15 plot residue: cluster plot for MonitorEnv Pattern: 0, with axes X and Y (Y ticks from 0.00 to 22.00). The dendrogram structure is not recoverable. Leaf labels, in extraction order from the top of the Y axis down: P->X_3->2, P->X_3->2, V->E_5->0, V->P_4->3, V->V_4->5, V->P_4->3, T->V_2->4, T->T_2->2, T->V_2->4, T->T_2->2, T->T_2->2, T->S_1->1, S->X_1->3, S->S_1->1, S->S_1->1, S->S_1->1, S->S_1->1, X->X_3->2, X->T_2->2, X->T_2->2, X->V_2->4, B->T_0->1]
Do View, TEST_GRID_LOG to open a log to display
the test results. Then, do Test.
This will test the network with one sequence of let-
ters, with the results shown in the grid log on the right.
Note that the network display is being updated every
cycle, so you can see the stochastic choosing of one
of the two possible outputs. The network should be
producing the correct outputs, as indicated both by the
fsa_err column and by the fact that the Output pat-
tern matches the Target pattern, though it might make
an occasional mistake due to the noise.
To better understand the hidden unit representations,
we need a sequence of reasonable length (i.e., more than
ten or so events). In these longer sequences, the FSA
has revisited various nodes due to selecting the looping
path, and this revisiting will tell us about the represen-
tation of the individual nodes. Thus, if the total num-
ber of events in the sequence was below ten (events are
counted in the tick column of the grid log), we need
to keep Testing to find a suitable sequence.
[Figure 6.15 X axis ticks: 0.00, 5.00, 10.00, 15.00]
Figure 6.15: Cluster plot of the FSA hidden unit represen-
tations for a long sequence. The labels for each node de-
scribe the current and next letter and the current and next node
(which the network is trying to predict). For example, T->V
indicates that T was the letter input when the hidden state was
measured for the cluster plot, and the subsequent letter (which
does not affect the cluster plot) was V. Similarly, the associated
2->4 indicates that the node was 2 when the hidden
state was measured for the cluster plot, and the subsequent
node (which does not affect the cluster plot) was 4. The cur-
rent letter and node are relevant to evaluating the cluster plot,
whereas the next letter and node indicate what the network
was trained to predict. The letters are ambiguous (appearing
in multiple places in the grammar), but the nodes are not.
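The label format described in this caption can be decoded mechanically. The sketch below is a minimal illustration, not part of the simulator: the names Event, parse_label, and observed_transitions are hypothetical, and the label strings are a few of those visible in the plot for figure 6.15. The resulting table covers only the transitions that happen to appear in those labels, so it should not be read as the complete grammar.

```python
import re
from typing import NamedTuple


class Event(NamedTuple):
    letter: str       # current letter (input when the hidden state was measured)
    next_letter: str  # letter the network was trained to predict
    node: int         # current FSA node
    next_node: int    # node reached after the next letter


# Labels look like "T->V_2->4": letters before the underscore, nodes after it.
LABEL_RE = re.compile(r"^([A-Z])->([A-Z])_(\d+)->(\d+)$")


def parse_label(label: str) -> Event:
    """Decode one cluster plot label into its four components."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognized label: {label!r}")
    letter, next_letter, node, next_node = match.groups()
    return Event(letter, next_letter, int(node), int(next_node))


def observed_transitions(labels):
    """Map each current node to {next_letter: next_node}, as seen in the labels."""
    table = {}
    for label in labels:
        event = parse_label(label)
        table.setdefault(event.node, {})[event.next_letter] = event.next_node
    return table


# A subset of the labels visible in the plot; the table built from them is partial.
labels = ["B->T_0->1", "T->S_1->1", "S->X_1->3", "X->X_3->2",
          "X->T_2->2", "T->V_2->4", "V->P_4->3", "V->V_4->5", "V->E_5->0"]

print(parse_label("T->V_2->4"))
print(observed_transitions(labels))
```

Keeping the current letter and node separate from the predicted ones mirrors the caption's point that only the former are relevant to evaluating the cluster plot.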
To find such a sequence, turn the network Display toggle off (to
speed things up), and press Test again until you find
a sequence with ten or more events. After running that
sequence, press the Cluster
button on the fsa_ctrl control panel.
This will bring up a cluster plot of the hidden unit
states for each event (e.g., figure 6.15). The caption of figure 6.15
explains how to decode the cluster plot labels.
Question 6.4 Interpret the cluster plot you obtained
(especially the clusters with events at zero distance) in
terms of the correspondence between hidden states and
the current node versus the current letter. Remember
that current node and current letter information is reflected
in the letter and number before the arrow.
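One way to organize an answer to Question 6.4 is to tabulate, for each cluster read off your own plot, which current letters and current nodes it contains. The sketch below assumes the label format of figure 6.15; the helper names are hypothetical, and the two example groupings are invented purely to exercise both outcomes. They are not read from the figure and do not indicate what your plot will show.

```python
def current_letter_and_node(label: str):
    """Split a label such as 'T->V_2->4' into current letter 'T' and current node 2."""
    letters, nodes = label.split("_")
    return letters.split("->")[0], int(nodes.split("->")[0])


def summarize_cluster(labels):
    """Report which current letters and current nodes occur in one cluster."""
    pairs = [current_letter_and_node(label) for label in labels]
    letters = {letter for letter, _ in pairs}
    nodes = {node for _, node in pairs}
    return {
        "letters": letters,
        "nodes": nodes,
        "same_letter": len(letters) == 1,
        "same_node": len(nodes) == 1,
    }


# Hypothetical groupings (NOT taken from the figure): replace them with the
# clusters at zero distance that appear in your own cluster plot.
example_clusters = [
    ["T->V_2->4", "X->V_2->4"],  # shares a current node, not a current letter
    ["X->X_3->2", "X->T_2->2"],  # shares a current letter, not a current node
]

for cluster in example_clusters:
    print(cluster, summarize_cluster(cluster))
```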
Now, switch the test_env from TRAIN_ENV to
RANDOM_ENV (Apply). Then Test again.
This produces a random sequence of letters. Obvi-
ously, the network is not capable of predicting which