converge, while at the same time avoiding a large computational cost when running
the experiments.
Regarding the nature of the problem, although it is a sequence prediction
task, it can also be regarded as a classification task, at least for strings
reaching the terminal node, since a decision can then be made as to whether
the string was successfully parsed.
In the mentioned work [6], a set of 500 strings from the Reber grammar
was used for training and a different set of 500 strings for testing. For
each LSTM topology and value of the parameter h, the training-test process
was repeated 100 times, each time with random initialization of the network
weights. The strings were coded with a 1-out-of-7 coding, so the number of input
features and the number of output layer neurons were both 7. Two topologies
were tested: in the first case two memory blocks were used, one with one cell
and the other with two cells (Table 6.9); in the second case, both blocks had
two cells (Table 6.10). Tables 6.9 and 6.10 show the percentage of the trained
networks that were able to perfectly learn both the training and test sets,
as well as the average and standard deviation of the number of sequences
used for training. Both tables present results for learning rates (η) of 0.1, 0.2
and 0.3. The MMSE lines refer to the use of the original MMSE learning
algorithm. The results are discussed below.
Table 6.9 Results for the experiments with the Reber grammar using the topology
(7:0:2(2,1):7). ANS stands for Average Number of Sequences necessary to converge.

                    η = 0.1                      η = 0.2                      η = 0.3
            ANS (std) [×10³]  % conv.    ANS (std) [×10³]  % conv.    ANS (std) [×10³]  % conv.
MMSE           15.1 (24.5)      38          74.9 (116.5)     63          61.0 (111.5)     56
MEE h=1.3      81.8 (115.4)     36          42.6 (51.5)      11         113.6 (135.6)      7
MEE h=1.4      45.6 (68.2)      45          70.6 (93.8)      11          61.8 (63.6)      10
MEE h=1.5      26.0 (43.9)      54          84.1 (120.2)     29          47.2 (39.1)      13
MEE h=1.6      28.4 (43.1)      66          58.0 (84.2)      37         135.1 (160.1)     15
MEE h=1.7      23.0 (25.9)      64          54.9 (87.8)      40          96.9 (135.8)     30
MEE h=1.8      75.8 (51.8)      30          60.1 (96.8)      50          66.0 (111.8)     33
MEE h=1.9      78.0 (110.1)     62          53.6 (94.1)      61          48.7 (66.2)      33
MEE h=2.0      49.3 (77.6)      58          57.6 (109.0)     67          57.4 (83.7)      51
The second set of experiments used strings from the embedded Reber
grammar (ERG), generated as shown in Fig. 6.18. This grammar produces
two types of strings: BT<Reber string>TE and BP<Reber string>PE.
In order to recognize these strings, the learning machine has to be able
to distinguish them from strings such as BP<Reber string>TE and
BT<Reber string>PE. To do this it is essential to remember the second
symbol in the sequence, so that it can be compared with the second-to-last
symbol. Notice that the length of the sequence can be arbitrarily large. This
problem is no longer learnable by an Elman net and a RTRL net only learns
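As a rough illustration of how such strings can be produced, the sketch below generates ERG strings from the standard Reber grammar transition table; the state numbering and function names are assumptions made for illustration, not the generator used in [6].

```python
import random

# Transition table of the (inner) Reber grammar finite-state machine:
# GRAMMAR[state] lists the (emitted symbol, next state) choices, picked at random.
GRAMMAR = {
    0: [('B', 1)],
    1: [('T', 2), ('P', 3)],
    2: [('S', 2), ('X', 4)],
    3: [('T', 3), ('V', 5)],
    4: [('X', 3), ('S', 6)],
    5: [('P', 4), ('V', 6)],
    6: [('E', None)],               # terminal node
}

def reber_string(rng=random):
    """Generate one string accepted by the Reber grammar."""
    state, symbols = 0, []
    while state is not None:
        symbol, state = rng.choice(GRAMMAR[state])
        symbols.append(symbol)
    return ''.join(symbols)

def embedded_reber_string(rng=random):
    """Generate BT<Reber string>TE or BP<Reber string>PE; recognizing such
    strings requires remembering the second symbol until the second-to-last one."""
    branch = rng.choice('TP')
    return 'B' + branch + reber_string(rng) + branch + 'E'

print(embedded_reber_string())
```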
 