where Q stands for a constant term and a_{ik} is

a_{ik} = (t_i - y_i)(t_k - y_k) = e_i e_k .
6.2.2.3 Experiments
In this section several experiments are presented that compare the performance of LSTM learning with MMSE versus MEE (using R2EE). Three standard datasets are used for this purpose: the Reber grammar problem, the embedded Reber grammar and the A^n B^n grammar (for an introduction to formal languages see [91]).
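As background for this comparison, R2EE denotes Rényi's quadratic entropy of the error. A minimal sketch of the usual Parzen-window estimator is given below, assuming N training errors e_i = t_i - y_i and a Gaussian kernel G with bandwidth sigma; the kernel and bandwidth notation are the standard information-theoretic-learning choices, not taken from this section:

\hat{H}_{R_2}(e) = -\ln\left( \frac{1}{N^2} \sum_{i=1}^{N} \sum_{k=1}^{N} G_{\sigma\sqrt{2}}(e_i - e_k) \right)

MEE training then adjusts the network weights so as to minimize \hat{H}_{R_2}(e), whereas MMSE minimizes the mean of the squared errors e_i^2.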
The finite-state machine in Fig. 6.17 generates strings from the grammar
known as the Reber grammar.

Fig. 6.17 A finite-state machine for the Reber grammar (arcs labelled with the symbols B, T, S, X, P, V, E).

The strings are generated by starting at B and moving from node to node, adding the symbols on the arcs to the string.
The symbol E marks the end of a string. When there are two arcs leaving a
node, one is chosen randomly with equal probability. This process generates
strings of arbitrary length. Several experiments were conducted whose goal
was the prediction of the next valid symbol of a string, after the presentation
of a given symbol. For instance, if the network receives a starting symbol B
it has to predict that the possible next symbols are P or T. If the network is able to correctly predict, at every step, all possible next symbols of all strings generated by the grammar, in both the training and test sets, using fewer than 250 000 sequences for learning, the learning is considered to have converged. This number is equal to the number of sequences in the dataset times the number of training epochs; so, in our case, since the datasets used have 500 sequences, the 250 000 limit is reached after 500 epochs. This number 250 000 was used in [6] to allow a number of sequences large enough for most networks to converge.
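To make the generation and prediction task concrete, the following is a minimal Python sketch (an illustration, not code from the original experiments). It uses the standard textbook formulation of the Reber transition table, which is assumed here to match Fig. 6.17 up to node layout, and it records at each step the set of valid next symbols that the network has to predict; the names TRANSITIONS and generate_reber_string are illustrative.

import random

# Each state maps to the arcs leaving it as (symbol, next_state) pairs;
# next_state None marks the final state, reached by emitting E.
# Assumed standard Reber transition table (node layout of Fig. 6.17 may differ).
TRANSITIONS = {
    0: [('T', 1), ('P', 2)],
    1: [('S', 1), ('X', 3)],
    2: [('T', 2), ('V', 4)],
    3: [('X', 2), ('S', 5)],
    4: [('P', 3), ('V', 5)],
    5: [('E', None)],
}

def generate_reber_string(rng=random):
    """Return one Reber string and, for every position, the set of symbols
    that are valid continuations (the targets the network must predict)."""
    symbols, targets = ['B'], []
    state = 0
    while state is not None:
        arcs = TRANSITIONS[state]
        targets.append({sym for sym, _ in arcs})  # all valid next symbols here
        sym, state = rng.choice(arcs)             # arcs chosen with equal probability
        symbols.append(sym)
    return ''.join(symbols), targets

if __name__ == '__main__':
    string, targets = generate_reber_string()
    print(string)        # e.g. BTSSXXTVVE
    print(targets[0])    # after the initial B the valid next symbols are {'T', 'P'}
    # Convergence criterion from the text: 500 sequences x 500 epochs
    print(500 * 500)     # 250000 sequence presentations at most

Running the script prints a sample string, the prediction target after the initial B (the set {P, T} mentioned above), and the 500 x 500 = 250 000 sequence-presentation limit.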
 