where Q stands for a constant term and a_{ik} is

a_{ik} = (t_i - y_i)(t_k - y_k) = e_i e_k .
6.2.2.3 Experiments
In this section several experiments are presented that compare the performance of LSTM learning with MMSE versus MEE (using R2EE). Three standard datasets are used for this purpose: the Reber grammar problem, the embedded Reber grammar and the A^n B^n grammar (for an introduction to formal languages see [91]).
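As background for this comparison, R2EE denotes Rényi's quadratic entropy of the error. A minimal sketch of the usual Parzen-window estimator is given below, assuming N training errors e_i = t_i - y_i and a Gaussian kernel G with bandwidth sigma; the kernel and bandwidth notation are the standard information-theoretic-learning choices, not taken from this section:

\hat{H}_{R_2}(e) = -\ln\left( \frac{1}{N^2} \sum_{i=1}^{N} \sum_{k=1}^{N} G_{\sigma\sqrt{2}}(e_i - e_k) \right)

MEE training then adjusts the network weights so as to minimize \hat{H}_{R_2}(e), whereas MMSE minimizes the mean of the squared errors e_i^2.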
The finite-state machine in Fig. 6.17 generates strings from the grammar
known as the Reber grammar.

Fig. 6.17 A finite-state machine for the Reber grammar (arcs labelled with the symbols B, T, S, X, P, V, E).

The strings are generated by starting at B and moving from node to node, adding the symbols on the arcs to the string.
The symbol E marks the end of a string. When there are two arcs leaving a
node, one is chosen randomly with equal probability. This process generates
strings of arbitrary length. Several experiments were conducted whose goal
was the prediction of the next valid symbol of a string, after the presentation
of a given symbol. For instance, if the network receives a starting symbol B
it has to predict that the possible next symbols are P or T. If the network is able to correctly predict, at every step, all possible next symbols of all strings generated by the grammar, in both the training and test sets, using fewer than 250 000 sequences for learning, the learning is considered to have converged. This number is equal to the number of sequences in the dataset times the number of training epochs; so, in our case, since the datasets used have 500 sequences, the 250 000 limit is reached after 500 epochs. This number 250 000 was used in [6] to allow a number of sequences large enough for most networks to converge.
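To make the generation and prediction task concrete, the following is a minimal Python sketch (an illustration, not code from the original experiments). It uses the standard textbook formulation of the Reber transition table, which is assumed here to match Fig. 6.17 up to node layout, and it records at each step the set of valid next symbols that the network has to predict; the names TRANSITIONS and generate_reber_string are illustrative.

import random

# Each state maps to the arcs leaving it as (symbol, next_state) pairs;
# next_state None marks the final state, reached by emitting E.
# Assumed standard Reber transition table (node layout of Fig. 6.17 may differ).
TRANSITIONS = {
    0: [('T', 1), ('P', 2)],
    1: [('S', 1), ('X', 3)],
    2: [('T', 2), ('V', 4)],
    3: [('X', 2), ('S', 5)],
    4: [('P', 3), ('V', 5)],
    5: [('E', None)],
}

def generate_reber_string(rng=random):
    """Return one Reber string and, for every position, the set of symbols
    that are valid continuations (the targets the network must predict)."""
    symbols, targets = ['B'], []
    state = 0
    while state is not None:
        arcs = TRANSITIONS[state]
        targets.append({sym for sym, _ in arcs})  # all valid next symbols here
        sym, state = rng.choice(arcs)             # arcs chosen with equal probability
        symbols.append(sym)
    return ''.join(symbols), targets

if __name__ == '__main__':
    string, targets = generate_reber_string()
    print(string)        # e.g. BTSSXXTVVE
    print(targets[0])    # after the initial B the valid next symbols are {'T', 'P'}
    # Convergence criterion from the text: 500 sequences x 500 epochs
    print(500 * 500)     # 250000 sequence presentations at most

Running the script prints a sample string, the prediction target after the initial B (the set {P, T} mentioned above), and the 500 x 500 = 250 000 sequence-presentation limit.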
 