converge, while at the same time avoiding a large computational cost when running
the experiments.
Regarding the nature of the problem, although it is a sequence prediction
task, it can also be regarded as a classification task, at least for strings
reaching the terminal node, since a decision can then be made as to whether
the string was successfully parsed.
In the mentioned work [6], a set of 500 strings from the Reber grammar
was used for training and a different set of 500 strings for testing. For
each LSTM topology and value of the parameter h, the training-test process
was repeated 100 times, each time with random initialization of the network
weights. The strings were coded with a 1-out-of-7 coding, so the number of input
features and the number of output layer neurons were both 7. Two topologies
were tested: in the first case two memory blocks were used, one with one cell
and the other with two cells (Table 6.9); in the second case, both blocks had
two cells (Table 6.10). Tables 6.9 and 6.10 show the percentage of the trained
networks that were able to perfectly learn both the training and test sets,
as well as the average and standard deviation of the number of sequences
used for training. Both tables present results for learning rates (η) of 0.1, 0.2
and 0.3. The MMSE lines refer to the use of the original MMSE learning
algorithm. The results are discussed below.
Table 6.9 Results for the experiments with the Reber grammar using the topology
(7:0:2(2,1):7). ANS stands for Average Number of Sequences necessary to converge.

                    η = 0.1                      η = 0.2                      η = 0.3
            ANS (std) [×10³]  % conv.    ANS (std) [×10³]  % conv.    ANS (std) [×10³]  % conv.
MMSE           15.1 (24.5)      38          74.9 (116.5)     63          61.0 (111.5)     56
MEE h=1.3      81.8 (115.4)     36          42.6 (51.5)      11         113.6 (135.6)      7
MEE h=1.4      45.6 (68.2)      45          70.6 (93.8)      11          61.8 (63.6)      10
MEE h=1.5      26.0 (43.9)      54          84.1 (120.2)     29          47.2 (39.1)      13
MEE h=1.6      28.4 (43.1)      66          58.0 (84.2)      37         135.1 (160.1)     15
MEE h=1.7      23.0 (25.9)      64          54.9 (87.8)      40          96.9 (135.8)     30
MEE h=1.8      75.8 (51.8)      30          60.1 (96.8)      50          66.0 (111.8)     33
MEE h=1.9      78.0 (110.1)     62          53.6 (94.1)      61          48.7 (66.2)      33
MEE h=2.0      49.3 (77.6)      58          57.6 (109.0)     67          57.4 (83.7)      51
The second set of experiments used strings from the embedded Reber
grammar (ERG), generated as shown in Fig. 6.18. This grammar produces
two types of strings: BT<Reber string>TE and BP<Reber string>PE.
In order to recognize these strings, the learning machine has to be able
to distinguish them from strings such as BP<Reber string>TE and
BT<Reber string>PE. To do this it is essential to remember the second
symbol in the sequence, so that it can be compared with the second-to-last
symbol. Notice that the length of the sequence can be arbitrarily large. This
problem is no longer learnable by an Elman net and a RTRL net only learns
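As a rough illustration of how such strings can be produced, the sketch below generates ERG strings from the standard Reber grammar transition table; the state numbering and function names are assumptions made for illustration, not the generator used in [6].

```python
import random

# Transition table of the (inner) Reber grammar finite-state machine:
# GRAMMAR[state] lists the (emitted symbol, next state) choices, picked at random.
GRAMMAR = {
    0: [('B', 1)],
    1: [('T', 2), ('P', 3)],
    2: [('S', 2), ('X', 4)],
    3: [('T', 3), ('V', 5)],
    4: [('X', 3), ('S', 6)],
    5: [('P', 4), ('V', 6)],
    6: [('E', None)],               # terminal node
}

def reber_string(rng=random):
    """Generate one string accepted by the Reber grammar."""
    state, symbols = 0, []
    while state is not None:
        symbol, state = rng.choice(GRAMMAR[state])
        symbols.append(symbol)
    return ''.join(symbols)

def embedded_reber_string(rng=random):
    """Generate BT<Reber string>TE or BP<Reber string>PE; recognizing such
    strings requires remembering the second symbol until the second-to-last one."""
    branch = rng.choice('TP')
    return 'B' + branch + reber_string(rng) + branch + 'E'

print(embedded_reber_string())
```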
 