0.5 to 10 with 0.5 steps, and the number of previous values L ∈ {8, 10, 12} was used. Table 6.8 only contains the best results for each network. Both networks were evaluated using 2, 4 and 6 neurons.
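For concreteness, the parameter sweep described above can be enumerated as follows. This is a minimal illustrative sketch, not the authors' code; it assumes the bandwidth h, window size L and neuron count q are varied on a full Cartesian grid for the RTRL-ZED network (standard RTRL varies only the neuron count):

import itertools
import numpy as np

# Grid described in the text: h from 0.5 to 10 in steps of 0.5,
# L in {8, 10, 12}, and q in {2, 4, 6} neurons.
h_values = np.arange(0.5, 10.0 + 0.25, 0.5)  # 0.5, 1.0, ..., 10.0 (20 values)
L_values = [8, 10, 12]
q_values = [2, 4, 6]

configs = list(itertools.product(h_values, L_values, q_values))
print(len(configs))  # 20 * 3 * 3 = 180 RTRL-ZED configurations (assumed full grid)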
We can see that while the average error was around 4.5% for the RTRL algorithm, RTRL-ZED had errors around 1.5%. All results obtained with RTRL-ZED are statistically significantly better (smaller error) than those obtained with RTRL (t-test, p ≈ 0). It appears that only 2 neurons are enough in both cases to learn the problem, but RTRL-ZED seems to benefit from an increase in the number of neurons, since the result for q = 6 had an error of 1.4%, which is smaller than the errors obtained with fewer neurons.
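As an illustration of the kind of significance test reported above, the following minimal Python sketch (not the authors' code) compares two sets of per-repetition error percentages; the arrays are hypothetical placeholders, and a one-sided Welch-type test is assumed:

import numpy as np
from scipy import stats

# Hypothetical per-repetition error percentages for each network.
rtrl_errors = np.array([4.4, 4.6, 4.5, 4.7, 4.3])
rtrl_zed_errors = np.array([1.5, 1.4, 1.6, 1.5, 1.4])

# One-sided test: is the mean RTRL-ZED error smaller than the mean RTRL error?
t_stat, p_value = stats.ttest_ind(rtrl_zed_errors, rtrl_errors,
                                  equal_var=False, alternative='less')
print(f"t = {t_stat:.2f}, p = {p_value:.2e}")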
The second experiment is adapted from [2], and consists of predicting the next symbol of the sequence 01001000100001..., up to twenty zeros, always followed by a one. The number of symbols the network needed to see in order to correctly make the remaining predictions until the end of the sequence was recorded. The sequence is composed of 230 symbols. A hundred repetitions were made, starting with random initialization of the weights, with the learning rate, η, ranging from 5 to 39 for the standard RTRL and from 2 to 9 for RTRL-ZED; the kernel bandwidth, h, varied in {8, 10, 12}, and the size of the sliding window for the temporal estimation of the density, L, in {1, 2, 3}.
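The stated length of 230 symbols is easy to verify: the runs of zeros grow from one to twenty, each terminated by a one, so the total is Σ_{n=1}^{20}(n + 1) = 210 + 20 = 230. A minimal sketch (illustrative, not from the source) that generates the sequence:

# Generate the test sequence 0 1 00 1 000 1 ... with runs of zeros growing
# from one up to twenty, each run always followed by a one.
def make_sequence(max_zeros=20):
    symbols = []
    for n in range(1, max_zeros + 1):
        symbols.extend([0] * n)  # n zeros ...
        symbols.append(1)        # ... followed by a one
    return symbols

seq = make_sequence()
print(len(seq))  # 230, matching the length stated in the text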
The results are shown in Fig. 6.13. Each point in this figure represents the percentage of convergence over 100 experiments versus the corresponding average number of symbols (NS) necessary to learn the problem, for the standard RTRL (star) and the RTRL-ZED (square) networks. The various points were obtained by changing the parameters η, L, and h (in the case of standard RTRL only η is used). Only the cases where at least one of the 100 repetitions converged were plotted.
The figure shows that standard RTRL is not able to obtain more than 40% convergence, whereas RTRL-ZED can reach 100% convergence. It can also be observed that, in general, for a given value of NS, RTRL-ZED is able to obtain higher percentages of convergence than the original RTRL. A slight advantage of the original RTRL over the new proposal is that it is able to learn the problem with fewer symbols, although only by a small margin.
6.2.2 Long Short-Term Memory
Typical RNN implementations suffer from the problem of losing error information pertaining to long time lags. This occurs because the error signals tend to vanish over time [95]. One of the most promising machines for sequence learning, addressing this information loss issue, is the Long Short-Term Memory (LSTM) recurrent neural network [103, 82, 81, 171]. In fact, it has been shown that LSTM outperforms traditional RNNs such as Elman networks [63], Back-Propagation Through Time (BPTT) [242] and Real-Time Recurrent Learning (RTRL).
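The vanishing of error signals can be illustrated numerically. The following minimal sketch (not from the source, and ignoring the activation-function derivative for simplicity) back-propagates a gradient through a contractive recurrent weight matrix and shows its norm decaying exponentially:

import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 6))
W *= 0.5 / np.linalg.norm(W, 2)  # scale so the spectral norm is 0.5 (< 1)

grad = rng.standard_normal(6)    # error signal at the final time step

# Each step of back-propagation through time multiplies by the recurrent
# Jacobian (here simply W.T), so the gradient norm shrinks by ~0.5 per step.
for t in range(1, 31):
    grad = W.T @ grad
    if t % 10 == 0:
        print(f"after {t:2d} steps: ||grad|| = {np.linalg.norm(grad):.3e}")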
 