simple conditions a perceptron is equivalent to an SVM, and moreover that
the early stopping rule used in stochastic gradient descent training of MLPs
is a regularization method that constrains the norm of the weight vector,
and therefore controls its generalization ability.
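The norm-constraining effect of early stopping can be seen on a toy problem. The sketch below (all data and hyperparameters are illustrative, not from the cited experiments) trains a logistic model with plain SGD for few versus many epochs; the weight norm of the early-stopped model stays smaller, which is the implicit regularization referred to above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data (illustrative only).
X = rng.normal(size=(200, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 0.0])
y = (X @ w_true + 0.3 * rng.normal(size=200) > 0).astype(float)
X_tr, y_tr = X[:120], y[:120]

def sgd_logistic(n_epochs, lr=0.1):
    """Plain SGD on the logistic loss; returns the weight vector."""
    w = np.zeros(5)
    for _ in range(n_epochs):
        for i in rng.permutation(len(X_tr)):
            p = 1.0 / (1.0 + np.exp(-X_tr[i] @ w))
            w -= lr * (p - y_tr[i]) * X_tr[i]
    return w

w_short = sgd_logistic(n_epochs=2)    # stopped early
w_long  = sgd_logistic(n_epochs=200)  # trained to (near) convergence

# Early stopping keeps ||w|| smaller: SGD lets the norm grow with
# training time, so halting early acts as a norm constraint.
print(np.linalg.norm(w_short), np.linalg.norm(w_long))
```

The longer the training runs, the larger the weight norm grows, so the stopping epoch effectively selects the radius of the norm ball the solution lives in.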
Experimental evidence on the comparison of MLPs and SVMs is provided
in [7]. The same datasets and methodology of [220] were followed. The MLPs
used the MSE, CE, EXP and SEE risk functionals. The SVMs were imple-
mented with a radial basis function (RBF) kernel [46].
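As a reminder, the RBF kernel used by these SVMs measures similarity as a Gaussian of the squared distance between inputs, k(x, z) = exp(-γ‖x − z‖²). A minimal sketch (the γ value is illustrative):

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian RBF kernel: k(x, z) = exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

x = np.array([1.0, 0.0])
z = np.array([0.0, 1.0])
print(rbf_kernel(x, x))  # identical points give k = 1
print(rbf_kernel(x, z))  # decays with squared distance: exp(-0.5 * 2)
```

The kernel equals 1 for identical points and decays smoothly to 0 with distance, which is what makes the RBF-SVM a local, distance-based classifier in contrast to the global discriminants learned by the MLPs.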
The same battery of statistical tests of [220] was applied to the experimen-
tal results, showing no significant difference among the classifiers in terms of
unbalanced error rates. In terms of balanced error rates, SVM-RBF performed
significantly worse than MLP-CE and MLP-EXP. Regarding generalization,
SVM-RBF and MLP-EXP scored as the classification methods with signifi-
cantly better generalization, in terms of both balanced and unbalanced error
rates. Thus, even in terms of generalization, SVMs faced worthy MLP competi-
tors (in the 35 datasets studied).
6.2 Recurrent Neural Networks
Recurrent neural networks (RNNs), as opposed to feed-forward networks,
allow information to pass from a layer into itself or into a previous
layer. This recurrence introduces feedback that makes feed-forward
learning algorithms, such as back-propagation, unfit for these networks. Back-
propagation is generalized to RNNs in the form of back-propagation through
time (BPTT) [240]. Another important learning method for RNNs is real-time
recurrent learning (RTRL) [241], which is discussed in a following section.
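BPTT works by unrolling the recurrence over the sequence and back-propagating through the unrolled copies, accumulating the gradient of each weight across all time steps. The sketch below (a single-unit tanh RNN with a squared-error loss on the final state; all values are illustrative) computes the BPTT gradient and checks it against a finite-difference estimate:

```python
import numpy as np

def rnn_forward(w, u, xs, h0=0.0):
    """Unroll a one-unit tanh RNN: h_t = tanh(w*h_{t-1} + u*x_t)."""
    hs = [h0]
    for x in xs:
        hs.append(np.tanh(w * hs[-1] + u * x))
    return hs

def bptt_grad_w(w, u, xs, target):
    """Gradient of L = 0.5*(h_T - target)^2 w.r.t. w, via BPTT."""
    hs = rnn_forward(w, u, xs)
    grad_w = 0.0
    delta = hs[-1] - target               # dL/dh_T
    for t in range(len(xs), 0, -1):
        dh = delta * (1.0 - hs[t] ** 2)   # back through tanh at step t
        grad_w += dh * hs[t - 1]          # w's contribution at step t
        delta = dh * w                    # propagate to h_{t-1}
    return grad_w

xs = [0.5, -0.3, 0.8]
w, u, target = 0.7, 1.2, 0.4

g_bptt = bptt_grad_w(w, u, xs, target)

# Central finite-difference check of the same gradient.
eps = 1e-6
loss = lambda wv: 0.5 * (rnn_forward(wv, u, xs)[-1] - target) ** 2
g_num = (loss(w + eps) - loss(w - eps)) / (2 * eps)
print(g_bptt, g_num)
```

Note how the same weight `w` receives a gradient contribution at every time step; the repeated multiplication by `w` and the tanh derivative in the backward loop is also the source of the vanishing/exploding gradients that underlie the stability issues discussed next.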
The recurrent nature raises the issue of stability [96], which manifests itself
during training: networks with identical topology do not all learn equally
well; depending on the initialization of their weights, some may fail to
converge.
The main use of RNNs is for time-dependent tasks, such as learning sym-
bolic sequences or making time-series predictions, although they can also be
used as associative memories.
The following two sections focus on the application of MEE to two types of
RNNs, showing empirically that it can improve stability and, in some cases,
the overall performance when compared to MMSE-based algorithms.