simple conditions a perceptron is equivalent to an SVM, and moreover that
the early stopping rule used in stochastic gradient descent training of MLPs
is a regularization method that constrains the norm of the weight vector,
and therefore controls its generalization ability.
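The norm-constraining effect of early stopping can be seen on a toy problem. The sketch below (all data and hyperparameters are illustrative, not from the cited experiments) trains a logistic model with plain SGD for few versus many epochs; the weight norm of the early-stopped model stays smaller, which is the implicit regularization referred to above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data (illustrative only).
X = rng.normal(size=(200, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 0.0])
y = (X @ w_true + 0.3 * rng.normal(size=200) > 0).astype(float)
X_tr, y_tr = X[:120], y[:120]

def sgd_logistic(n_epochs, lr=0.1):
    """Plain SGD on the logistic loss; returns the weight vector."""
    w = np.zeros(5)
    for _ in range(n_epochs):
        for i in rng.permutation(len(X_tr)):
            p = 1.0 / (1.0 + np.exp(-X_tr[i] @ w))
            w -= lr * (p - y_tr[i]) * X_tr[i]
    return w

w_short = sgd_logistic(n_epochs=2)    # stopped early
w_long  = sgd_logistic(n_epochs=200)  # trained to (near) convergence

# Early stopping keeps ||w|| smaller: SGD lets the norm grow with
# training time, so halting early acts as a norm constraint.
print(np.linalg.norm(w_short), np.linalg.norm(w_long))
```

The longer the training runs, the larger the weight norm grows, so the stopping epoch effectively selects the radius of the norm ball the solution lives in.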
Experimental evidence on the comparison of MLPs and SVMs is provided
in [7]. The same datasets and methodology of [220] were followed. The MLPs
used the MSE, CE, EXP and SEE risk functionals. The SVMs were imple-
mented with a radial basis function (RBF) kernel [46].
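As a reminder, the RBF kernel used by these SVMs measures similarity as a Gaussian of the squared distance between inputs, k(x, z) = exp(-γ‖x − z‖²). A minimal sketch (the γ value is illustrative):

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian RBF kernel: k(x, z) = exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

x = np.array([1.0, 0.0])
z = np.array([0.0, 1.0])
print(rbf_kernel(x, x))  # identical points give k = 1
print(rbf_kernel(x, z))  # decays with squared distance: exp(-0.5 * 2)
```

The kernel equals 1 for identical points and decays smoothly to 0 with distance, which is what makes the RBF-SVM a local, distance-based classifier in contrast to the global discriminants learned by the MLPs.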
The same battery of statistical tests of [220] was applied to the experimen-
tal results, showing no significant difference among the classifiers in terms of
unbalanced error rates. In terms of balanced error rates, SVM-RBF performed
significantly worse than MLP-CE and MLP-EXP. Regarding generalization,
SVM-RBF and MLP-EXP scored as the classification methods with signifi-
cantly better generalization, in terms of both balanced and unbalanced error
rates. Thus, even in terms of generalization, SVMs faced worthy MLP competi-
tors (in the 35 datasets studied).
6.2 Recurrent Neural Networks
Recurrent neural networks (RNNs), as opposed to feed-forward networks,
allow information to pass from a layer into itself or into a previous
layer. This recurrence introduces feedback that makes feed-forward
learning algorithms, such as back-propagation, unfit for these networks. Back-
propagation is generalized to RNNs in the form of back-propagation through
time (BPTT) [240]. Another important learning method for RNNs is real-time
recurrent learning (RTRL) [241], which is discussed in a following section.
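BPTT works by unrolling the recurrence over the sequence and back-propagating through the unrolled copies, accumulating the gradient of each weight across all time steps. The sketch below (a single-unit tanh RNN with a squared-error loss on the final state; all values are illustrative) computes the BPTT gradient and checks it against a finite-difference estimate:

```python
import numpy as np

def rnn_forward(w, u, xs, h0=0.0):
    """Unroll a one-unit tanh RNN: h_t = tanh(w*h_{t-1} + u*x_t)."""
    hs = [h0]
    for x in xs:
        hs.append(np.tanh(w * hs[-1] + u * x))
    return hs

def bptt_grad_w(w, u, xs, target):
    """Gradient of L = 0.5*(h_T - target)^2 w.r.t. w, via BPTT."""
    hs = rnn_forward(w, u, xs)
    grad_w = 0.0
    delta = hs[-1] - target               # dL/dh_T
    for t in range(len(xs), 0, -1):
        dh = delta * (1.0 - hs[t] ** 2)   # back through tanh at step t
        grad_w += dh * hs[t - 1]          # w's contribution at step t
        delta = dh * w                    # propagate to h_{t-1}
    return grad_w

xs = [0.5, -0.3, 0.8]
w, u, target = 0.7, 1.2, 0.4

g_bptt = bptt_grad_w(w, u, xs, target)

# Central finite-difference check of the same gradient.
eps = 1e-6
loss = lambda wv: 0.5 * (rnn_forward(wv, u, xs)[-1] - target) ** 2
g_num = (loss(w + eps) - loss(w - eps)) / (2 * eps)
print(g_bptt, g_num)
```

Note how the same weight `w` receives a gradient contribution at every time step; the repeated multiplication by `w` and the tanh derivative in the backward loop is also the source of the vanishing/exploding gradients that underlie the stability issues discussed next.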
The recurrent nature raises the issue of stability [96], which manifests itself
during training: networks with identical topology do not all learn equally
well; depending on the initialization of their weights, some may fail to
converge.
The main use of RNNs is for time-dependent tasks, such as learning sym-
bolic sequences or making time-series predictions, although they can also be
used as associative memories.
The following two sections focus on the application of MEE to two types of
RNNs, showing empirically that it can improve stability and, in some cases,
the overall performance when compared to MMSE-based algorithms.