When a ReLU is activated above 0, its partial derivative is 1. Thus, vanishing gradients do not exist along paths of active hidden units in an arbitrarily deep network. Additionally, ReLUs saturate at exactly 0, which is potentially helpful when using hidden activations as input features for a classifier.
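As a minimal sketch of these two properties (illustrative code, not taken from the chapter), the following NumPy snippet evaluates the ReLU and its partial derivative on a few inputs:

import numpy as np

def relu(z):
    # Rectified linear unit: outputs z for z > 0 and saturates at exactly 0 otherwise
    return np.maximum(0.0, z)

def relu_grad(z):
    # Partial derivative of the ReLU: 1 wherever the unit is active (z > 0), 0 elsewhere
    return (z > 0).astype(float)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))       # negative inputs map to exactly 0, positive inputs pass through
print(relu_grad(z))  # gradient is 1 for every active unit

Because the derivative of every active unit is exactly 1, the error signal backpropagated along a path of active units is not attenuated, and the exact zeros of inactive units yield sparse hidden activations.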
ReLUs have recently been shown to yield state-of-the-art results on a number of tasks in speech recognition, for example on large vocabulary tasks, achieving lower word error rates than a logistic network with the same topology (Zeiler et al. 2013; Maas et al. 2013; Dahl et al. 2013). To our knowledge, they have not yet been applied in paralinguistics research.
19.3 Recurrent Neural Networks
An RNN is a class of neural networks whose connections between units form a directed cycle. This creates an internal state, so that the network exhibits dynamic temporal behavior and, unlike a feed-forward neural network, can process arbitrary sequences of inputs. More precisely, given
an input sequence $x = x(1), \ldots, x(T)$ with $x(t) \in \mathbb{R}^D$, $D$ being the dimensionality of the vector $x(t)$, a standard RNN computes the sequence of hidden vectors $h = h(1), \ldots, h(T)$ and output vectors $o = o(1), \ldots, o(T)$ by recursively evaluating the following equations for time steps $t = 1, \ldots, T$:

$$h(t) = g_h\left(W_{hx}\, x(t) + W_{hh}\, h(t-1) + b_h\right) \qquad (19.14)$$

$$o(t) = g_o\left(W_{oh}\, h(t) + b_o\right) \qquad (19.15)$$

where $W_{hx}$ denotes the weight matrix from the input to the hidden layer, $W_{hh}$ the weight matrix connecting the hidden units with each other, $W_{oh}$ the hidden-to-output weight matrix, and $b_h$ and $b_o$ the bias vectors of the hidden and the output layer, respectively. Further, $g_h$ and $g_o$ are the activation functions of the hidden layer and output layer, respectively, commonly chosen to be the sigmoid or tanh function.
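As a minimal sketch of the recursion in Eqs. (19.14) and (19.15) (illustrative code, not from the chapter; the dimensions, random weights, and the choice of tanh for both $g_h$ and $g_o$ are arbitrary assumptions), the forward pass can be written as follows:

import numpy as np

def rnn_forward(x, W_hx, W_hh, W_oh, b_h, b_o, g_h=np.tanh, g_o=np.tanh):
    """Evaluate Eqs. (19.14)-(19.15) for an input sequence x of shape (T, D)."""
    T = x.shape[0]
    H, O = W_hh.shape[0], W_oh.shape[0]
    h = np.zeros((T, H))
    o = np.zeros((T, O))
    h_prev = np.zeros(H)  # h(0), the initial hidden state
    for t in range(T):
        # Eq. (19.14): h(t) = g_h(W_hx x(t) + W_hh h(t-1) + b_h)
        h[t] = g_h(W_hx @ x[t] + W_hh @ h_prev + b_h)
        # Eq. (19.15): o(t) = g_o(W_oh h(t) + b_o)
        o[t] = g_o(W_oh @ h[t] + b_o)
        h_prev = h[t]
    return h, o

# Toy dimensions: D = 3 input features, H = 5 hidden units, O = 2 outputs, T = 10 steps
rng = np.random.default_rng(0)
D, H, O, T = 3, 5, 2, 10
x = rng.standard_normal((T, D))
h, o = rnn_forward(x,
                   W_hx=rng.standard_normal((H, D)) * 0.1,
                   W_hh=rng.standard_normal((H, H)) * 0.1,
                   W_oh=rng.standard_normal((O, H)) * 0.1,
                   b_h=np.zeros(H), b_o=np.zeros(O))
print(h.shape, o.shape)  # (10, 5) (10, 2)

Note that $h(0)$ is initialised to the zero vector here; this is a common, but not the only, choice.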
19.3.1 Long Short-Term Memory
RNNs are able to model a certain amount of context by using cyclic connections
and can, in principle, map the entire history of previous inputs to each output.
However, an analysis of the error flow in conventional recurrent neural nets reveals
that they tend to suffer from the vanishing gradient problem (Hochreiter et al. 2001), i.e., the backpropagated error needed for training the network parameters
either blows up or decays over time. This effect essentially limits the access to
long time lags. Various attempts have been made in the past to solve this problem,