When a ReLU is activated above 0, its partial derivative is 1. Thus, vanishing gradients do not exist along paths of active hidden units in an arbitrarily deep network. Additionally, ReLUs saturate at exactly 0, which is potentially helpful when using hidden activations as input features for a classifier.
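As a minimal sketch of these two properties (illustrative code, not taken from the chapter), the following NumPy snippet evaluates the ReLU and its partial derivative on a few inputs:

import numpy as np

def relu(z):
    # Rectified linear unit: outputs z for z > 0 and saturates at exactly 0 otherwise
    return np.maximum(0.0, z)

def relu_grad(z):
    # Partial derivative of the ReLU: 1 wherever the unit is active (z > 0), 0 elsewhere
    return (z > 0).astype(float)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))       # negative inputs map to exactly 0, positive inputs pass through
print(relu_grad(z))  # gradient is 1 for every active unit

Because the derivative of every active unit is exactly 1, the error signal backpropagated along a path of active units is not attenuated, and the exact zeros of inactive units yield sparse hidden activations.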
ReLUs have recently been shown to yield state-of-the-art results on a number of tasks in speech recognition, for example on large vocabulary tasks, achieving lower word error rates than a logistic network with the same topology (Zeiler et al. 2013; Maas et al. 2013; Dahl et al. 2013). To our knowledge, they have not yet been applied in paralinguistics research.
19.3 Recurrent Neural Networks
An RNN is a class of neural networks whose connections between units form a directed cycle. This creates an internal state, so that the network exhibits dynamic temporal behavior and, unlike a feed-forward neural network, can process arbitrary sequences of inputs. More precisely, given
an input sequence $x = x(1), \ldots, x(T)$ with $x(t) \in \mathbb{R}^D$, $D$ being the dimensionality of the vector $x(t)$, a standard RNN computes the sequence of hidden vectors $h = h(1), \ldots, h(T)$ and output vectors $o = o(1), \ldots, o(T)$ by recursively evaluating the following equations for time steps $t = 1, \ldots, T$:

$$h(t) = g_h\left(W_{hx}\, x(t) + W_{hh}\, h(t-1) + b_h\right) \qquad (19.14)$$

$$o(t) = g_o\left(W_{oh}\, h(t) + b_o\right) \qquad (19.15)$$

where $W_{hx}$ denotes the weight matrix from the input to the hidden layer, $W_{hh}$ the weight matrix connecting the hidden units with each other, $W_{oh}$ the hidden-to-output weight matrix, and $b_h$ and $b_o$ the bias vectors of the hidden and the output layer, respectively. Further, $g_h$ and $g_o$ are the activation functions of the hidden layer and output layer, respectively, commonly chosen to be the sigmoid or tanh function.
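As a minimal sketch of the recursion in Eqs. (19.14) and (19.15) (illustrative code, not from the chapter; the dimensions, random weights, and the choice of tanh for both $g_h$ and $g_o$ are arbitrary assumptions), the forward pass can be written as follows:

import numpy as np

def rnn_forward(x, W_hx, W_hh, W_oh, b_h, b_o, g_h=np.tanh, g_o=np.tanh):
    """Evaluate Eqs. (19.14)-(19.15) for an input sequence x of shape (T, D)."""
    T = x.shape[0]
    H, O = W_hh.shape[0], W_oh.shape[0]
    h = np.zeros((T, H))
    o = np.zeros((T, O))
    h_prev = np.zeros(H)  # h(0), the initial hidden state
    for t in range(T):
        # Eq. (19.14): h(t) = g_h(W_hx x(t) + W_hh h(t-1) + b_h)
        h[t] = g_h(W_hx @ x[t] + W_hh @ h_prev + b_h)
        # Eq. (19.15): o(t) = g_o(W_oh h(t) + b_o)
        o[t] = g_o(W_oh @ h[t] + b_o)
        h_prev = h[t]
    return h, o

# Toy dimensions: D = 3 input features, H = 5 hidden units, O = 2 outputs, T = 10 steps
rng = np.random.default_rng(0)
D, H, O, T = 3, 5, 2, 10
x = rng.standard_normal((T, D))
h, o = rnn_forward(x,
                   W_hx=rng.standard_normal((H, D)) * 0.1,
                   W_hh=rng.standard_normal((H, H)) * 0.1,
                   W_oh=rng.standard_normal((O, H)) * 0.1,
                   b_h=np.zeros(H), b_o=np.zeros(O))
print(h.shape, o.shape)  # (10, 5) (10, 2)

Note that $h(0)$ is initialised to the zero vector here; this is a common, but not the only, choice.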
19.3.1 Long Short-Term Memory
RNNs are able to model a certain amount of context by using cyclic connections
and can, in principle, map the entire history of previous inputs to each output.
However, an analysis of the error flow in conventional recurrent neural nets reveals
that they tend to suffer from the vanishing gradient problem (Hochreiter et al. 2001), i.e., the backpropagated error needed for training the network parameters
either blows up or decays over time. This effect essentially limits the access to
long time lags. Various attempts have been made in the past to solve this problem,