Table 19.7 Results for the predicted overlap ratio as a single feature (top) and alongside baseline feature set I (bottom), varying the number of hidden layers, on the classification (class) and regression task (score)

                               Class [%]          Score [%]
                               devel    test      devel    test
MLP (1-32-1)                   80.5     82.3      81.3     82.1
DNN (1-32-32-1)                80.8     82.9      81.9     82.7
DNN (1-32-32-32-1)             80.5     82.5      81.2     81.9
MLP (6374-2048-1)              82.0     83.2      82.0     82.6
DNN (6374-1024-1024-1)         82.3     83.7      82.5     83.2
DNN (6374-1024-1024-1024-1)    82.0     83.1      82.1     82.6

Shown are the best results obtained on the development set (devel) and on the test set (test). The percentages reported denote UAR for the classification task and CC for the regression task.
With the predicted overlap ratio as the single input feature, a DNN with two hidden layers of H_hid = 32 units, trained with dropout, yielded the best results. Alongside feature set I, again a DNN with two hidden layers, but with H_hid = 1,024, was found to be optimal.
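As a concrete illustration of these topologies, the following is a minimal sketch of how such feed-forward ReLU networks with dropout could be defined, assuming PyTorch; the framework choice, the helper name build_dnn, and the dropout rate of 0.5 are assumptions rather than details taken from the original experiments.

```python
import torch.nn as nn

def build_dnn(layer_sizes, dropout=0.5):
    """Feed-forward ReLU network with dropout between hidden layers.
    Layer sizes follow Table 19.7, e.g. (1, 32, 32, 1) for the model
    using the predicted overlap ratio as its single input feature, or
    (6374, 1024, 1024, 1) for the model trained on feature set I.
    The dropout rate is an assumption."""
    layers = []
    for n_in, n_out in zip(layer_sizes[:-2], layer_sizes[1:-1]):
        layers += [nn.Linear(n_in, n_out), nn.ReLU(), nn.Dropout(dropout)]
    # Single linear output unit: a logit for the classification task
    # or a real-valued score for the regression task.
    layers.append(nn.Linear(layer_sizes[-2], layer_sizes[-1]))
    return nn.Sequential(*layers)

# Best single-feature topology (two hidden layers, H_hid = 32):
model_single = build_dnn((1, 32, 32, 1))
# Best topology alongside baseline feature set I (H_hid = 1,024):
model_set_I = build_dnn((6374, 1024, 1024, 1))
```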
Our finding is that using a prediction of overlapping speech as the sole feature of a DNN classifier produces models with better performance than the baseline experiment, confirming the results of Grèzes et al. (2013). Even for the DNN regressor we obtained slightly higher cross-correlation results. When added to feature set I, the predicted overlap ratio further improves the results on both tasks.
19.6.4 Conversational-Prosodic Features
Encouraged by the high impact of the overlap ratio feature, we drew inspiration from Kim et al. (2012) and investigated the performance of DNNs on feature set III. As described in Sect. 19.6.4, this feature set contains prosodic as well as conversational features, including the overlap ratio examined in the previous section as well as speaker-turn-based features. To facilitate the experiments, we computed the speaker-turn features from the manual annotation provided with the data set. However, as it gave better results, we used the predicted overlap ratio instead of the oracle overlap ratio, as described in Sect. 19.6.3.
As before, we trained a number of ReLU networks with dropout for different network topologies, varying the number of hidden units as well as the number of layers. All networks were trained on the training set with SGD and momentum, stopping training as soon as the cost function started to rise on the development set. Again, we used CE as the cost function for the classification task and MSE for the regression task.
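To make this training regime concrete, below is a hedged sketch of SGD with momentum and early stopping on the development-set cost, again assuming PyTorch; the learning rate, momentum value, epoch limit, and loader names are illustrative assumptions, and since the networks end in a single output unit, CE is realised here as binary cross entropy on that logit.

```python
import torch
import torch.nn as nn

def train(model, train_loader, devel_loader, task="class",
          lr=0.01, momentum=0.9, max_epochs=200):
    """SGD with momentum; training stops as soon as the cost function
    starts to rise on the development set. Hyper-parameter values are
    illustrative, not those of the original experiments."""
    # CE (binary cross entropy on the single output logit) for the
    # classification task, MSE for the regression task.
    criterion = nn.BCEWithLogitsLoss() if task == "class" else nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)
    best_devel_cost = float("inf")

    for epoch in range(max_epochs):
        model.train()
        for x, y in train_loader:          # x: features, y: float targets
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()

        # Cost function on the development set (early-stopping criterion).
        model.eval()
        with torch.no_grad():
            devel_cost = sum(criterion(model(x), y).item()
                             for x, y in devel_loader) / len(devel_loader)
        if devel_cost > best_devel_cost:   # cost started to rise: stop
            break
        best_devel_cost = devel_cost
    return model
```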
The results for the best-performing network, varying the number of hidden layers, are shown in Table 19.8.