Be at Odds? Deep and Hierarchical Neural Networks for Classification and Regression of Conflict in Speech - Conflict and Multimodal Communication: Social Research and Machine Intelligence - page 413


Table 19.8 Results for feature set III, varying the number of hidden layers, on the classification task (Class) and the regression task (Score)

[%]                                         Class            Score
                                            devel   test     devel   test
MLP (38-512-1)                              82.5    83.8     82.2    83.2
DNN (38-512-512-1)                          83.1    84.3     83.0    83.8
DNN (38-512-512-512-1)                      82.6    84.0     82.7    83.4
Challenge baseline (Schuller et al. 2012)   79.1    80.8     81.6    82.6
Räsänen and Pohjalainen (2013)              -       83.9     -       -
Grèzes et al. (2013)                        -       83.1     -       -

Shown are the best results obtained on the development set (devel) and on the test set (test). The percentages reported denote UAR for the classification task and CC for the regression task. For comparison, the baseline results and the highest published competition results of the Conflict Sub-Challenge are shown as well.

The results reveal that, using the conversational-prosodic feature set III, we obtain the best results for a two-layer DNN with each hidden layer containing 512 ReLU units. This layer size is smaller than the 1,024 hidden units per layer from the results above and is due to the smaller number of features in feature set III.
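As an illustration of the architecture notation, the following is a minimal sketch of a forward pass through a 38-512-512-1 network, reading the notation as 38 input features, two hidden layers of 512 ReLU units and a single output unit. The weight initialisation, the linear output unit and all function names are assumptions made for this sketch; pre-training, dropout and the actual training procedure are not shown.

```python
import random

# Layer sizes read off the notation "38-512-512-1" (interpretation assumed):
# 38 input features, two hidden layers with 512 units each, one output unit.
LAYER_SIZES = [38, 512, 512, 1]


def init_layers(sizes, seed=0):
    """Create one (weights, biases) pair per layer with small random weights.

    The Gaussian initialisation is purely illustrative.
    """
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def relu(values):
    """Rectified linear unit: max(0, x) applied element-wise."""
    return [max(0.0, v) for v in values]


def forward(layers, features):
    """Propagate one feature vector through the network.

    Hidden layers use ReLU; the output layer is left linear (an assumption).
    """
    activation = list(features)
    for index, (weights, biases) in enumerate(layers):
        pre_activation = [
            sum(w * a for w, a in zip(row, activation)) + b
            for row, b in zip(weights, biases)
        ]
        is_output_layer = index == len(layers) - 1
        activation = pre_activation if is_output_layer else relu(pre_activation)
    return activation


layers = init_layers(LAYER_SIZES)
# Number of trainable parameters (weights plus biases) in this configuration.
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
score = forward(layers, [0.5] * LAYER_SIZES[0])
```

Under this reading, most of the parameters sit in the 512-by-512 connection between the two hidden layers, so the hidden layer width matters far more for model size than the 38-dimensional input does.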

On the classification task we achieve a UAR of 84.3 %, which outperforms the baseline result by 3.5 % absolute and the best result in the Conflict Sub-Challenge, reported by Räsänen and Pohjalainen (2013), by 0.4 % absolute. On the regression task the relative improvements are smaller, but they still raise the Challenge benchmark for the correlation coefficient from 82.6 % to 83.8 %, measured on the test set. To our knowledge, these numbers represent the best results reported in the literature to date.
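For readers unfamiliar with the two measures, here is a small sketch of how they can be computed. It assumes that UAR is the unweighted mean of the per-class recalls and that CC denotes Pearson's correlation coefficient; the function names and the toy labels are hypothetical and are not taken from the corpus or the experiments.

```python
import math


def unweighted_average_recall(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally
    regardless of how many instances it has."""
    recalls = []
    for label in sorted(set(y_true)):
        indices = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(1 for i in indices if y_pred[i] == label)
        recalls.append(hits / len(indices))
    return sum(recalls) / len(recalls)


def pearson_cc(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


# Toy data (hypothetical): three "high" conflict clips and one "low" clip.
toy_true = ["high", "high", "high", "low"]
toy_pred = ["high", "high", "low", "low"]
toy_uar = unweighted_average_recall(toy_true, toy_pred)  # (2/3 + 1/1) / 2

toy_cc = pearson_cc([1.0, 2.0, 3.0, 4.0], [1.5, 1.9, 3.2, 3.9])
```

Because UAR averages over classes rather than instances, a classifier cannot score well simply by favouring the majority class, which is why it is a common choice for unbalanced class distributions.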

19.7 Conclusions

This study presents an approach for the detection of conflict in spontaneous, multi-party conversations employing deep, hierarchical neural networks. The experiments were performed on the SSPNet Conflict Corpus (SC²) (Kim et al. 2012), which was also used in the Conflict Sub-Challenge of the Interspeech 2013 Computational Paralinguistics Challenge (Schuller et al. 2013). Investigating different feature sets, we show that replacing the traditionally used sigmoid hidden units with rectified linear units and pre-training the networks using RBMs, combined with dropout as an advanced regularization method, improves performance and allows us to obtain results almost as good as the already high baseline results reported in the challenge.

We then show that the use of the oracle overlap ratio, i.e. the ratio of overlapping speech to non-overlapping speech obtained from manual segmentation, as a single feature already allows us to predict the conflict level to a good degree. Combined
