Table 19.5 Results on the classification (class) and regression task
(score) using feature set I, varying the number of hidden layers

                                 Class [%]        Score [%]
                                 devel    test    devel    test
  MLP (6373-2048-1)              78.3     79.8    80.9     81.5
  DNN (6373-1024-1024-1)         79.7     80.9    81.5     82.1
  DNN (6373-1024-1024-1024-1)    78.9     80.2    81.1     81.8

Shown are the best results obtained on the development set (devel)
and on the test set (test). The percentages reported denote UAR for
the classification task and CC for the regression task.
units were employed. Sigmoid units seem to profit more from pre-training than
ReLUs, which again may be attributed to the effect of dropout. Pre-training
further seems to be more helpful for larger hidden layer sizes than for smaller
ones. This observation can be explained by the regularization effect of
pre-training, which has a bigger impact on large networks with their high
number of parameters. Third, due to the adoption of dropout, ReLU networks
reach their best performance at larger network sizes than sigmoid networks.
This effect was already explained in Sect. 19.2.4. Last but not least, even
though the performance of the best network of CC = 81.5% indicates rather good
modelling power, it still underperforms the baseline system (CC = 82.6%).
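
To make the pre-training step concrete, the following is a minimal sketch of
greedy layer-wise autoencoder pre-training for a sigmoid network with the
layer sizes of Table 19.5. Note that the autoencoder variant, the learning
rate, the epoch count, and the data source train_loader are all assumptions;
the chapter does not spell out its exact pre-training recipe.

    import torch
    import torch.nn as nn

    # Greedy layer-wise pre-training sketch (assumed autoencoder variant):
    # each hidden layer is trained to reconstruct the output of the layers
    # below it before being stacked. `train_loader` is a hypothetical loader
    # yielding (features, label) batches.
    sizes = [6373, 1024, 1024]           # input dimension and two hidden layers
    pretrained = []
    for i in range(len(sizes) - 1):
        enc = nn.Linear(sizes[i], sizes[i + 1])
        dec = nn.Linear(sizes[i + 1], sizes[i])
        opt = torch.optim.SGD(list(enc.parameters()) + list(dec.parameters()),
                              lr=0.01)   # learning rate is an assumption
        for epoch in range(10):          # pre-training epochs (assumed)
            for x, _ in train_loader:
                with torch.no_grad():    # forward through frozen lower layers
                    h = x
                    for layer in pretrained:
                        h = torch.sigmoid(layer(h))
                recon = dec(torch.sigmoid(enc(h)))
                loss = nn.functional.mse_loss(recon, h)
                opt.zero_grad()
                loss.backward()
                opt.step()
        pretrained.append(enc)           # keep the encoder, drop the decoder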
Based on these insights, we trained a number of DNNs, varying the number of
hidden layers and using the training procedure described above. In particular,
we used ReLUs trained with the dropout technique. The best results are
reported in Table 19.5. For this feature set and setup, we achieve the best
results with a DNN with two hidden layers of 1,024 units each.
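
As an illustration, the best-performing topology can be written down in a few
lines. This is a sketch only: PyTorch is used here for convenience, not
because the chapter prescribes it, and the dropout probability of 0.5 is an
assumption.

    import torch.nn as nn

    # Sketch of the best topology from Table 19.5 (6373-1024-1024-1): two
    # ReLU hidden layers of 1,024 units with dropout; the single linear
    # output unit serves the regression (score) task. p = 0.5 is assumed.
    model = nn.Sequential(
        nn.Linear(6373, 1024), nn.ReLU(), nn.Dropout(p=0.5),
        nn.Linear(1024, 1024), nn.ReLU(), nn.Dropout(p=0.5),
        nn.Linear(1024, 1),
    )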
19.6.3 Overlap Ratio
As outlined in the introduction, a number of studies have indicated that
overlap and features derived from it serve as very good indicators of the
conflict level of discourse. Hence, we were interested in how this feature can
be leveraged by deep, hierarchical networks. The SC² corpus is equipped with
hand-labeled metadata containing speaker-turn information as well as
annotations of overlapping speech segments. This allows us to compute the true
overlap ratio, i.e. the relative percentage of overlap with respect to the
utterance length; in the following, we refer to this reference value as the
oracle overlap ratio.
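
Since the definition is simple, a hypothetical helper function may clarify it:
the oracle overlap ratio is the total overlapped duration divided by the
utterance length. The (start, end)-in-seconds segment format is an assumption
about how the annotations are represented.

    # Hypothetical helper: oracle overlap ratio as total overlapped duration
    # relative to utterance length. Segments are assumed to be non-overlapping
    # (start, end) pairs in seconds taken from the hand-labeled metadata.
    def oracle_overlap_ratio(overlap_segments, utterance_length):
        overlapped = sum(end - start for start, end in overlap_segments)
        return overlapped / utterance_length

    # e.g. 2.5 s of overlap in a 10 s utterance gives a ratio of 0.25
    print(oracle_overlap_ratio([(1.0, 2.0), (5.0, 6.5)], 10.0))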
We first examined this ratio as a single feature and trained several
feed-forward neural networks with one to three hidden layers, varying the
number of ReLU hidden units from 2 to 128. As before, we trained the networks
using SGD on the training set, adopting dropout and early stopping.
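
A minimal sketch of this regime on the single overlap-ratio feature follows.
The loss function, learning rate, patience, hidden-layer size, and the
train_loader/devel_loader objects are all assumptions standing in for details
the chapter leaves open.

    import copy
    import torch
    import torch.nn as nn

    # One of the small single-feature networks described above: one ReLU
    # hidden layer with dropout, trained by SGD with early stopping on the
    # development set. Sizes and hyper-parameters are illustrative.
    model = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Dropout(p=0.5),
                          nn.Linear(16, 1))
    loss_fn = nn.MSELoss()
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    best_loss, best_state, patience, waited = float('inf'), None, 5, 0
    for epoch in range(200):
        model.train()
        for x, y in train_loader:             # one SGD pass over training data
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        model.eval()
        with torch.no_grad():                 # evaluate on the development set
            devel_loss = sum(loss_fn(model(x), y).item()
                             for x, y in devel_loader)
        if devel_loss < best_loss:            # remember the best model so far
            best_loss, waited = devel_loss, 0
            best_state = copy.deepcopy(model.state_dict())
        else:
            waited += 1
            if waited >= patience:            # stop when devel loss stalls
                break
    model.load_state_dict(best_state)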
 