Table 19.8 Results for feature set III when varying the number of hidden layers, on the classification task (Class) and the regression task (Score)

[%]                                        Class           Score
                                           devel   test    devel   test
MLP (38-512-1)                             82.5    83.8    82.2    83.2
DNN (38-512-512-1)                         83.1    84.3    83.0    83.8
DNN (38-512-512-512-1)                     82.6    84.0    82.7    83.4
Challenge baseline (Schuller et al. 2012)  79.1    80.8    81.6    82.6
Räsänen and Pohjalainen (2013)             -       83.9    -       -
Grèzes et al. (2013)                       -       83.1    -       -

Shown are the best results obtained on the development set (devel) and on the test set (test). The percentages reported denote the unweighted average recall (UAR) for the classification task and the correlation coefficient (CC) for the regression task. For comparison, the baseline results and the highest published competition results of the Conflict Sub-Challenge are shown as well.
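The two measures named in the table note can be stated compactly. The Python sketch below makes two assumptions. First, UAR is the unweighted mean of the per-class recalls. Second, CC is Pearson's correlation coefficient between predicted and reference conflict scores. The function names and the toy labels are hypothetical; this is not the challenge's official evaluation code.

```python
from collections import defaultdict
import math


def unweighted_average_recall(y_true, y_pred):
    """UAR: recall is computed per class, then averaged with equal weight per class."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


def pearson_cc(x, y):
    """Pearson correlation between predicted scores x and reference scores y.
    Raises ZeroDivisionError if either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


y_true = ["low", "low", "low", "high"]  # hypothetical labels
y_pred = ["low", "low", "low", "low"]   # always predicts the majority class
print(unweighted_average_recall(y_true, y_pred))  # 0.5, although accuracy is 0.75
```

Take a predictor that always answers with the majority class. In the toy example it reaches 75 % accuracy but only 50 % UAR, so UAR does not reward ignoring the minority class.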
The results reveal that, using the conversational-prosodic feature set III, we obtain the best results with a two-layer DNN in which each hidden layer contains 512 ReLU units. This is fewer than the 1,024 hidden units per layer found best in the results above, and is due to the smaller number of input features (38) in feature set III.
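This example reads the layer notation in Table 19.8 as follows: 38-512-512-1 means 38 input features, two hidden layers of 512 units, and one output unit. The plain-Python sketch below counts the weights and biases of each configuration in the table and runs a forward pass with ReLU hidden units. Several details are illustrative assumptions, not taken from the chapter: the random He-style initialization, the linear output unit, and the function names. RBM pre-training and dropout training are not shown.

```python
import random


def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected net such as [38, 512, 512, 1]."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))


def relu(v):
    return [max(0.0, x) for x in v]


def init_layers(layer_sizes, seed=0):
    """Random Gaussian weights and zero biases, for illustration only (no RBM pre-training)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        std = (2.0 / n_in) ** 0.5  # He-style scaling: an assumption, not from the chapter
        weights = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, x):
    """ReLU on every hidden layer; the single output unit is left linear (an assumption)."""
    h = x
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, h)) + b for row, b in zip(weights, biases)]
        h = z if i == len(layers) - 1 else relu(z)
    return h


for sizes in ([38, 512, 1], [38, 512, 512, 1], [38, 512, 512, 512, 1]):
    print("-".join(map(str, sizes)), count_parameters(sizes))
# 38-512-1 20481
# 38-512-512-1 283137
# 38-512-512-512-1 545793

net = init_layers([38, 512, 512, 1])
print(forward(net, [0.1] * 38))  # a list holding one output value
```

The count shows that the second 512-unit hidden layer adds far more parameters (262,656) than the 38-dimensional input layer (19,968).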
On the classification task we achieve a UAR of 84.3 %. This outperforms the baseline result by 3.5 percentage points and the best result in the Conflict Sub-Challenge, reported by Räsänen and Pohjalainen (2013), by 0.4 percentage points. On the regression task the relative improvements are smaller. They still raise the Challenge benchmark for the correlation coefficient, measured on the test set, from 82.6 % to 83.8 %. To our knowledge, these numbers represent the best results reported in the literature to date.
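The improvements quoted above are absolute differences between test-set percentages in Table 19.8, that is, percentage points. The small Python check below reproduces them from the table values. The variable and function names are ours.

```python
# Test-set percentages taken from Table 19.8.
dnn_uar, dnn_cc = 84.3, 83.8            # DNN (38-512-512-1)
baseline_uar, baseline_cc = 80.8, 82.6  # Challenge baseline
best_published_uar = 83.9               # Räsänen and Pohjalainen (2013)


def gain_pp(new, old):
    """Absolute gain in percentage points, rounded to one decimal to hide float noise."""
    return round(new - old, 1)


print(gain_pp(dnn_uar, baseline_uar))        # 3.5
print(gain_pp(dnn_uar, best_published_uar))  # 0.4
print(gain_pp(dnn_cc, baseline_cc))          # 1.2
```

The regression gain of 1.2 points is the smaller improvement referred to in the text.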
19.7 Conclusions
This study presents an approach for the detection of conflict in spontaneous, multi-party conversations that employs deep, hierarchical neural networks. The experiments were performed on the SSPNet Conflict Corpus (SC²) (Kim et al. 2012), which was also used in the Conflict Sub-Challenge of the Interspeech 2013 Computational Paralinguistics Challenge (Schuller et al. 2013). Investigating different feature sets, we show the effect of three changes: replacing the traditionally used sigmoid hidden units with rectified linear units, pre-training the networks using RBMs, and using dropout as an advanced regularization method. Combined, these changes improve performance and allow us to obtain results almost as good as the already high baseline results reported in the challenge.
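Two of the ingredients named above can be illustrated in a few lines: rectified linear versus sigmoid units, and dropout.

- The sigmoid derivative never exceeds 0.25 and shrinks towards zero for large inputs. The ReLU derivative is 1 for every active unit.
- Inverted dropout is one common formulation of dropout. It zeroes each unit with probability p_drop during training and rescales the survivors, so the network needs no rescaling at test time.

The Python sketch below shows both. The chapter does not state which dropout variant or drop probability was used, so p_drop and all function names are hypothetical. RBM pre-training is not shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # at most 0.25 (at x = 0), close to 0 for large |x|


def relu_grad(x):
    return 1.0 if x > 0 else 0.0  # 1 for every active unit; 0 at x <= 0 by convention


def dropout(h, p_drop, rng, training=True):
    """Inverted dropout (hypothetical helper): during training, zero each activation
    with probability p_drop and scale the survivors by 1 / (1 - p_drop)."""
    if not training or p_drop == 0.0:
        return list(h)
    keep = 1.0 - p_drop
    return [x / keep if rng.random() < keep else 0.0 for x in h]


rng = random.Random(42)
for x in (-4.0, 0.0, 4.0):
    print(f"x={x:+.1f}  sigmoid'={sigmoid_grad(x):.4f}  relu'={relu_grad(x):.0f}")
print(dropout([0.3, 1.2, 0.0, 2.5], p_drop=0.5, rng=rng))
```

Because the surviving activations are rescaled during training, the expected input to the next layer is the same in training and at test time.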
We then show that the oracle overlap ratio alone already allows us to predict the conflict level to a good degree. This is the ratio of overlapping speech to non-overlapping speech, obtained from manual segmentation, used as a single feature. Combined