Be at Odds? Deep and Hierarchical Neural Networks for Classification and Regression of Conflict in Speech - Conflict and Multimodal Communication: Social Research and Machine Intelligence


19.3.3 BLSTM as Overlap Prediction Generators

In the introduction we have outlined that overlap is a very informative feature for conflict level prediction, and we will confirm this observation in Sect. 19.6.3. In real-world applications, manual speech overlap annotations are not available and thus must be reliably estimated from the speech signal itself. We extend an approach presented in Geiger et al. (2013) by using a BLSTM model as a non-linear classifier to generate frame-wise overlap predictions. To this end, we feed the input feature vector

\[ X = [x(1), \ldots, x(T)] \tag{19.21} \]

into the network, where T is the total number of frames in the audio sequence, and obtain an output y(t) at the sigmoid output layer for each time step t. Because the network is bidirectional, the output y(t) depends on both past and future input relative to time t:

\[ y(t) = g_f\bigl(x(1), \ldots, x(t)\bigr) + g_b\bigl(x(T), \ldots, x(t)\bigr), \tag{19.22} \]

where g_f and g_b denote the functions computed by the forward and backward parts of the BLSTM, respectively.
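The bidirectional dependence of each output on the whole sequence can be illustrated with a small sketch. This is not the authors' network: it swaps the LSTM cells for a plain tanh recurrence, uses scalar frames instead of feature vectors, and the weights `w_in` and `w_rec` as well as the way the two directions are summed before the sigmoid are assumptions made purely for illustration.

```python
import math


def sigmoid(z):
    """Logistic function, squashing a real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def run_direction(frames, w_in, w_rec):
    """Run a simple tanh recurrence (a stand-in for an LSTM layer).

    Returns the hidden state after each frame, in processing order.
    """
    h = 0.0
    states = []
    for x in frames:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states


def bidirectional_outputs(frames, w_in=0.8, w_rec=0.5):
    """Combine a forward and a backward pass into one output per frame."""
    # forward[t] has seen x(1), ..., x(t)
    forward = run_direction(frames, w_in, w_rec)
    # backward[t] has seen x(T), ..., x(t); reverse again to align with t
    backward = run_direction(frames[::-1], w_in, w_rec)[::-1]
    # Hypothetical combination: sum both directions, then apply the sigmoid
    return [sigmoid(f + b) for f, b in zip(forward, backward)]
```

Changing only the last frame of a sequence alters the output at the first time step through the backward pass, which a purely forward recurrence would not show.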

For training the network, the targets are defined as

\[ y(t) = \begin{cases} 1, & \text{if } x(t) \in \text{overlap} \\ 0, & \text{else} \end{cases} \tag{19.23} \]
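As a minimal sketch of how such frame-wise targets could be derived from overlap annotations, the function below marks every frame whose start time falls inside an annotated overlap segment. The segment format (start and end times in seconds), the function name `frame_targets`, and the 10 ms frame shift are assumptions for illustration; the source does not specify them here.

```python
def frame_targets(overlap_segments, num_frames, frame_shift=0.01):
    """Return one 0/1 target per frame.

    overlap_segments: list of (start, end) times in seconds (hypothetical format).
    frame_shift: time between consecutive frames in seconds (assumed 10 ms).
    A frame gets target 1 if its start time lies inside an overlap segment.
    """
    targets = [0] * num_frames
    for start, end in overlap_segments:
        # Convert segment boundaries to frame indices, clipped to the sequence
        first = max(0, int(round(start / frame_shift)))
        last = min(num_frames, int(round(end / frame_shift)))
        for t in range(first, last):
            targets[t] = 1
    return targets
```

For example, a single overlap segment from 0.02 s to 0.05 s in an 8-frame sequence marks frames 2, 3 and 4 as overlap.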

As in Geiger et al. (2013), the predictions y(t) of the trained network are used for classification by applying a threshold θ as follows:

\[ c(t) = \begin{cases} 1, & \text{if } y(t) \geq \theta \\ 0, & \text{if } y(t) < \theta \end{cases} \tag{19.24} \]

The threshold θ can be varied in order to select a specific operating point with a different trade-off between precision and recall.
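The effect of the threshold on the operating point can be sketched as follows. The prediction and target values are made-up toy numbers, not results from the study, and the helper names `classify` and `precision_recall` are hypothetical.

```python
def classify(predictions, theta):
    """Threshold decision: 1 if the prediction is at least theta, else 0."""
    return [1 if y >= theta else 0 for y in predictions]


def precision_recall(decisions, targets):
    """Frame-wise precision and recall of binary decisions against 0/1 targets."""
    pairs = list(zip(decisions, targets))
    tp = sum(1 for c, y in pairs if c == 1 and y == 1)
    fp = sum(1 for c, y in pairs if c == 1 and y == 0)
    fn = sum(1 for c, y in pairs if c == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data for illustration only (not taken from the study)
predictions = [0.1, 0.4, 0.6, 0.9, 0.3, 0.7]
targets = [0, 1, 1, 1, 0, 0]

for theta in (0.2, 0.5, 0.8):
    p, r = precision_recall(classify(predictions, theta), targets)
    print(f"theta={theta:.1f} precision={p:.2f} recall={r:.2f}")
```

On this toy data, raising the threshold increases precision while recall drops, which is the trade-off the operating point selects.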

19.4 Database

The experiments and results presented in this study are based on the SSPNet Conflict Corpus (SC²) (Kim et al. 2012), which was also used in the Conflict Sub-Challenge of the Interspeech 2013 Computational Paralinguistics Challenge (Schuller et al. 2013). It contains 1,430 clips, each 30 s long, extracted from the Canal9 Corpus (Vinciarelli et al. 2009), a publicly available corpus of broadcast Swiss
