Table 19.7 Results for the predicted overlap ratio as a single feature (top) and alongside baseline feature set I (bottom), varying the number of hidden layers, on the classification (class) and regression task (score)

                               Class [%]          Score [%]
                               devel    test      devel    test
MLP (1-32-1)                   80.5     82.3      81.3     82.1
DNN (1-32-32-1)                80.8     82.9      81.9     82.7
DNN (1-32-32-32-1)             80.5     82.5      81.2     81.9
MLP (6374-2048-1)              82.0     83.2      82.0     82.6
DNN (6374-1024-1024-1)         82.3     83.7      82.5     83.2
DNN (6374-1024-1024-1024-1)    82.0     83.1      82.1     82.6

Shown are the best results obtained on the development set (devel) and on the test set (test). The percentages reported denote UAR for the classification task and CC for the regression task.
With the predicted overlap ratio as the single input feature, a DNN with two hidden layers of H_hid = 32 units, trained with dropout, yielded the best results. Alongside feature set I, again a DNN with two hidden layers, but with H_hid = 1,024, was found to be optimal.
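As a concrete illustration of these topologies, the following is a minimal sketch of how such feed-forward ReLU networks with dropout could be defined, assuming PyTorch; the framework choice, the helper name build_dnn, and the dropout rate of 0.5 are assumptions rather than details taken from the original experiments.

```python
import torch.nn as nn

def build_dnn(layer_sizes, dropout=0.5):
    """Feed-forward ReLU network with dropout between hidden layers.
    Layer sizes follow Table 19.7, e.g. (1, 32, 32, 1) for the model
    using the predicted overlap ratio as its single input feature, or
    (6374, 1024, 1024, 1) for the model trained on feature set I.
    The dropout rate is an assumption."""
    layers = []
    for n_in, n_out in zip(layer_sizes[:-2], layer_sizes[1:-1]):
        layers += [nn.Linear(n_in, n_out), nn.ReLU(), nn.Dropout(dropout)]
    # Single linear output unit: a logit for the classification task
    # or a real-valued score for the regression task.
    layers.append(nn.Linear(layer_sizes[-2], layer_sizes[-1]))
    return nn.Sequential(*layers)

# Best single-feature topology (two hidden layers, H_hid = 32):
model_single = build_dnn((1, 32, 32, 1))
# Best topology alongside baseline feature set I (H_hid = 1,024):
model_set_I = build_dnn((6374, 1024, 1024, 1))
```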
Our finding is that using a prediction of overlapping speech as the sole feature of a DNN classifier produces models with better performance than the baseline experiment, confirming the results of Grèzes et al. (2013). Even for the DNN regressor we obtained slightly higher cross-correlation results. When added to feature set I, the predicted overlap ratio further improves the results on both tasks.
19.6.4 Conversational-Prosodic Features
Encouraged by the high impact of the overlap ratio feature, we drew inspiration from Kim et al. (2012) and investigated the performance of DNNs on feature set III. As described in Sect. 19.6.4, this feature set contains prosodic as well as conversational features, including the overlap ratio examined in the previous section as well as speaker-turn-based features. To facilitate the experiments, we computed the speaker-turn features from the manual annotation provided with the data set. However, as it gave better results, we used the predicted overlap ratio instead of the oracle overlap ratio, as described in Sect. 19.6.3.
As before, we trained a number of ReLU networks with dropout for different network topologies, varying the number of hidden units as well as the number of layers. All networks were trained on the training set with SGD and momentum, stopping training as soon as the cost function started to rise on the development set. Again, we used CE as the cost function for the classification task and MSE for the regression task.
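To make this training regime concrete, below is a hedged sketch of SGD with momentum and early stopping on the development-set cost, again assuming PyTorch; the learning rate, momentum value, epoch limit, and loader names are illustrative assumptions, and since the networks end in a single output unit, CE is realised here as binary cross entropy on that logit.

```python
import torch
import torch.nn as nn

def train(model, train_loader, devel_loader, task="class",
          lr=0.01, momentum=0.9, max_epochs=200):
    """SGD with momentum; training stops as soon as the cost function
    starts to rise on the development set. Hyper-parameter values are
    illustrative, not those of the original experiments."""
    # CE (binary cross entropy on the single output logit) for the
    # classification task, MSE for the regression task.
    criterion = nn.BCEWithLogitsLoss() if task == "class" else nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)
    best_devel_cost = float("inf")

    for epoch in range(max_epochs):
        model.train()
        for x, y in train_loader:          # x: features, y: float targets
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()

        # Cost function on the development set (early-stopping criterion).
        model.eval()
        with torch.no_grad():
            devel_cost = sum(criterion(model(x), y).item()
                             for x, y in devel_loader) / len(devel_loader)
        if devel_cost > best_devel_cost:   # cost started to rise: stop
            break
        best_devel_cost = devel_cost
    return model
```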
The results for the best-performing network, varying the number of hidden layers, are shown in Table 19.8.