Table 19.5 Results on the classification (class) and regression task
(score) using feature set I, varying the number of hidden layers

                                 Class [%]        Score [%]
                                 devel    test    devel    test
  MLP (6373-2048-1)              78.3     79.8    80.9     81.5
  DNN (6373-1024-1024-1)         79.7     80.9    81.5     82.1
  DNN (6373-1024-1024-1024-1)    78.9     80.2    81.1     81.8

Shown are the best results obtained on the development set (devel)
and on the test set (test). The percentages reported denote UAR for
the classification task and CC for the regression task.
units were employed. Sigmoid units seem to profit more from pre-training than
ReLUs, which again may be attributed to the effect of dropout. Pre-training
further seems to be more helpful for larger hidden layer sizes than for smaller
ones. This observation can be explained by the regularization effect of
pre-training, which has a bigger impact on large networks with their high
number of parameters. Third, due to the adoption of dropout, ReLU networks
reach their best performance at larger network sizes than sigmoid networks.
This effect was already explained in Sect. 19.2.4. Last but not least, even
though the performance of the best network of CC = 81.5% indicates rather good
modelling power, it still underperforms the baseline system (CC = 82.6%).
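
To make the pre-training step concrete, the following is a minimal sketch of
greedy layer-wise autoencoder pre-training for a sigmoid network with the
layer sizes of Table 19.5. Note that the autoencoder variant, the learning
rate, the epoch count, and the data source train_loader are all assumptions;
the chapter does not spell out its exact pre-training recipe.

    import torch
    import torch.nn as nn

    # Greedy layer-wise pre-training sketch (assumed autoencoder variant):
    # each hidden layer is trained to reconstruct the output of the layers
    # below it before being stacked. `train_loader` is a hypothetical loader
    # yielding (features, label) batches.
    sizes = [6373, 1024, 1024]           # input dimension and two hidden layers
    pretrained = []
    for i in range(len(sizes) - 1):
        enc = nn.Linear(sizes[i], sizes[i + 1])
        dec = nn.Linear(sizes[i + 1], sizes[i])
        opt = torch.optim.SGD(list(enc.parameters()) + list(dec.parameters()),
                              lr=0.01)   # learning rate is an assumption
        for epoch in range(10):          # pre-training epochs (assumed)
            for x, _ in train_loader:
                with torch.no_grad():    # forward through frozen lower layers
                    h = x
                    for layer in pretrained:
                        h = torch.sigmoid(layer(h))
                recon = dec(torch.sigmoid(enc(h)))
                loss = nn.functional.mse_loss(recon, h)
                opt.zero_grad()
                loss.backward()
                opt.step()
        pretrained.append(enc)           # keep the encoder, drop the decoder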
Based on these insights, we trained a number of DNNs, varying the number of
hidden layers and using the training procedure described above. In particular,
we used ReLUs trained with the dropout technique. The best results are
reported in Table 19.5. For this feature set and setup, we achieve the best
results with a DNN with two hidden layers of 1,024 units each.
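
As an illustration, the best-performing topology can be written down in a few
lines. This is a sketch only: PyTorch is used here for convenience, not
because the chapter prescribes it, and the dropout probability of 0.5 is an
assumption.

    import torch.nn as nn

    # Sketch of the best topology from Table 19.5 (6373-1024-1024-1): two
    # ReLU hidden layers of 1,024 units with dropout; the single linear
    # output unit serves the regression (score) task. p = 0.5 is assumed.
    model = nn.Sequential(
        nn.Linear(6373, 1024), nn.ReLU(), nn.Dropout(p=0.5),
        nn.Linear(1024, 1024), nn.ReLU(), nn.Dropout(p=0.5),
        nn.Linear(1024, 1),
    )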
19.6.3 Overlap Ratio
As outlined in the introduction, a number of studies have indicated that
overlap and features derived from it serve as very good indicators of the
conflict level of discourse. Hence, we were interested in how this feature can
be leveraged by deep, hierarchical networks. The SC² corpus is equipped with
hand-labeled metadata containing speaker-turn information as well as
annotations of overlapping speech segments. This allows us to compute the true
overlap ratio, i.e. the relative percentage of overlap with respect to the
utterance length; in the following, we refer to this reference value as the
oracle overlap ratio.
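
Since the definition is simple, a hypothetical helper function may clarify it:
the oracle overlap ratio is the total overlapped duration divided by the
utterance length. The (start, end)-in-seconds segment format is an assumption
about how the annotations are represented.

    # Hypothetical helper: oracle overlap ratio as total overlapped duration
    # relative to utterance length. Segments are assumed to be non-overlapping
    # (start, end) pairs in seconds taken from the hand-labeled metadata.
    def oracle_overlap_ratio(overlap_segments, utterance_length):
        overlapped = sum(end - start for start, end in overlap_segments)
        return overlapped / utterance_length

    # e.g. 2.5 s of overlap in a 10 s utterance gives a ratio of 0.25
    print(oracle_overlap_ratio([(1.0, 2.0), (5.0, 6.5)], 10.0))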
We first examined this ratio as a single feature and trained several
feed-forward neural networks with one to three hidden layers, varying the
number of ReLU hidden units from 2 to 128. As before, we trained the networks
using SGD on the training set, adopting dropout and early stopping.
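
A minimal sketch of this regime on the single overlap-ratio feature follows.
The loss function, learning rate, patience, hidden-layer size, and the
train_loader/devel_loader objects are all assumptions standing in for details
the chapter leaves open.

    import copy
    import torch
    import torch.nn as nn

    # One of the small single-feature networks described above: one ReLU
    # hidden layer with dropout, trained by SGD with early stopping on the
    # development set. Sizes and hyper-parameters are illustrative.
    model = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Dropout(p=0.5),
                          nn.Linear(16, 1))
    loss_fn = nn.MSELoss()
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    best_loss, best_state, patience, waited = float('inf'), None, 5, 0
    for epoch in range(200):
        model.train()
        for x, y in train_loader:             # one SGD pass over training data
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        model.eval()
        with torch.no_grad():                 # evaluate on the development set
            devel_loss = sum(loss_fn(model(x), y).item()
                             for x, y in devel_loader)
        if devel_loss < best_loss:            # remember the best model so far
            best_loss, waited = devel_loss, 0
            best_state = copy.deepcopy(model.state_dict())
        else:
            waited += 1
            if waited >= patience:            # stop when devel loss stalls
                break
    model.load_state_dict(best_state)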
 