10.4.2.3 Performance
To provide baseline results, the two predominant architectures in the field are considered: first, dynamic modelling of LLDs such as pitch, energy, and MFCCs by HMMs (for emotion only); second, static modelling using supra-segmental information obtained by applying statistical functionals to the same LLDs at the chunk level. The latter is done either by classification for emotion or by regression in the case of interest; a per-chunk functional extraction of this kind is sketched below.
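As an illustration of the static route, here is a minimal sketch of mapping a frame-level LLD contour to one fixed-length supra-segmental vector per chunk. The functional set used here (mean, standard deviation, extrema, range) is a simplified assumption and not the exact feature set of the baselines:

import numpy as np

def functionals(lld: np.ndarray) -> np.ndarray:
    """Map a (n_frames x n_lld) contour matrix to one static
    supra-segmental vector by applying functionals per LLD."""
    return np.concatenate([
        lld.mean(axis=0),                    # arithmetic mean
        lld.std(axis=0),                     # standard deviation
        lld.min(axis=0),                     # minimum
        lld.max(axis=0),                     # maximum
        lld.max(axis=0) - lld.min(axis=0),   # range
    ])

# Hypothetical chunk: 120 frames of 39-dimensional LLDs (e.g., MFCCs + deltas)
chunk = np.random.randn(120, 39)
x = functionals(chunk)   # fixed-length vector: 5 functionals x 39 LLDs = 195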
It was decided to rely entirely on two standard, publicly available tools widely used in the community: the Hidden Markov Model Toolkit (HTK)^10 [172] in the case of dynamic modelling, and the WEKA 3 Data Mining Toolkit^11 [131] in the case of static modelling. This ensures easy reproducibility of the results and reduces the description of parameters to a minimum: unless specified otherwise, defaults are used.
Constantly picking the majority class for the two-class emotion task of the 2009 Emotion Challenge would result in an accuracy (WA) of 70.1 %, which we consider here, while the chance level for UA is simply 50 %. As instances are unequally distributed among the classes, balancing of the training material is considered to avoid classifier over-fitting. This can be achieved by applying the Synthetic Minority Oversampling TEchnique (SMOTE) [173] as data-driven up-sampling; a sketch of both the evaluation measures and the balancing step follows below. Note that up-sampling does not have any influence in the case of generative modelling: one HMM is trained individually per class, and equal priors are assumed.
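To make WA, UA, and the balancing step concrete, the following is a hedged sketch using scikit-learn and imbalanced-learn as stand-ins for WEKA's implementation; the class distribution and feature matrix are placeholders:

import numpy as np
from sklearn.metrics import accuracy_score, recall_score
from imblearn.over_sampling import SMOTE

y_true = np.array([0] * 70 + [1] * 30)   # imbalanced: class 0 is the majority
y_pred = np.zeros_like(y_true)           # constantly pick the majority class

wa = accuracy_score(y_true, y_pred)                  # weighted accuracy: 0.70
ua = recall_score(y_true, y_pred, average='macro')   # unweighted: (1.0 + 0.0) / 2 = 0.50

# SMOTE up-sampling of the training material (synthetic minority instances)
X_train = np.random.randn(100, 195)      # placeholder feature matrix
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_train, y_true)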
Table 10.19 depicts these results for the two-class emotion task (classification by linear left-right HMMs, one model per emotion, varying numbers of states, two Gaussian mixtures per state, 6 + 4 Baum-Welch re-estimation iterations, Viterbi decoding) in terms of UA and WA. With increased temporal modelling, i.e., a higher number of states, a gradual shift towards a preference for NEG is observed in the considered two-class case.
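A minimal sketch of this dynamic-modelling setup, assuming the hmmlearn package as a stand-in for HTK (the baselines themselves use HTK): one left-right GMM-HMM per class, trained by Baum-Welch (EM), with classification by maximum log-likelihood. Topology details, iteration count, and data are illustrative:

import numpy as np
from hmmlearn.hmm import GMMHMM

def left_right_hmm(n_states: int, n_mix: int = 2) -> GMMHMM:
    """Linear left-right topology: each state may loop or move one step right."""
    hmm = GMMHMM(n_components=n_states, n_mix=n_mix,
                 covariance_type='diag', n_iter=10,    # rough stand-in for 6 + 4
                 init_params='mcw', params='tmcw')     # keep the fixed start state
    hmm.startprob_ = np.eye(n_states)[0]               # always start in state 0
    trans = np.zeros((n_states, n_states))
    for i in range(n_states):
        trans[i, i] = 0.5
        trans[i, min(i + 1, n_states - 1)] += 0.5      # self-loop + forward step
    hmm.transmat_ = trans                              # zeros stay zero under EM
    return hmm

# One model per emotion class; score a test chunk with each, pick the best
models = {}
for label in ('NEG', 'IDL'):
    X = np.random.randn(300, 39)                       # placeholder LLD frames
    models[label] = left_right_hmm(5).fit(X, lengths=[150, 150])

test = np.random.randn(80, 39)
pred = max(models, key=lambda c: models[c].score(test))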
Table 10.20 further shows results for this two-class emotion task, employing the whole feature set and using SVM (SMO learning, linear kernel, pairwise multi-class discrimination); a corresponding pipeline is sketched below. For SVM, an additional pre-processing step is performed: the features are standardised, or z-normalised, i.e., each feature is normalised to zero mean and unit variance. Table 10.20 shows the influence of these two pre-processing steps (balancing and standardisation) and their impact on the target evaluation measure UA. Note that the order of operations is crucial, as standardisation leads to different results if the classes are balanced first.
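A corresponding sketch of the static route, using scikit-learn's SVC (which, like WEKA's SMO, performs pairwise one-vs-one multi-class discrimination) in place of WEKA; the ordering shown, balancing before fitting the scaler, is one of the two variants whose difference the text notes, and the data and dimensionality are placeholders:

import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import SMOTE

X_train = np.random.randn(100, 384)      # placeholder functionals (e.g., a 384-d set)
y_train = np.array([0] * 70 + [1] * 30)

# Order matters: balancing first changes the statistics the scaler estimates
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_train, y_train)
scaler = StandardScaler().fit(X_bal)     # z-normalisation: zero mean, unit variance
clf = SVC(kernel='linear').fit(scaler.transform(X_bal), y_bal)

X_test = np.random.randn(10, 384)
y_pred = clf.predict(scaler.transform(X_test))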
Table 10.21 then depicts the results for the interest baseline. The measures for this task are the Pearson Correlation Coefficient (CC) and the mean linear error.
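For these regression measures, a short sketch; the mean linear error is taken here to be the mean absolute error between prediction and label, which is an interpretation on my part, and the values are placeholders:

import numpy as np
from scipy.stats import pearsonr

y_true = np.array([0.10, 0.40, 0.35, 0.80, 0.60])   # placeholder interest labels
y_pred = np.array([0.20, 0.35, 0.40, 0.70, 0.65])   # placeholder regressor output

cc, _ = pearsonr(y_true, y_pred)           # Pearson Correlation Coefficient
mle = np.mean(np.abs(y_true - y_pred))     # mean (absolute) linear error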
Table 10.19 Baseline results for 2-class emotion by dynamic modelling with HMM

          2-class
#States   UA [%]   WA [%]
   1       62.3     71.7
   3       62.9     57.5
   5       66.1     65.3
^10 http://htk.eng.cam.ac.uk/docs/docs.shtml
^11 http://www.cs.waikato.ac.nz/ml/weka/