Digital Signal Processing Reference
Fig. 10.7 Speaker height distribution in TIMIT's train and test partitions by number of instances and speakers (speaker number is shown by the same bars, but the value has to be divided by ten, as each speaker spoke ten turns). Bars: Test, Train; x-axis: speaker height [cm], 144 to 204; y-axis: 0 to 500.
providing speaker height in the units of feet and inches. For better comparability,
though, heights were converted to the SI unit of meters, following the result
presentation given in [178].
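The conversion itself is simple arithmetic and can be sketched as follows. This is a minimal illustration, not the procedure used in the original experiments: the function names and the assumed input notation (a string such as 5'11") are hypothetical, and only the factor of 0.0254 m per inch is fixed by definition.

```python
INCH_TO_M = 0.0254      # one inch in meters (exact by definition)
INCHES_PER_FOOT = 12


def height_to_meters(feet: int, inches: float) -> float:
    """Convert a height given in feet and inches to meters."""
    total_inches = feet * INCHES_PER_FOOT + inches
    return total_inches * INCH_TO_M


def parse_height(text: str) -> float:
    """Parse a height string such as 5'11" into meters.

    The feet'inches" notation is an assumption made for this sketch.
    """
    feet_part, _, inch_part = text.strip().partition("'")
    inch_part = inch_part.strip().rstrip('"')
    inches = float(inch_part) if inch_part else 0.0
    return height_to_meters(int(feet_part), inches)


if __name__ == "__main__":
    meters = parse_height("5'11\"")
    print(f"{meters:.3f} m = {meters * 100:.1f} cm")
```

Multiplying the result by 100 gives centimeters, the unit used on the axis of Fig. 10.7.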
TIMIT also defines train (462 speakers) and test (168 speakers) partitions, to
which we adhere in the following experiments.
10.4.3.3 Methodology
Due to the size of the aGender corpus, a limited feature set consisting of 450
features was provided in the Challenge. This is reached by reducing the number of
descriptors from 38 to 29 and that of functionals from 21 to 8 [75, 179]. For height
determination, the full set is used.
The Weka toolkit [196] is used for classification and regression. SVMs are preferred
for the age and gender classification experiments; the general Support Vector paradigm
further offers SVR for the continuous ordinal task of height. SMO is employed for
their training. A linear kernel was found to be the optimal kernel function in
experiments carried out exclusively on the training data across the different tasks.
A complexity of 1 is chosen for classification and of 0.05 for regression. In the
case of speaker height determination, additional cases are considered to demonstrate
the mutual dependency of speaker traits. To this end, ground truth information on
other speaker traits is added as feature information to the acoustic vector in
different variations. Ground truth information is used intentionally, to show the
upper benchmark effect of mutual dependence.
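The idea of appending ground truth traits to the acoustic vector "in different variations" can be sketched as below. This is only an illustration under stated assumptions: the trait names (age, gender), their numeric encoding, and the helper names `augment` and `variations` are hypothetical, and the text does not specify which trait combinations were actually evaluated.

```python
from itertools import combinations


def augment(acoustic, traits, use):
    """Return a new vector: acoustic features followed by the selected traits.

    `traits` maps a (hypothetical) trait name to its numeric ground truth value;
    `use` lists the trait names to append, in order.
    """
    return list(acoustic) + [float(traits[name]) for name in use]


def variations(trait_names):
    """Enumerate all subsets of traits, from acoustic-only up to all traits."""
    names = list(trait_names)
    return [combo
            for size in range(len(names) + 1)
            for combo in combinations(names, size)]


if __name__ == "__main__":
    acoustic = [0.12, -0.40, 1.05]          # made-up acoustic feature values
    truth = {"age": 31, "gender": 1}        # made-up ground truth, encoding assumed
    for combo in variations(truth):
        print(combo, augment(acoustic, truth, combo))
```

Each resulting vector would then be passed to the same regressor, so that any gain over the acoustic-only baseline can be attributed to the added trait information.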
10.4.3.4 Performance
Table 10.23 shows results for the age and gender baselines by UA and WA. Visibly,
the 'blind' Test partition shows better results, likely due to the now larger training
set. Interestingly, in several cases a 7-group sub-model, which separates age groups
for gender recognition and vice versa, performs slightly better than direct modelling
in terms of UA. This can be seen as a first indication of mutual task dependence.
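The difference between the two measures can be made concrete with a small sketch. It assumes that UA denotes the unweighted mean of the per-class recalls and WA the overall fraction of correctly classified instances, which is the usual reading of these abbreviations; the function names and the toy labels are made up for illustration and are not taken from Table 10.23.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """WA: share of correctly classified instances over all instances."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """UA: mean of per-class recalls, so every class counts equally."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Toy, imbalanced example: a classifier that always predicts class "a".
    y_true = ["a"] * 8 + ["b"] * 2
    y_pred = ["a"] * 10
    print("WA:", weighted_accuracy(y_true, y_pred))
    print("UA:", unweighted_accuracy(y_true, y_pred))
```

In this toy case WA is 0.8 while UA is only 0.5, which shows why UA is the more informative measure when class sizes are unbalanced.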
 