Digital Signal Processing Reference
In-Depth Information
Fig. 11.20 Distribution of traits among the 516 singers contained in the UltraStar Singer Trait Database [ 36 ]: (a) gender, (b) race, (c) age [years], (d) height [cm]
boxes ranging from the first to the third quartile and values exceeding this range by
more than a factor of 1.5 shown as outliers by circles. The fact that singer age is a
function of a musical piece's recording date was taken into account.
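The box-plot convention and the age correction mentioned above can be illustrated with a minimal sketch. It assumes the common reading of the 1.5 rule, namely that a value is an outlier if it lies more than 1.5 times the interquartile range below the first or above the third quartile, and it assumes that age is taken as recording year minus birth year. The function names and the sample values are hypothetical and are not taken from the database.

```python
import numpy as np


def box_plot_stats(values, whisker=1.5):
    """Quartiles and outliers under the (assumed) 1.5 x IQR convention."""
    x = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lower = q1 - whisker * iqr
    upper = q3 + whisker * iqr
    # Values outside [lower, upper] would be drawn as circles.
    outliers = x[(x < lower) | (x > upper)]
    return {"q1": q1, "median": median, "q3": q3,
            "outliers": outliers.tolist()}


def age_at_recording(birth_year, recording_year):
    """Singer age at the time the piece was recorded (assumed definition)."""
    return recording_year - birth_year


# Hypothetical heights in cm, for illustration only.
stats = box_plot_stats([160, 165, 170, 175, 180, 185, 230])
```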
For automatic assessment, the tasks were constrained to binary and ternary classification on frame (beat) level as well as on song level. This constraint was necessary owing to the challenging real-world conditions of assessing singer traits in polyphonic music. Binary classification provides a simple categorisation per singer trait, and ternary classification additionally performs singing activity detection on frame level in order to provide full realism. Height and age were discretised to 'small' (s, < 175 cm) and 'tall' (t, ≥ 175 cm), respectively 'young' (y, < 30 years) and 'old' (o, ≥ 30 years). From the annotated race classes, the sparse classes 'Asian', 'Black', and 'Hispanic' were clustered into one class as opposed to 'White' singers.
The number of beats for task evaluation is shown in Table 11.27. The annotation is available for reproduction of results. 13
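The discretisation of the singer traits described above can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the boundary values (175 cm, 30 years) are assumed to fall into the upper class, and the function names, the label 'non-White' for the clustered race classes, and the label 'none' for frames without singing are hypothetical.

```python
NO_SINGING = "none"  # hypothetical label for the third (no singing) class


def discretise_height(height_cm):
    """'s' (small) below 175 cm, otherwise 't' (tall)."""
    return "s" if height_cm < 175 else "t"


def discretise_age(age_years):
    """'y' (young) below 30 years, otherwise 'o' (old)."""
    return "y" if age_years < 30 else "o"


def cluster_race(race):
    """Merge the sparse classes into one class opposed to 'White'."""
    return "White" if race == "White" else "non-White"


def frame_label(trait_label, singing):
    """Ternary frame-level target: the trait class, or no singing."""
    return trait_label if singing else NO_SINGING
```

On song level only the binary trait label applies, whereas on frame (beat) level `frame_label` yields three classes per trait, for example 's', 't', and 'none' for height.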
11.8.2 Methodology
Given the challenging condition of person trait recognition under singing in polyphonic music, finding the optimal preprocessing by suitable singer separation becomes a central issue. To this end, harmonic enhancement as shown in Sect. 11.1, based on openBliSSART (cf. Sect. 11.8), is used as a first means. This is followed by targeted extraction of the leading voice as in [ 170 ]. Different sets of NMF components are used in different parts of a song for higher flexibility of the algorithm. A song is therefore split into frame-synchronous non-overlapping chunks of 881 664 samples (≈ 20 s at 44.1 kHz sample rate) as in [ 35 ]. Then, the leading voice
13 http://www.openaudio.eu/UltraStar_Singers.arff
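The chunking step of Sect. 11.8.2 can be sketched as below. This is a minimal illustration, not the openBliSSART implementation: the function names are hypothetical, and keeping a shorter final chunk is an assumption, since the text does not say how the remainder of a song is handled. Note that 881 664 = 861 × 1024, which would be consistent with a frame shift of 1024 samples, although the frame size is not stated here.

```python
import numpy as np

SAMPLE_RATE = 44_100      # Hz, as given in the text
CHUNK_SAMPLES = 881_664   # samples per chunk, as given in the text


def chunk_signal(signal, chunk_len=CHUNK_SAMPLES):
    """Split a 1-D signal into non-overlapping chunks.

    The last chunk may be shorter than chunk_len (assumption).
    """
    x = np.asarray(signal)
    return [x[start:start + chunk_len]
            for start in range(0, len(x), chunk_len)]


def chunk_duration(chunk_len=CHUNK_SAMPLES, sample_rate=SAMPLE_RATE):
    """Chunk duration in seconds."""
    return chunk_len / sample_rate
```

Each chunk would then be processed separately, which is what allows a different set of NMF components per part of the song.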
 