Digital Signal Processing Reference
In-Depth Information
age, height, and race. 9 To this end, we will also show how to improve the extraction
of the leading voice beyond the harmonic enhancement, i.e., filtering of the drum
accompaniment, as was shown in Sect. 11.1.
11.8.1 UltraStar Singer Traits Database
To test such automatic singer-independent classification, the UltraStar database,
first introduced in [ 34 ], was enriched with detailed annotations of singer traits,
particularly gender and continuous-valued age. The database contains 581
songs, corresponding to over 37 h of total play time, that are commonly used for the
'UltraStar' karaoke game. The focus on highly popular artists was needed to establish
a solid ground truth, as information on these can be retrieved with sufficient certainty.
To ensure transparent partitioning and singer independence, the first letter of the
name of the performer is used for assignment to training, development, and test sets.
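The letter-based assignment can be sketched as follows. This is a minimal illustration only: the text does not state which letters map to which set, so the letter ranges below and the function name `assign_partition` are hypothetical assumptions, not the authors' actual split.

```python
# Hypothetical letter ranges: the source does not specify the actual split.
TRAIN_LETTERS = set("ABCDEFGHIJKLM")
DEVEL_LETTERS = set("NOPQRS")


def assign_partition(performer: str) -> str:
    """Map a performer name to 'train', 'devel', or 'test' by its first letter.

    All songs by one performer share the same first letter and therefore land
    in the same set, which is what keeps the partitions singer-independent.
    """
    name = performer.strip()
    first = name[0].upper() if name else ""
    if first in TRAIN_LETTERS:
        return "train"
    if first in DEVEL_LETTERS:
        return "devel"
    # Assumption: remaining letters, digits, and symbols all go to the test set.
    return "test"


songs = [("ABBA", "Waterloo"), ("Queen", "Bohemian Rhapsody"), ("U2", "One")]
partitions = {title: assign_partition(artist) for artist, title in songs}
```

Because the rule depends only on the performer name, the split is reproducible without storing any extra partition list.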
The UltraStar meta-data provides ground truth tempo and lyrics aligned to beats.
Singer identity was annotated at beat level wherever possible. In the case of
more than one singer per song, the 'singer diarisation' (i.e., the alignment of singer
identity to the music) was determined manually with the help of the corresponding
official music video for precise results. Subsequently, the gender, height, birth
year, and race of the 516 distinct singers were collected and repeatedly verified from
on-line textual (IMDB 10 and Wikipedia 11 ) and audiovisual (YouTube 12 ) knowledge
sources. The two male raters (24 and 28 years old) were experts in popular music.
In fact, a considerable number of the songs have two or more singers
present simultaneously. To ensure realistic 'non-preselected' analysis, the following
scheme was applied in such cases: for the nominal traits gender and race,
beats were marked as 'unknown' unless all simultaneously present singers share the
same attribute value. For the continuous-valued traits age and height, the mean
over the present singers was used. Musical pieces for which an exact singer
diarisation could not be reached were treated in the same way. Finally, beats were
also marked as 'unknown' if an attribute was missing for at least one of the present singers.
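The beat-level labelling rules above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the use of `None` for a missing attribute, and the treatment of beats without any singer as 'unknown' are hypothetical choices, not taken from the original annotation tooling.

```python
from statistics import mean

UNKNOWN = "unknown"


def label_nominal(values):
    """Beat label for a nominal trait (gender, race).

    `values` holds one entry per singer present on the beat; None marks a
    missing attribute (an assumed encoding).
    """
    # Assumption: a beat with no singer at all is also treated as 'unknown'.
    if not values or any(v is None for v in values):
        return UNKNOWN
    # Keep the value only if all present singers agree on it.
    return values[0] if len(set(values)) == 1 else UNKNOWN


def label_continuous(values):
    """Beat label for a continuous-valued trait (age, height)."""
    if not values or any(v is None for v in values):
        return UNKNOWN
    # Mean over all simultaneously present singers.
    return mean(values)


duet_gender = label_nominal(["female", "male"])   # singers disagree
duet_age = label_continuous([30, 40])             # mean over both singers
```

For a piece without exact diarisation, the same functions would be called with the attributes of all singers of the song on every beat.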
Figure 11.20a,b visualise the obtained distribution of gender and race among the
516 singers. In Fig. 11.20c,d the continuous-valued age and height are shown with
9 The annotation scheme is inspired by the TIMIT corpus as used in Sect. 10.4.3. As such,
the term 'race' is adopted from the corpus' meta-information, though modern biology often
classifies Homo sapiens sapiens neither by race nor by sub-categories for collective
differentiation in both physical and behavioural traits. Although current molecular biological
and population genetic research holds that a systematic categorisation may be insufficient to
describe the enormous diversity and fluid differences between geographic populations, it can be
argued that, when aiming at an end-user information retrieval system, a categorisation into
illustrative, archetypal categories can be useful.
10 http://www.imdb.com
11 http://www.wikipedia.org
12 http://www.youtube.com
 