Digital Signal Processing Reference
information could, however, be added, such as usage statistics [ 165 ]. Also, other
forms of representation of the lyrics can be considered, potentially integrating other
kinds of on-line knowledge sources.
The requirement to process all music was handled by establishing a gold standard
based on the (rounded) median of the rater labels, which also covers cases of complete
rater disagreement. In addition, the effect of prototypicality was investigated by
restricting the test cases to those with clear rater agreement at different levels.
With this restriction, UA and WA rose from roughly 60 % to around 70 % in the
three-class tasks of arousal and valence classification. Confusions occurred mostly
between neighbouring classes, which limits the severity of errors and thus increases
applicability. However, further improvements are needed for real-life usage. In this
respect, the large differences in performance depending on the individual rater have
to be noted; they indicate the subjective character of music mood.
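The evaluation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original implementation: the ratings and predictions are invented toy values, labels are assumed to lie in {-1, 0, 1}, the rounding rule (half away from zero) and the agreement criterion (at least three raters choosing the same label) are assumptions, and UA and WA are taken here to be the mean per-class recall and the overall accuracy, respectively.

```python
import math
from collections import Counter
from statistics import median


def rounded_median(ratings):
    """Gold-standard label: median of the rater labels, rounded half away from zero (assumed rule)."""
    m = median(ratings)
    return int(math.copysign(math.floor(abs(m) + 0.5), m))


def agreement(ratings):
    """Number of raters who chose the most frequent label."""
    return Counter(ratings).most_common(1)[0][1]


def weighted_accuracy(y_true, y_pred):
    """WA: fraction of all instances that are classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """UA: mean of the per-class recalls, so that every class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Toy data (hypothetical): four raters per song, one predicted label per song.
ratings = [
    [1, 1, 1, 0],
    [-1, 0, 1, 1],
    [0, 0, 0, 0],
    [-1, -1, 0, -1],
    [-1, -1, 1, 1],
]
predictions = [1, 0, 0, -1, 1]

# Gold standard for all songs, including those with strong rater disagreement.
gold = [rounded_median(r) for r in ratings]

# Prototypical subset: keep only songs where at least three raters agree (assumed threshold).
keep = [i for i, r in enumerate(ratings) if agreement(r) >= 3]
gold_proto = [gold[i] for i in keep]
pred_proto = [predictions[i] for i in keep]

print("all songs:     WA", weighted_accuracy(gold, predictions),
      "UA", unweighted_accuracy(gold, predictions))
print("prototypical:  WA", weighted_accuracy(gold_proto, pred_proto),
      "UA", unweighted_accuracy(gold_proto, pred_proto))
```

Raising the agreement threshold shrinks the test set to the more prototypical songs. The toy numbers only demonstrate the mechanics and say nothing about the reported results.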
Future efforts could consider other feature combination methods, such as individual
feature streams. Further, other dimensions could be added, such as 'dominance',
which is often used in speech emotion analysis [ 166 ]. These dimensions can also
be handled by regression approaches (cf. [ 162 , 167 ]). To that end, more labeller
tracks should ideally be added to approach genuine numeric continuity across the
dimensions. First results for a regression approach with the four raters on NTWICM
are reported in [ 33 ].
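To illustrate why more labeller tracks bring the target closer to numeric continuity, the following sketch counts the distinct values a regression target can take when it is formed as the mean of the individual ratings. Both the five-point integer scale from -2 to 2 and the use of the mean as the target are assumptions made for this example only.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def distinct_means(n_raters, scale=(-2, -1, 0, 1, 2)):
    """Sorted list of all values the mean of n_raters ratings on `scale` can take."""
    return sorted({
        Fraction(sum(combo), n_raters)
        for combo in combinations_with_replacement(scale, n_raters)
    })


for n in (1, 4, 8):
    means = distinct_means(n)
    step = means[1] - means[0]
    print(f"{n} rater(s): {len(means)} distinct target values, step {step}")
```

Under these assumptions, n raters yield 4n + 1 distinct target values with a step of 1/n, so each additional rater makes the label grid finer.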
11.8 Singer Traits: Age, Gender, Height, Race
Extending the assessment of speaker traits to sung speech, and bridging from
assessing mood in music, one can also aim at the assessment of singers' traits.
This was first shown in [ 34 ], then refined in [ 35 ], and later extended for more traits
in [ 36 ].
Such singer trait classification, that is, automatically recognising meta data such
as age and gender of the performing vocalist(s) in recorded music, is currently still
an under-researched topic in MIR in contrast to the increasing efforts devoted to
that area in paralinguistic speech processing. Applications in music processing can
be found in categorisation and query of large databases with potentially unknown
artists—that is, artists for whom not enough reliable training data is available for
building singer identification models as, e.g., in [ 168 ]. Robustly extracting a variety
of meta information can then allow the artist to be identified in a large collection of
artist meta data. In addition, exploiting gender information can be useful for building
and adapting models for other MIR tasks such as automatic lyrics transcription
[ 169 ]. In comparison to speaker trait determination as was shown in Sect. 10.4.3 ,
recognition of singer traits can be expected to be an even more challenging task due to
high variability of the singer's pitch, instrumental accompaniment, and simultaneous
presence of multiple vocalists.
Little, if any, research has dealt with the recognition of singer traits other than
gender in music. Apart from gender, three further tasks are thus investigated in the following:
 