Fig. 5.2 Dimensional mood model development: multidimensional scaling of emotion-related tags as by Russell (left) and Thayer's model with four mood clusters (right) [14]

Fig. 5.3 Dimensional mood model with five discrete values for arousal and valence [14]
decided in favour of a large database in which changes of mood during a song are 'averaged out' in the annotation process, i.e., annotators assign the connotative mood one would overall have in mind for the piece. This can be sufficient in many applications, such as automatic music suggestion by the mood that best fits a listener's mood. A different question is whether a learning model would benefit from a 'cleaner' representation without changes of mood over the length of a musical piece. For NTWICM, one can assume that the contained mainstream, commercially oriented popular music is less affected by such variation than, e.g., longer arrangements of classical music. In fact, an analogue can be found in human emotion recognition: when annotated at the isolated-word level, less than half of the duration of a spoken utterance may portray the perceived emotion [30]. Yet, state-of-the-art emotion recognition from speech usually ignores this fact by using turn-level rather than word-level labels [31].
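As a minimal sketch of the 'averaging out' described above, the following hypothetical Python snippet collapses per-segment (arousal, valence) ratings into a single song-level label on the five-value scale of Fig. 5.3 (assumed here to range over -2..+2); the segment data and the averaging-plus-rounding rule are illustrative assumptions, not the annotation procedure actually used for NTWICM.

```python
# Illustrative sketch (assumed scale -2..+2 per dimension, cf. Fig. 5.3):
# mood changes within a song are 'averaged out' into one label.

def song_level_label(segment_ratings):
    """Average per-segment (arousal, valence) ratings and round each
    dimension back to the nearest of the five discrete values."""
    n = len(segment_ratings)
    mean_arousal = sum(a for a, _ in segment_ratings) / n
    mean_valence = sum(v for _, v in segment_ratings) / n
    clamp = lambda x: max(-2, min(2, round(x)))
    return clamp(mean_arousal), clamp(mean_valence)

# A song whose mood shifts over its segments still receives one label:
segments = [(2, 1), (2, 1), (0, 1), (0, 1)]  # (arousal, valence) per segment
print(song_level_label(segments))  # -> (1, 1)
```

The within-song variation is thus discarded by construction, which is exactly the trade-off discussed above for turn-level versus word-level labels in speech emotion recognition.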