Table 5.3 Overview of the raters (A-D) by age, gender, ethnicity, professional relation to music, instruments played, and ballroom dance abilities, as well as the CC between arousal (A) and valence (V) for each rater's annotations

Rater | Age (years) | Gender | Ethnicity | Prof. relation | Instruments   | Dancing        | CC(V, A)
A     | 34          | m      | European  | club DJ        | guitar, drums | Standard/Latin | 0.34
B     | 23          | m      | European  | -              | piano         | Standard       | 0.08
C     | 26          | m      | European  | -              | piano         | Latin          | 0.09
D     | 32          | f      | Asian     | -              | -             | -              | 0.43
As mood perception is generally known to be highly subjective [19], four labellers were employed. Details on these raters (three male, one female, aged between 23 and 34 years, average: 29 years) and their relation to music are provided in Table 5.3. Raters A-C stated that they listen to music for several hours per day and have no distinct preference of musical style, while rater D stated that she listens to music every other day on average and prefers pop music.
As can be seen, they were picked to form a well-balanced set. They were asked to make a forced decision, assigning values in {−2, −1, 0, 1, 2} for both arousal and valence (a minimal sketch of the resulting label space is given below).
They annotated the perceived mood, i.e., the 'represented' mood, not the induced mood, i.e., the 'felt' one, as the latter could have resulted in too high a labelling ambiguity: one may recognise the represented mood, yet it is not guaranteed that the intended or same mood is actually felt by the raters. Indeed, depending on perceived arousal and valence, different behavioural, physiological, and psychological mechanisms are involved, and contextual associations are often highly decisive [32].
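The following is a minimal sketch, not part of the original annotation tool, of how this forced-choice label space could be encoded: both dimensions take values in {−2, −1, 0, 1, 2}, yielding 25 cells on the valence-arousal plane of Fig. 5.3. The function name and the cell indexing are illustrative assumptions.

```python
# Hypothetical encoding of the five-point forced-choice scale used for both
# arousal and valence, and a helper that maps one (valence, arousal) rating
# onto a cell of the 5 x 5 plane. Indexing is an assumption for illustration.

SCALE = (-2, -1, 0, 1, 2)  # forced-choice values for valence and arousal

def cell_index(valence: int, arousal: int) -> int:
    """Return a class index in 0..24 for one rating on the 5 x 5 plane."""
    if valence not in SCALE or arousal not in SCALE:
        raise ValueError("ratings must come from the forced-choice scale")
    return (arousal + 2) * 5 + (valence + 2)

# Example: a song rated as slightly positive and highly aroused
print(cell_index(valence=1, arousal=2))  # -> 23
```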
The labellers listened via external sound-proof headphones in an isolated and silent laboratory environment. Labelling was carried out independently of the other raters within a period of at most 20 consecutive working days. Each session lasted at most two hours. Each song was listened to in full, with at most three forward skips of 30 s, followed by a short break. Repeated playback of songs was allowed, and the annotation could be revised. For the annotation, a plugin 3 to the open-source audio player Foobar 4 was provided. It displays the valence-arousal plane in colour code, as shown in Fig. 5.3, and allows for selecting a class by clicking.
Based on each rater's labelling, Table 5.3 reports the CC between valence and arousal (rightmost column). 5 The variance among these correlations indicates clear differences. The distribution of labels per rater, as depicted in Fig. 5.4, further visualises these differences in the individual perception of music mood.
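As a purely illustrative sketch, the CC in the rightmost column of Table 5.3 corresponds to the Pearson correlation coefficient between a rater's per-song valence and arousal labels; the published annotations are available at http://www.openaudio.eu (cf. footnote 5), while the toy arrays below are assumptions, not the actual data.

```python
# Sketch only: Pearson correlation between one rater's valence and arousal
# annotations on the {-2, ..., 2} scale. The example values are made up.

import numpy as np

def valence_arousal_cc(valence: np.ndarray, arousal: np.ndarray) -> float:
    """Pearson correlation coefficient between valence and arousal labels."""
    return float(np.corrcoef(valence, arousal)[0, 1])

valence = np.array([2, 1, 0, -1, -2, 1, 0, 2])
arousal = np.array([1, 2, 0, -2, -1, 0, 1, 2])
print(round(valence_arousal_cc(valence, arousal), 2))
```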
To establish a gold standard that also considers songs that do not possess a majority agreement in their label, a new strategy has to be found: in the literature, such instances are usually discarded, which does not reflect real-world usage, where any musical
3 Available at http://www.openaudio.eu .
4 http://www.foobar2000.org
5 The complete annotation by the four individuals is available at http://www.openaudio.eu to ensure
reproducibility by others.
 