Table 5.3 Overview of the raters (A-D) by age, gender, ethnicity, professional relation to music, instruments played, and ballroom dance abilities, as well as the CC between arousal (A) and valence (V) for each rater's annotations

Rater | Age (years) | Gender | Ethnicity | Prof. relation | Instruments   | Dancing        | CC(V, A)
A     | 34          | m      | European  | club DJ        | guitar, drums | Standard/Latin | 0.34
B     | 23          | m      | European  | -              | piano         | Standard       | 0.08
C     | 26          | m      | European  | -              | piano         | Latin          | 0.09
D     | 32          | f      | Asian     | -              | -             | -              | 0.43
As mood perception is generally known to be highly subjective [19], four labellers were employed. Details on these raters (three male, one female, aged between 23 and 34 years, average: 29 years) and their relation to music are provided in Table 5.3. Raters A-C stated that they listen to music for several hours per day and have no distinct preference of musical style, while rater D stated that she listens to music every other day on average and prefers pop music.
As can be seen, they were picked to form a well-balanced set. They were asked to make a forced decision, assigning values in {−2, −1, 0, 1, 2} for both arousal and valence (a minimal sketch of the resulting label space is given below).
They annotated the perceived mood, i.e., the 'represented' mood, not the induced mood, i.e., the 'felt' one, as the latter could have resulted in too high a labelling ambiguity: one may recognise the represented mood, yet it is not guaranteed that the intended or same mood is actually felt by the raters. Indeed, depending on perceived arousal and valence, different behavioural, physiological, and psychological mechanisms are involved, and contextual associations are often highly decisive [32].
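The following is a minimal sketch, not part of the original annotation tool, of how this forced-choice label space could be encoded: both dimensions take values in {−2, −1, 0, 1, 2}, yielding 25 cells on the valence-arousal plane of Fig. 5.3. The function name and the cell indexing are illustrative assumptions.

```python
# Hypothetical encoding of the five-point forced-choice scale used for both
# arousal and valence, and a helper that maps one (valence, arousal) rating
# onto a cell of the 5 x 5 plane. Indexing is an assumption for illustration.

SCALE = (-2, -1, 0, 1, 2)  # forced-choice values for valence and arousal

def cell_index(valence: int, arousal: int) -> int:
    """Return a class index in 0..24 for one rating on the 5 x 5 plane."""
    if valence not in SCALE or arousal not in SCALE:
        raise ValueError("ratings must come from the forced-choice scale")
    return (arousal + 2) * 5 + (valence + 2)

# Example: a song rated as slightly positive and highly aroused
print(cell_index(valence=1, arousal=2))  # -> 23
```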
The labellers listened via external sound-proof headphones in an isolated and silent laboratory environment. Labelling was carried out independently of the other raters within a period of at most 20 consecutive working days. Each session lasted at most two hours. Each song was listened to in full, with at most three forward skips of 30 s, followed by a short break. Repeated playback of songs was allowed, and the annotation could be revised. For the annotation, a plugin 3 to the open-source audio player Foobar 4 was provided. It displays the valence-arousal plane in colour code, as shown in Fig. 5.3, and allows for selecting a class by clicking.
Based on each rater's labelling, Table 5.3 reports the CC between valence and arousal (rightmost column). 5 The variance among these correlations indicates clear differences. The distribution of labels per rater, as depicted in Fig. 5.4, further visualises these differences in the individual perception of music mood.
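As a purely illustrative sketch, the CC in the rightmost column of Table 5.3 corresponds to the Pearson correlation coefficient between a rater's per-song valence and arousal labels; the published annotations are available at http://www.openaudio.eu (cf. footnote 5), while the toy arrays below are assumptions, not the actual data.

```python
# Sketch only: Pearson correlation between one rater's valence and arousal
# annotations on the {-2, ..., 2} scale. The example values are made up.

import numpy as np

def valence_arousal_cc(valence: np.ndarray, arousal: np.ndarray) -> float:
    """Pearson correlation coefficient between valence and arousal labels."""
    return float(np.corrcoef(valence, arousal)[0, 1])

valence = np.array([2, 1, 0, -1, -2, 1, 0, 2])
arousal = np.array([1, 2, 0, -2, -1, 0, 1, 2])
print(round(valence_arousal_cc(valence, arousal), 2))
```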
To establish a gold standard that also considers songs that do not possess a majority agreement in their label, a new strategy has to be found: in the literature, such instances are usually discarded, which does not reflect real-world usage, where any musical
3 Available at http://www.openaudio.eu .
4 http://www.foobar2000.org
5 The complete annotation by the four individuals is available at http://www.openaudio.eu to ensure
reproducibility by others.
 