Digital Signal Processing Reference
[Figure: two histogram panels, "TUM AVIC Train" and "TUM AVIC Develop"; x-axis: mean LoI/2 from -1 to 1; y-axis: # instances from 0 to 400]
Fig. 5.1 Mean Level of Interest (LoI, divided by 2) histograms for the train and develop partitions
of TUM AVIC [ 12 ]
laughter (261), and coughing, other human noise (716). There is a total of 18 581
spoken words, and 23 084 word-like units including 2 901 non-linguistic vocalisa-
tions (19.5 %). The overall annotation thus contains per sub-speaker-turn information
on the spoken content, non-linguistic vocalisations, individual LoI annotator tracks,
and the mean LoI across annotators.
The gold standard is established either by majority vote on discrete ordinal classes
or by shifting to a continuous scale obtained by averaging over the single annotators'
LoI. The histogram for this mean LoI is shown in Fig. 5.1. As can be seen in the
figure, the subjects had a tendency to be rather polite: almost no negative average
LoI was annotated. Note that here the original LoI scale, reaching from LoI −2 to
LoI +2, is mapped to [−1, 1] by division by 2 in accordance with the scaling as
adopted in other corpora in this field, e.g., [ 13 ]. Apart from a higher resolution of LoI,
the continuous representation form allows for subtraction of a subject's long-term
interest profile to adapt to the mood or personality of the individual.
The overall 21 speakers (and 3 880 sub-speaker-turns) were partitioned speaker-
independently in the best achievable balance with priority on gender, next age, and
then ethnicity into three partitions: Train (1 512 sub-speaker-turns in 51:44 min of
speech of 4 female, 4 male speakers), Develop (1 161 sub-speaker-turns in 43:07 min
of speech of 3 female, 3 male speakers), and Test (1 207 sub-speaker-turns in
42:44 min of speech of 3 female, 4 male speakers).
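The two gold-standard constructions described above (majority vote on discrete classes, or the annotators' mean LoI rescaled to [−1, 1]) and the subtraction of a speaker's long-term interest profile can be illustrated with a minimal Python sketch. The function names, the tie-breaking rule, the use of a per-speaker mean as the "long-term profile", and the example ratings are all assumptions made for illustration; they are not taken from the corpus or its tooling.

```python
from collections import Counter, defaultdict
from statistics import mean


def majority_vote(ratings):
    """Discrete gold standard: the most frequent annotator rating.

    Tie-breaking (tied label closest to the mean rating, then the
    lowest label) is an assumption; the text does not specify one.
    """
    counts = Counter(ratings)
    top = max(counts.values())
    winners = sorted(label for label, n in counts.items() if n == top)
    centre = mean(ratings)
    return min(winners, key=lambda label: abs(label - centre))


def mean_loi(ratings, scale=2.0):
    """Continuous gold standard: annotator mean, divided by 2 so that
    the original range from -2 to +2 maps onto [-1, 1]."""
    return mean(ratings) / scale


def subtract_speaker_profile(turns):
    """Remove each speaker's long-term mean from their per-turn mean LoI.

    `turns` is a list of (speaker_id, mean_loi) pairs. Using the plain
    per-speaker mean as the long-term profile is an assumption.
    """
    by_speaker = defaultdict(list)
    for speaker, value in turns:
        by_speaker[speaker].append(value)
    profile = {speaker: mean(values) for speaker, values in by_speaker.items()}
    return [(speaker, value - profile[speaker]) for speaker, value in turns]


if __name__ == "__main__":
    # Invented ratings from four hypothetical annotators for one turn.
    ratings = [1, 1, 0, 2]
    print(majority_vote(ratings))  # 1
    print(mean_loi(ratings))       # 0.5

    # Invented per-turn mean LoI values for two hypothetical speakers.
    turns = [("a", 0.5), ("a", 1.0), ("b", 0.0)]
    print(subtract_speaker_profile(turns))
```

After the subtraction, each speaker's values are centred on zero, so a habitually polite speaker's high ratings no longer dominate the target.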
5.3.2 Example in Music: NTWICM
In the second example, we place more emphasis on the problem of choosing an appropriate
model and measuring the reliability of labellers. A particularly ambiguous task was
chosen for illustration: the mood in music. The data set was introduced in [ 14 ] for
a classification task, which was later extended to fully continuous modelling [ 15 ].
For building a music database annotated by mood, the compilation “Now That's
What I Call Music!” (U.K. series, volumes 1-69, each a double CD) was selected
for the following reasons: no audio needed to be recorded; only the annotation
itself had to be carried out. The choice of a commercially available series allows
reproducibility by other researchers at a reasonable cost, and the annotation can be
distributed freely. Further, the decision to include a complete series ensures transparent
'non-prototypicality', i.e., no music pieces were pre-selected, for example by choosing
the 'easy cases'; this reflects a realistic database management setting.
 