Fig. 11.19 Valence confusions in the V3 classification task for selected feature subsets (classifier SVM, dataset AllInst of NTWICM [32]). Panels: (a) Spectral, (b) Rhythmic, (c) Chords, (d) Lyrics-BoW, (e) All, (f) No-Lyrics.
Table 11.24 UA and WA [%] for the different raters (A-D) on the A3 and V3 tasks (feature set No-Lyrics, set AllInst of NTWICM, SVM)

              Arousal           Valence
  Rater     UA      WA        UA      WA
  A        43.4    43.6      58.5    57.6
  B        63.8    60.0      48.5    48.1
  C        53.0    52.0      55.3    53.5
  D        47.8    46.9      54.2    56.3
Let us next consider the differences across raters with respect to UA and WA in Table 11.24 for the A3 and V3 tasks on set AllInst. There, training and testing were carried out exclusively on the ratings of a single rater each. As can be seen, either the learnt classifier has varying difficulty modelling individual raters, or the raters' mood models differ in their internal consistency. Interestingly, the ratings of the professional DJ (rater A, cf. Sect. 5.3.2) lead to the best result for valence. For arousal, the deltas in UA and WA across raters are even more pronounced.
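As a reminder of how the two accuracy measures relate, the following is a minimal sketch of computing UA (mean of per-class recalls) and WA (overall accuracy) from a confusion matrix; the function name and the example matrix are illustrative, not taken from the experiments:

```python
import numpy as np

def ua_wa(conf):
    """Compute unweighted (UA) and weighted (WA) accuracy from a
    confusion matrix with rows = true classes, cols = predictions."""
    conf = np.asarray(conf, dtype=float)
    per_class_recall = conf.diagonal() / conf.sum(axis=1)
    ua = per_class_recall.mean()             # mean of per-class recalls
    wa = conf.diagonal().sum() / conf.sum()  # fraction of correct decisions
    return ua, wa

# Toy 3-class example (e.g., a V3 task) with imbalanced classes
conf = [[50, 10,  0],
        [20, 60, 20],
        [ 0,  5, 15]]
ua, wa = ua_wa(conf)
```

For balanced class distributions UA and WA coincide; the more imbalanced the classes, the more UA penalises classifiers that neglect the rare ones.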
To gauge the effect of excluding instances with lower inter-rater agreement, as is common practice in most other work, Table 11.25 shows the effect of limiting the test instances to those with a minimum agreement of two or three out of the four raters. For training, however, all instances are used. In line with intuition, UA and WA increase by up to 8 % with increasing restriction to such prototypical cases, in particular for arousal. The table further shows the effect of SFFS feature selection on performance: results are improved by this step except for prototypical arousal.
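The SFFS (Sequential Floating Forward Selection) procedure used here can be sketched as below. This is a generic outline, not the exact implementation from the experiments; the `score` callback stands in for whatever validation metric (e.g., UA of the SVM under cross-validation) is being maximised:

```python
import numpy as np

def sffs(X, y, score, k_max):
    """Sequential Floating Forward Selection (sketch).

    score(X_sub, y) returns a validation metric to maximise."""
    selected = []
    remaining = list(range(X.shape[1]))
    while len(selected) < k_max and remaining:
        # Forward step: add the single feature that helps most
        s, f = max((score(X[:, selected + [g]], y), g) for g in remaining)
        selected.append(f)
        remaining.remove(f)
        # Floating step: conditionally drop previously chosen features
        # again while their removal improves the score
        improved = True
        while improved and len(selected) > 1:
            improved = False
            for g in list(selected):
                rest = [h for h in selected if h != g]
                if score(X[:, rest], y) > s:
                    selected = rest
                    remaining.append(g)
                    s = score(X[:, selected], y)
                    improved = True
    return selected

# Toy usage with a contrived score that rewards columns 1 and 3
X = np.arange(20, dtype=float).reshape(4, 5)
y = np.zeros(4)
score = lambda Xs, y: sum(1 if c in (1.0, 3.0) else -1 for c in Xs[0])
picked = sffs(X, y, score, k_max=2)
```

The floating (backward) step is what distinguishes SFFS from plain greedy forward selection: a feature that looked useful early on can be discarded once better combinations become available.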
Table 11.26 contains additional results for the Random Forests (RFs) classifier (based on decision trees) as an example of sub-sampling the feature space and bootstrapping the training instances.
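The two sources of randomness just mentioned can be made concrete with a small sketch; the function name is hypothetical and this only illustrates the per-tree sampling an RF performs, not a full forest implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_and_subspace(X, y, n_trees=10, max_features=None):
    """Illustrate the two randomisations in a Random Forest:
    bootstrap sampling of instances and sub-sampling of features.
    Returns one (X_boot, y_boot, feature_idx) triple per tree."""
    n, d = X.shape
    max_features = max_features or int(np.sqrt(d))  # common default
    draws = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)                 # with replacement
        feats = rng.choice(d, size=max_features, replace=False)
        draws.append((X[rows][:, feats], y[rows], feats))
    return draws

# Toy data: 100 instances, 16 features, three mood classes
X = rng.normal(size=(100, 16))
y = rng.integers(0, 3, size=100)
draws = bootstrap_and_subspace(X, y, n_trees=5)
```

Each tree is then grown on its own bootstrap sample and feature subset, and the forest decides by majority vote, which reduces the variance of the individual trees.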