Digital Signal Processing Reference
In-Depth Information
[Fig. 11.17: Correctly located chorus thumbnails by genre (Dance, Oldies, Pop, Rock, German) for the maximum deviation T_max = 2 s [31]. Vertical axis: [%], 40 to 100. Series: Top1, Top2, Top3.]
automatically generated thumbnails from the actual chorus sections are lacking;
evaluations are mostly perceptual in nature, relying on individual listener ratings.
Figure 11.17 shows the results by genre. The task is solved best for electronic
dance music, which can be explained by the high degree of self-similarity in this genre,
given its 'perfect' electronic tones and computer-aided sequencing. One could further
speculate that the structure is less complex and that fewer variations exist.
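The accuracy measure used in such an evaluation can be sketched as follows. This is a minimal illustration, not the evaluation code of [31]: it assumes that the deviation is measured between estimated and annotated chorus start times, and that "Top-N" means the N highest-ranked candidates per piece. The function name `hit_rate` and the toy data are hypothetical.

```python
def hit_rate(estimates, references, t_max=2.0, top_n=1):
    """Fraction of pieces with a candidate within t_max seconds of the annotation.

    estimates:  one ranked list of candidate start times (seconds) per piece
    references: one annotated chorus start time (seconds) per piece
    """
    if len(estimates) != len(references):
        raise ValueError("one candidate list per reference is required")
    hits = 0
    for candidates, ref in zip(estimates, references):
        # Assumption: a piece counts as correct if any of its first top_n
        # candidates deviates from the annotation by at most t_max seconds.
        if any(abs(c - ref) <= t_max for c in candidates[:top_n]):
            hits += 1
    return hits / len(references) if references else 0.0


# Toy data, invented for illustration only (not results from the source).
estimates = [[30.5, 61.0], [12.0, 45.2], [80.0, 20.0]]
references = [31.0, 44.0, 50.0]
top1 = hit_rate(estimates, references, top_n=1)
top2 = hit_rate(estimates, references, top_n=2)
```

Under these assumptions the rate can only stay equal or grow as more candidates are admitted, since every Top1 hit is also a Top2 hit.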
11.6.3 Summary
This section presented the automatic generation of music thumbnails. The
approach was mainly based on a self-similarity matrix computed from chromagram-type
features, combined with basic image processing methods to locate diagonals.
In addition, beat positions were used as information. The best results were observed
for electronic dance music: there, the chorus location was determined correctly in
70 % of the pieces when allowing for a maximum deviation of 2 s. Averaged over all
considered genres, this value dropped to 48.6 %.
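The core idea of locating repetitions as diagonals in a self-similarity matrix can be sketched as follows. This is a simplified illustration under stated assumptions, not the method evaluated above: it assumes one 12-dimensional chroma vector per beat, uses cosine similarity, and replaces the image processing step with a brute-force search for the off-diagonal segment of highest mean similarity. All function names and the toy input are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def self_similarity(frames):
    """Pairwise similarity of all feature frames (n x n nested list)."""
    n = len(frames)
    return [[cosine(frames[i], frames[j]) for j in range(n)] for i in range(n)]


def best_repetition(ssm, length, min_lag=1):
    """Return (score, start, lag) of the strongest diagonal segment.

    A segment of `length` frames starting at `start` that repeats `lag`
    frames later shows up as a run of high values parallel to the main
    diagonal; the score is the mean similarity along that run.
    """
    n = len(ssm)
    best = (-1.0, None, None)
    for lag in range(min_lag, n - length + 1):
        for start in range(n - lag - length + 1):
            score = sum(ssm[start + k][start + lag + k]
                        for k in range(length)) / length
            if score > best[0]:
                best = (score, start, lag)
    return best


def one_hot(pitch_class):
    """Idealized chroma vector with all energy in one pitch class."""
    return [1.0 if i == pitch_class else 0.0 for i in range(12)]


# Toy piece (invented): verse, chorus, bridge, chorus; one frame per beat.
pitch_classes = [2, 9, 11, 1, 0, 4, 7, 5, 3, 6, 0, 4, 7, 5]
frames = [one_hot(p) for p in pitch_classes]
ssm = self_similarity(frames)
best = best_repetition(ssm, length=4)
```

For the toy input, the search finds the four-beat pattern starting at beat 4 that repeats six beats later. Real chroma features are noisy, so a practical system would smooth the matrix and threshold the diagonals rather than expect perfect matches.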
Future efforts could incorporate the analysis of key changes [115] (cf. Sect. 11.4),
chord patterns [29, 105] (cf. Sect. 11.5), or the classification of vocal and non-vocal
sequences [9] (cf. Sect. 11.8). Machine learning could also be introduced, as could
alternative matching techniques such as in [146].
11.7 Mood
So far, we have dealt with measurable characteristics of music. In the following section,
we take a look at music mood classification (cf. [32]), which is similar to the analysis of
emotion in speech (cf. Sect. 10.4.2). While we will treat mood classes as
discrete, a natural extension is to model continuous dimensions, as was later
shown in [33].
 
 