Table 11.17  Top-ranked chord uni- and bigrams in the LM by frequency of occurrence

Rank   1-gram        #    2-gram        #
1      G       244 820    D-G      57 500
2      D       227 549    G-G      55 106
3      A       198 958    C-G      54 702
4      C       188 194    G-C      54 040
5      E       130 896    A-D      46 162
6      F        87 741    D-A      43 534
7      B        72 360    G-G      41 090
8      Am       58 929    A-A      40 161
9      Em       57 537    D-D      39 710
10     A#       32 583    E-A      36 659
Table 11.18  WA for the ChoRD corpus, LOSO evaluation

WA [%]                       Correlation     SVM     HMM    HMM + LM
24 major / minor                   39.41   40.24   58.57       60.13
36 major / minor / other           28.37   36.71   45.39       48.84

'Other' chords cover augmented, diminished, power, and sustained chords.
As alternative data-driven processing methods, we compare SVMs to HMMs with
and without the language model. A linear kernel, pairwise multi-class discrimination,
and SMO learning proved to be the best choice for the SVMs. For the HMMs, one continuous
model with one emitting state per beat was used. The models were trained with 20
Baum-Welch iterations [133]. A single Gaussian mixture component was the best
choice. To enable Viterbi search for decoding, a 'word-loop' context-free grammar
modelled the chord sequence in the case where no data-driven language model was
used. When the language model is enabled (HMM + LM), Laplace-smoothed class-based
Katz back-off bigrams with a cutoff of one were found to be the best
configuration.
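The chord statistics in Table 11.17 and the language model configuration above can be illustrated with a simplified Python sketch (hypothetical, not the authors' implementation): it estimates Laplace-smoothed unigram probabilities over chord symbols and backs off from bigrams at or below a count cutoff of one to the unigram estimate. A full class-based Katz back-off model would additionally redistribute discounted probability mass.

from collections import Counter

def train_bigram_lm(chord_sequences, cutoff=1):
    """Simplified back-off bigram LM over chord symbols (illustrative sketch)."""
    unigrams = Counter()
    bigrams = Counter()
    for seq in chord_sequences:
        unigrams.update(seq)
        bigrams.update(zip(seq[:-1], seq[1:]))

    vocab = sorted(unigrams)
    total = sum(unigrams.values())

    def p_unigram(c):
        # Laplace (add-one) smoothing: every chord receives one pseudo-count
        return (unigrams[c] + 1) / (total + len(vocab))

    def p_bigram(prev, c):
        # bigrams observed more often than the cutoff are used directly,
        # all others back off to the smoothed unigram estimate
        if bigrams[(prev, c)] > cutoff:
            return bigrams[(prev, c)] / unigrams[prev]
        return p_unigram(c)

    return p_unigram, p_bigram

# Toy usage with hypothetical beat-wise chord label sequences
p_uni, p_bi = train_bigram_lm([["G", "D", "G", "C"], ["A", "D", "G"]])
print(p_bi("D", "G"))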
11.5.3 Performance
A song-independent cyclic 'leave-one-song-out' (LOSO) training and testing was
chosen as the evaluation strategy under realistic conditions. Table 11.18 shows the observed
WA for the different data-free and data-learnt chord determination strategies.
One notes that the WA increases with increasing inclusion of data on the AM and LM level and with context
modelling. Accordingly, HMMs exceed SVMs, as they allow for
contextual modelling. The mapping to, and thereby reduction to, major and minor
chords leads to higher WA while still handling 'any input', provided this is appropriate in
the context of the application.
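For illustration, a minimal sketch of the LOSO protocol with the SVM baseline follows, assuming beat-wise feature vectors X, chord class labels y, and a song index per beat; the randomly generated arrays are mere placeholders, and the HMM decoding with the language model is not shown.

import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC

# Hypothetical placeholders for beat-wise data
X = np.random.rand(200, 12)              # e.g. 12-dimensional chroma per beat
y = np.random.randint(0, 24, size=200)   # 24 major / minor chord classes
songs = np.repeat(np.arange(10), 20)     # 10 songs with 20 beats each

# Linear kernel with pairwise (one-vs-one) multi-class discrimination
clf = SVC(kernel="linear", decision_function_shape="ovo")

y_pred = np.empty_like(y)
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=songs):
    clf.fit(X[train_idx], y[train_idx])           # train on all but one song
    y_pred[test_idx] = clf.predict(X[test_idx])   # test on the held-out song

print(f"WA: {100.0 * accuracy_score(y, y_pred):.2f} %")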