Table 11.8 WA for tape speed variation compensation (w/) or its omission (w/o)

WA [%]     w/o    w/
MTV        70.5   70.5
CHANSON    69.8   72.5
CLASSIC    82.0   82.0
JAZZ       58.5   59.8
KEY-ALL    75.2   76.2

SVM, ten-fold SCV, Gaussian filter, whole piece, range C3-C8, 12 keys
Table 11.9 WA for different semitone filter functions

WA [%]     Rectangle   Triangle   Triangle^2   Gaussian
MTV        71.5        71.5       72.0         70.5
CHANSON    71.1        67.8       69.1         72.5
CLASSIC    84.3        79.8       77.5         82.0
JAZZ       58.5        57.3       59.8         59.8
KEY-ALL    76.2        73.7       75.4         76.2

SVM, ten-fold SCV, whole piece, range C3-C8, 12 keys. Triangle^2 indicates the squared triangle function
For the band-pass filters, a rectangular, a triangular, a squared triangular, and a Gaussian filter are considered. Of these, the Gaussian filter is preferred, based on the results shown in Sect. 11.4.5, Table 11.9. This choice is also intuitive: compared to, e.g., a rectangular filter, the Gaussian filter gives more weight to contributions of frequencies close to the centre frequency of each band. The standard deviation is selected as $\sigma = 0.125$, and the Gaussian filter $g_i(f)$ with the mean frequency $f_i$ is thus given by:
\[
g_i(f) = \frac{1}{0.125 \cdot \sqrt{2\pi}} \cdot \exp\left( -\frac{\left( \dfrac{f - f_i}{f_i - f_{i-1}} \right)^{2}}{2 \cdot 0.125^{2}} \right) \tag{11.27}
\]
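As an illustration, the Gaussian weighting can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: it reads Eq. (11.27) as a normal density with standard deviation 0.125 in the normalised deviation (f - f_i) / (f_i - f_{i-1}). The function name, the 440 Hz example centre, and the equal-tempered neighbour frequency are assumptions made only for this example.

```python
import math

SIGMA = 0.125  # standard deviation from the text, in units of the spacing f_i - f_{i-1}


def gaussian_semitone_filter(f, f_i, f_prev, sigma=SIGMA):
    """Weight of frequency f (Hz) for the band centred on f_i (Hz).

    f_prev is the centre frequency of the neighbouring lower band (f_{i-1}).
    Hypothetical helper: the name and signature are not from the source.
    """
    # Deviation from the centre, normalised by the distance to the lower neighbour.
    x = (f - f_i) / (f_i - f_prev)
    return math.exp(-(x * x) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


# Assumed example values: a band centred on 440 Hz and its equal-tempered lower neighbour.
f_i = 440.0
f_prev = f_i / 2.0 ** (1.0 / 12.0)

peak = gaussian_semitone_filter(f_i, f_i, f_prev)
one_sigma = gaussian_semitone_filter(f_i + SIGMA * (f_i - f_prev), f_i, f_prev)
halfway = gaussian_semitone_filter(f_i - 0.5 * (f_i - f_prev), f_i, f_prev)
```

With this reading, the weight at one standard deviation from the centre is exp(-1/2) times the peak, and the weight halfway to the neighbouring band is already negligible, which is the concentration around the centre frequency that a rectangular filter lacks.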
Another aspect is the optimal length of the (macro) window of analysis [101]. As alternatives, the first 30 s, 60 s, 90 s, and 120 s of a piece, as well as its complete length, are compared with respect to accuracy. The results are given in Table 11.10 in Sect. 11.4.5.
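The window alternatives amount to truncating each piece before feature extraction. The following is a minimal sketch under assumptions: the audio is held as a plain sequence of samples with a known sample rate, and the helper name leading_window is hypothetical, not taken from the source.

```python
# Window lengths in seconds as listed in the text; None stands for the complete piece.
WINDOW_LENGTHS = [30, 60, 90, 120, None]


def leading_window(samples, sample_rate, seconds=None):
    """Return the first `seconds` of `samples`, or all of them if `seconds` is None.

    Pieces shorter than the requested window are returned unchanged.
    """
    if seconds is None:
        return samples
    n = int(round(seconds * sample_rate))
    return samples[:n]


# Toy example: a 100 s "piece" at an assumed rate of 10 samples per second.
piece = list(range(1000))
lengths = [len(leading_window(piece, 10, s)) for s in WINDOW_LENGTHS]
```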
Finally, the optimal frequency range for key extraction is analysed with different ranges covering four to seven octaves. The result is shown in Table 11.11 in Sect. 11.4.5.
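To make such a range concrete, the centre frequencies of the semitone bands for C3-C8 (the range named in the table footnotes) can be listed as below. This is a sketch under stated assumptions that the source does not confirm: equal temperament, a reference pitch of A4 = 440 Hz, MIDI note numbering with C4 = 60, and both end notes included.

```python
A4_HZ = 440.0   # assumed reference pitch
A4_MIDI = 69    # MIDI note number of A4 (convention C4 = 60)


def midi_to_hz(note):
    """Equal-tempered frequency in Hz of a MIDI note number."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)


def semitone_centres(low_midi, high_midi):
    """Centre frequencies for all semitones from low_midi to high_midi inclusive."""
    return [midi_to_hz(n) for n in range(low_midi, high_midi + 1)]


C3, C8 = 48, 108  # MIDI numbers under the C4 = 60 convention
centres = semitone_centres(C3, C8)
octaves = (C8 - C3) / 12
```

Under these assumptions C3-C8 spans five octaves, so it lies within the four to seven octaves considered in the text.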
11.4.3 Correlation-Based Analysis
Given the acoustic features, the key $K$ that maximises the correlation with key templates is identified, where $\kappa$ represents the input feature vector (11.29), and $t_{\mathrm{cor}}(C)$