Digital Signal Processing Reference
In-Depth Information
Table 11.8 WA for tape speed variation compensation (w/) or its omission (w/o)

WA [%]     w/o     w/
MTV        70.5    70.5
CHANSON    69.8    72.5
CLASSIC    82.0    82.0
JAZZ       58.5    59.8
KEY-ALL    75.2    76.2

SVM, ten-fold SCV, Gaussian filter, whole piece, range C3-C8, 12 keys
Table 11.9 WA for different semitone filter functions

WA [%]     Rectangle   Triangle   Triangle²   Gaussian
MTV        71.5        71.5       72.0        70.5
CHANSON    71.1        67.8       69.1        72.5
CLASSIC    84.3        79.8       77.5        82.0
JAZZ       58.5        57.3       59.8        59.8
KEY-ALL    76.2        73.7       75.4        76.2

SVM, ten-fold SCV, whole piece, range C3-C8, 12 keys. Triangle² indicates the squared triangle function
For the band-pass filters, a rectangular, a triangular, a squared triangular, and a Gaussian filter are considered. Of these, the Gaussian filter is preferred, based on the results shown in Sect. 11.4.5, Table 11.9. This choice is also intuitive: compared with, e.g., a rectangular filter, the Gaussian filter gives more weight to contributions of frequencies close to the mid-frequency of each band. The standard deviation is selected as σ = 0.125, and the Gaussian filter g_i(f) with the mean frequency f_i thus resembles:
$$
g_i(f) = \frac{1}{0.125 \cdot \sqrt{2\pi}} \cdot \exp\left( -\frac{\left( \frac{f - f_i}{f_{i+1} - f_i} \right)^{2}}{2 \cdot 0.125^{2}} \right) \tag{11.27}
$$
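As a minimal sketch of how such a Gaussian semitone filter could be evaluated, the following Python snippet computes the weight of a frequency f for a band with mean frequency f_i. It assumes that the deviation f − f_i is normalised by the distance to the next semitone and that semitones are equally tempered (f_{i+1} = f_i · 2^(1/12)); the function names and the 440 Hz example are illustrative and not taken from the text.

```python
import math

SIGMA = 0.125  # standard deviation from the text; unit of semitone spacing is an assumption


def next_semitone(f_i):
    """Mean frequency of the next equal-tempered semitone (assumed spacing)."""
    return f_i * 2.0 ** (1.0 / 12.0)


def gaussian_semitone_filter(f, f_i, sigma=SIGMA):
    """Gaussian weight of frequency f (Hz) for the band centred at f_i (Hz)."""
    # Normalised deviation: 0 at the band centre, 1 at the next semitone (assumption).
    x = (f - f_i) / (next_semitone(f_i) - f_i)
    return math.exp(-x * x / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    f_centre = 440.0  # illustrative mean frequency
    for f in (430.0, 440.0, 450.0):
        print(f, gaussian_semitone_filter(f, f_centre))
```

Under these assumptions the weight peaks at the mean frequency and has practically vanished at the neighbouring semitone, which illustrates why the Gaussian emphasises the mid-frequency more strongly than a rectangular filter does.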
Another aspect is the optimal length of the (macro) window of analysis [101]. As alternatives, the first 30 s, 60 s, 90 s, and 120 s, as well as the complete length of a piece, are compared with respect to accuracy. The results are given in Table 11.10 in Sect. 11.4.5.
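The comparison of analysis windows can be sketched as follows. The helper is hypothetical: it assumes the piece is available as a sequence of samples with a known sampling rate, and it simply keeps the first N seconds, or the whole piece when no limit is given.

```python
def leading_window(samples, sample_rate, seconds=None):
    """Return the first `seconds` of `samples`; the whole piece if seconds is None."""
    if seconds is None:
        return samples
    n = int(round(seconds * sample_rate))
    return samples[:n]  # slicing is safe if the piece is shorter than the window


# Window lengths considered in the text; None stands for the complete piece.
WINDOW_LENGTHS_S = (30, 60, 90, 120, None)

if __name__ == "__main__":
    sample_rate = 8  # toy value to keep the example small (assumption)
    piece = list(range(100 * sample_rate))  # a 100 s dummy "signal"
    for seconds in WINDOW_LENGTHS_S:
        print(seconds, len(leading_window(piece, sample_rate, seconds)))
```

Feature extraction and classification would then be run once per window length, so that the accuracies can be compared on otherwise identical settings.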
Finally, the optimal frequency range for key extraction is analysed with different ranges covering four to seven octaves. The results are shown in Table 11.11 in Sect. 11.4.5.
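To make such a frequency range concrete, the sketch below lists the mean frequencies of all semitones in a range such as C3-C8 (five octaves). It assumes equal temperament with A4 = 440 Hz and the convention C4 = MIDI note 60; neither is stated in the text, and the function names are illustrative.

```python
A4_HZ = 440.0   # reference tuning (assumption)
A4_MIDI = 69    # MIDI note number of A4


def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)


def c_note_midi(octave):
    """MIDI number of C in the given octave, with C4 = 60 (assumed convention)."""
    return 12 * (octave + 1)


def semitone_frequencies(low_octave, high_octave):
    """Mean frequencies of all semitones from C<low_octave> to C<high_octave>, inclusive."""
    first = c_note_midi(low_octave)
    last = c_note_midi(high_octave)
    return [midi_to_hz(n) for n in range(first, last + 1)]


if __name__ == "__main__":
    freqs = semitone_frequencies(3, 8)  # C3-C8
    print(len(freqs), round(freqs[0], 2), round(freqs[-1], 2))
```

Widening or narrowing the range only changes the two octave arguments, so ranges of four to seven octaves can be generated in the same way.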
11.4.3 Correlation-Based Analysis
Given the acoustic features, the key K that maximises the correlation with key templates is identified, where κ represents the input feature vector (11.29), and t_cor(C)
 
 