Digital Signal Processing Reference
In-Depth Information
CHANSON and MTV sets. This may be due to transpositions of the chorus towards
the end of a piece, which are typical in these genres. In Classical and Jazz music, repeated
changes to other keys, such as the relative major/minor counter key, apparently
even out better when the overall piece is considered. Given that, on average over all genres,
the whole piece is the best choice, this variant is used here.
As a final parameter, let us look at the effect of different frequency ranges
for feature calculation in Table 11.11. In three of four genres, the range from C3 to C8
is the best choice and is thus used for the evaluations. At the higher end of the scale, the
note C8 is two octaves above a human soprano singer's highest pitched note. Thus,
respecting higher harmonics seems reasonable in key determination.
However, inclusion of the next octave up to C9 apparently degrades the results. An
explanation for this behaviour can be seen in the weakness of higher harmonics in
comparison to relatively stronger noise components from percussive sounds. At the
other end of the scale, the optimum found coincides with a human tenor singer's
range. It thus ignores lower bass components. This is different in the results for
JAZZ, where a benefit arises from an extension to C2. Indeed, virtuoso bass solos are
popular in this genre.
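To make the note ranges above concrete, the following minimal sketch converts note names such as C3 or C8 to frequencies. It assumes equal temperament with A4 = 440 Hz and the common MIDI convention C4 = 60; neither is stated in the text, and the helper names are hypothetical.

```python
import re

A4_HZ = 440.0  # assumed concert pitch, not taken from the text
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
ACCIDENTALS = {"": 0, "#": 1, "b": -1}


def note_to_midi(name: str) -> int:
    """Convert a name such as 'C3' or 'F#4' to a MIDI number (C4 = 60)."""
    match = re.fullmatch(r"([A-G])([#b]?)(-?\d+)", name)
    if match is None:
        raise ValueError(f"cannot parse note name: {name!r}")
    letter, accidental, octave = match.groups()
    semitone = NOTE_OFFSETS[letter] + ACCIDENTALS[accidental]
    return 12 * (int(octave) + 1) + semitone


def midi_to_hz(midi: int) -> float:
    """Equal-tempered frequency of a MIDI note number (A4 = MIDI 69)."""
    return A4_HZ * 2.0 ** ((midi - 69) / 12.0)


def band_semitones(low: str, high: str):
    """List (midi, frequency in Hz, pitch class) for each semitone in [low, high]."""
    return [(m, midi_to_hz(m), m % 12)
            for m in range(note_to_midi(low), note_to_midi(high) + 1)]


if __name__ == "__main__":
    for name in ("C2", "C3", "C8", "C9"):
        print(name, round(midi_to_hz(note_to_midi(name)), 2), "Hz")
```

Under these assumptions the range C3 to C8 spans five octaves, from roughly 131 Hz to roughly 4186 Hz, and each semitone in the band maps to one of twelve pitch classes via `m % 12`.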
11.4.5.1 Evaluation of Feature Types and Performance
In addition to the WA of correctly classified keys, sub-dominant and dominant
confusions are given for 12 keys; further, relative minor and relative major key
confusions for 24 keys are added to the 'correctly' classified keys in the following.
This adheres to the validation protocol introduced by the MIREX challenge in 2005.
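The confusion types named above can be identified mechanically. The sketch below assumes a hypothetical encoding of a key as a pair (tonic pitch class 0 to 11 with C = 0, mode "major" or "minor") and only labels the relation between a reference key and a predicted key; it does not reproduce any particular scoring scheme.

```python
from collections import Counter


def key_relation(reference, predicted):
    """Label how a predicted key relates to the reference key.

    Keys are (tonic_pitch_class, mode) pairs; this encoding is an
    assumption made for illustration.
    """
    ref_tonic, ref_mode = reference
    pred_tonic, pred_mode = predicted
    interval = (pred_tonic - ref_tonic) % 12  # semitones above the reference tonic
    if ref_mode == pred_mode:
        if interval == 0:
            return "correct"
        if interval == 7:   # a perfect fifth above
            return "dominant"
        if interval == 5:   # a perfect fourth above
            return "sub-dominant"
    elif interval == (9 if ref_mode == "major" else 3):
        # C major -> A minor is 9 semitones up; A minor -> C major is 3 up
        return "relative"
    return "other"


def tally_relations(pairs):
    """Count relation labels over an iterable of (reference, predicted) pairs."""
    return Counter(key_relation(ref, pred) for ref, pred in pairs)
```

For example, `key_relation((0, "major"), (7, "major"))` labels a G major prediction for a C major piece as a dominant confusion, and `tally_relations` gives the counts from which a relaxed accuracy could be derived.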
We first look at 12 keys: data-driven results (cf. Table 11.12) are based on SVMs in
ten-fold SCV. This includes results per feature group and those for an 'optimised
space' obtained by supervised feature selection (cf. Sect. 11.4.4).
As a single feature group, CHROMA features lead to the best result. There, single
feature values 'clearly' represent the frequency characteristics of a musical piece.
Within the derived feature types, these are partly 'blurred', and the WAs of derived
features are generally lower. However, the additional features lead to better results
when uniting all features (2.5 % WA absolute more than CHROMA) and also when
selecting the best from the union of features (a further plus of 0.8 % WA, giving
the overall maximum of 77.3 % WA). This difference is significant at the common
level of 0.05 in a one-sided z-test.
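The section does not state the number of test instances or the exact form of the z-test, so the following is only a sketch of one common variant: a pooled two-proportion z-test that treats the two accuracies as coming from independent samples. The function name and all sample counts are hypothetical.

```python
import math


def one_sided_z_test(acc_a: float, acc_b: float, n_a: int, n_b: int):
    """Return (z, p) for the alternative hypothesis acc_b > acc_a.

    acc_a, acc_b: accuracies as fractions in [0, 1]
    n_a, n_b:     number of evaluated instances behind each accuracy
    """
    # Pooled proportion under the null hypothesis of equal accuracy
    pooled = (acc_a * n_a + acc_b * n_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (acc_b - acc_a) / std_err
    # Upper-tail probability of the standard normal distribution
    p = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p


def is_significant(acc_a, acc_b, n_a, n_b, alpha=0.05):
    """True if acc_b exceeds acc_a at level alpha in the one-sided test."""
    return one_sided_z_test(acc_a, acc_b, n_a, n_b)[1] < alpha
```

If the comparison meant is between 76.5 % and 77.3 % WA, the call would be `one_sided_z_test(0.765, 0.773, n, n)`, where `n` is the number of evaluated pieces, which is not given in this section.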
Table 11.13 compares knowledge-based and data-driven key determination genre
by genre. The optimal setting is chosen for each of the two approaches, namely the
'scale cadence' features for correlation and the 'optimised space' for SVMs.
The correlation approach is superior in three out of four cases for the correct key.
This changes, however, as more data for model-learning is available in the KEY-ALL
case.
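As an illustration of the knowledge-based idea only, the sketch below correlates a 12-bin chroma vector with all twelve transpositions of a binary diatonic scale template and picks the best match. The binary template is a simple stand-in chosen here; it is not the 'scale cadence' feature set referred to above, and all names are hypothetical.

```python
import math

# Pitch classes of the C major scale (C D E F G A B), index 0 = C
DIATONIC = [1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y) if dev_x and dev_y else 0.0


def rotate(template, shift):
    """Transpose a 12-bin template upwards by `shift` semitones."""
    return [template[(i - shift) % 12] for i in range(12)]


def estimate_key_signature(chroma):
    """Return (best_shift, scores) over the 12 transposed scale templates.

    best_shift is the major-key tonic pitch class of the best-matching
    scale; the relative minor shares the same template.
    """
    scores = [pearson(chroma, rotate(DIATONIC, k)) for k in range(12)]
    best_shift = max(range(12), key=scores.__getitem__)
    return best_shift, scores
```

Such a template needs no training data, which is consistent with the observation that the data-driven approach only catches up once more material for model learning is available.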
Switching to 24 keys, Table 11.14 first shows results for the data-driven approach
with SVM in optimal parametrisation. Interestingly, no improvement is reached by
space optimisation in this case. Given that twice as many classes are available, the