Digital Signal Processing Reference
The two main approaches, template correlation-based and data-driven modelling, were
compared and evaluated on novel feature types. The data-driven model prevailed with a
maximum of 77.3 % WA for 12 keys on the whole dataset. For correct recognition of
six out of seven scale semitones, 94.2 % WA was reached. For individual datasets,
the correlation approach partly showed better results, but SVMs were superior given
sufficient data owing to their ability to cope better with diversity: perceptual studies of
tonal hierarchies show genre and task dependency according to [ 123 ]. In the case
of 24 keys, the difference between the two approaches was amplified from 5.0 to
6.7 % absolute difference in WA. The maximum WA was 62.1 % for the correct key
and 84.9 % for six out of seven notes.
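The template correlation idea and the relaxed "six out of seven notes" criterion can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the text: a binary major-scale template stands in for the actual key profiles, only the 12 major keys are covered, and the relaxed criterion is read as "the estimated key shares at least six of its seven scale notes with the reference key". All function names are hypothetical.

```python
import math

# Semitone offsets of a major scale relative to its tonic (music theory fact).
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)


def scale_pitch_classes(tonic):
    """Pitch classes (0 = C ... 11 = B) of the major scale on `tonic`."""
    return {(tonic + step) % 12 for step in MAJOR_STEPS}


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def estimate_major_key(chroma):
    """Return (tonic, score) of the best-correlating binary scale template.

    Assumption: a 0/1 scale-membership template replaces the real profiles.
    """
    best_tonic, best_score = 0, -2.0
    for tonic in range(12):
        members = scale_pitch_classes(tonic)
        template = [1.0 if pc in members else 0.0 for pc in range(12)]
        score = pearson(chroma, template)
        if score > best_score:
            best_tonic, best_score = tonic, score
    return best_tonic, best_score


def shared_scale_notes(tonic_a, tonic_b):
    """Number of scale notes two major keys have in common (7 if identical)."""
    return len(scale_pitch_classes(tonic_a) & scale_pitch_classes(tonic_b))


def relaxed_match(estimated, reference):
    """True if the estimate shares at least six of seven scale notes."""
    return shared_scale_notes(estimated, reference) >= 6
```

Under this reading, confusing C major with G major or F major still counts as a relaxed match, since each of those pairs differs in a single scale note, whereas C major against F sharp major does not.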
As for parametrisation, the best results were obtained by adapting reference pitch
classes to compensate for tape speed variation, using Gaussian filters for semitone
filtering, analysing the whole piece, and computing features in the frequency band
from C3 to C8 (130.8 to 4186 Hz). The proposed feature types
based on music theory and human perception were able to improve both approaches
for key assignment.
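The following sketch shows how these parameter choices could interact. It assumes equal temperament, MIDI note numbers 48 (C3) to 108 (C8), a Gaussian width of 0.25 semitones, and a simple energy-maximising search over candidate reference pitches. None of these values or function names come from the text, and each spectral peak is weighted only by its nearest semitone filter, which is a simplification of a full filter bank.

```python
import math


def semitone_centres(ref_a4_hz=440.0, low_midi=48, high_midi=108):
    """Centre frequencies in Hz for the semitones C3 (48) to C8 (108)."""
    return {m: ref_a4_hz * 2.0 ** ((m - 69) / 12.0)
            for m in range(low_midi, high_midi + 1)}


def gaussian_weight(freq_hz, centre_hz, sigma_semitones=0.25):
    """Gaussian weight of a frequency, measured in semitones from the centre."""
    distance = 12.0 * math.log2(freq_hz / centre_hz)
    return math.exp(-0.5 * (distance / sigma_semitones) ** 2)


def chroma_from_peaks(peaks, ref_a4_hz=440.0, sigma_semitones=0.25):
    """Fold (frequency_hz, magnitude) pairs into a 12-bin pitch-class vector."""
    centres = semitone_centres(ref_a4_hz)
    low_hz, high_hz = centres[48], centres[108]
    chroma = [0.0] * 12
    for freq_hz, magnitude in peaks:
        if not low_hz <= freq_hz <= high_hz:
            continue  # outside the C3 to C8 analysis band
        # Nearest semitone on the (possibly detuned) reference grid.
        midi = round(69 + 12.0 * math.log2(freq_hz / ref_a4_hz))
        midi = min(max(midi, 48), 108)
        weight = gaussian_weight(freq_hz, centres[midi], sigma_semitones)
        chroma[midi % 12] += magnitude * weight
    return chroma


def best_reference(peaks, candidates):
    """Assumed tuning search: pick the A4 candidate with most filtered energy."""
    return max(candidates, key=lambda ref: sum(chroma_from_peaks(peaks, ref)))
```

With the default reference of 440 Hz, the grid spans roughly 130.8 Hz to 4186 Hz, matching the band stated above. A recording played back slightly fast or slow shifts every partial by the same ratio, so scanning a few candidate references and keeping the one with the largest filtered energy is one plausible way to re-centre the semitone filters.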
Future design of features for key determination could consider non-CHROMA
types such as bags of chords. In addition, further music-theoretic or cognition-inspired
approaches, e.g., inspired by [ 124 ], could be targeted. For the acoustic features, the
time-frequency representation could be improved, e.g., by wavelets [ 71 , 125 ] or
multi-resolution FFT. If one targets the mode instead of the 'absolute' key [ 126 ],
hierarchical schemes could be established. Non-tonal music audio could be modelled
as an additional class to cope with arbitrary music input [ 127 ]. Also, alternative minor
scales apart from the considered natural relative minor scale can be added. In [ 128 ],
PTR is given for harmonic and melodic minor scales, which could be implemented
directly in the presented approach.
Extending to pieces with changing key can be achieved based on local analysis
[ 129 ]. Chunking for such local analysis could be based on beat and on-beat detection
[ 6 , 23 ] as presented in the previous two sections. Temporal context can then
be integrated by the use of LSTM networks [ 23 ]. Further, the novel features could be
used in related tonal analysis tasks [ 10 ], key analysis could be used to improve music
structure analysis [ 30 , 130 ], and synergies could be exploited by parallel key and
progression analysis [ 131 ] or similar mutually dependent information [ 99 ]. Finally, the results demonstrate the
complexity of key determination, and confidence measures and key hierarchies can
be useful considerations for application in real-life systems.
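As an illustration of local analysis with a confidence measure, the sketch below estimates one key per chunk of chroma frames. It makes assumptions that are not from the text: chunks have a fixed number of frames rather than being aligned to detected beats, the score is the fraction of chroma energy inside each major scale (a simpler stand-in for the correlation and SVM models discussed above), and confidence is the margin between the best and second-best score. All names are hypothetical.

```python
# Semitone offsets of a major scale relative to its tonic.
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)


def key_scores(chroma):
    """Fraction of chroma energy inside each of the 12 major scales."""
    total = sum(chroma)
    if total <= 0:
        return [0.0] * 12
    return [sum(chroma[(tonic + step) % 12] for step in MAJOR_STEPS) / total
            for tonic in range(12)]


def local_keys(frames, chunk_len):
    """Return (start_frame, tonic, confidence) for each chunk of frames.

    Assumption: fixed-length chunks; beat-aligned chunks would replace
    the simple range() stepping used here.
    """
    results = []
    for start in range(0, len(frames), chunk_len):
        chunk = frames[start:start + chunk_len]
        # Accumulate the chroma frames of this chunk per pitch class.
        summed = [sum(frame[pc] for frame in chunk) for pc in range(12)]
        scores = key_scores(summed)
        ranked = sorted(range(12), key=lambda t: scores[t], reverse=True)
        confidence = scores[ranked[0]] - scores[ranked[1]]
        results.append((start, ranked[0], confidence))
    return results
```

A small margin signals that neighbouring keys on the circle of fifths explain the chunk almost equally well, which is the situation in which a real-life system might fall back to a coarser level of a key hierarchy.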
11.5 Chords
A more fine-grained description beyond the musical key is provided by the chord
progression in music. In the following, the method as presented in [ 10 ] and [ 29 ] is
explained and benchmark results are presented.
 