Digital Signal Processing Reference
calculation and analysis of self-similarity matrices (cf. Sect. 11.3 ) or segmentation
with a subsequent clustering or classification.
The authors in [ 135 ] propose a modulated complex transformation that is logarithmised
and reduced by oriented PCA for clustering of similar sequences. The clusters are
then classified into different parts of a piece by scaled Rényi entropy and spectral
flatness. In [ 136 ], MFCCs are clustered using a modified KL distance. Another
approach in the same article [ 136 ] uses ergodic HMMs for structure analysis. Ergodic
HMMs are also used in [ 137 ]; these have three states. The features describe the spectral
envelope by MFCCs, LPCCs and discrete cepstrum coefficients. Chunking can be
performed by a clustering algorithm to initialise the ergodic HMMs. GMMs initialised
by a clustering step are applied in [ 138 ]. In [ 139 ], the music signal is chunked by an
event detection function prior to dynamic time warping (DTW). Music visualisation
by a self-similarity matrix for structure analysis was first based on MFCCs using the
scalar product [ 140 ], and later using a normalised scalar product [ 141 ]. The authors
in [ 142 ] use dynamic features that maximise the trans-information (mutual information)
for the computation of such a self-similarity matrix.
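
The two similarity measures mentioned above, the scalar product and the normalised scalar product, can be illustrated with a minimal sketch. It assumes one feature vector per frame stored as a row of a NumPy array; the function name `self_similarity` and the toy vectors are hypothetical and are not taken from [ 140 ] or [ 141 ].

```python
import numpy as np

def self_similarity(features, normalise=True):
    """Return the (n_frames x n_frames) self-similarity matrix.

    features: array-like of shape (n_frames, n_dims), one vector per row.
    normalise=False uses the plain scalar product; normalise=True uses the
    normalised scalar product (cosine similarity).
    """
    X = np.asarray(features, dtype=float)
    if normalise:
        norms = np.linalg.norm(X, axis=1, keepdims=True)
        X = X / np.maximum(norms, 1e-12)  # guard against all-zero frames
    return X @ X.T

# Toy input (made up): an A-B-A pattern of two 3-dimensional vectors.
a = [1.0, 0.0, 0.0]
b = [0.0, 2.0, 0.0]
S_norm = self_similarity([a, b, a], normalise=True)
S_raw = self_similarity([a, b, a], normalise=False)
```

With normalisation, the diagonal is independent of the frame energy, so repeated sections show up as off-diagonal stripes regardless of loudness differences between frames.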
In [ 143 ] an unsupervised Bayesian clustering model is used. Its parameters are
estimated by a modified EM algorithm. The authors in [ 120 ] perform a beat-synchronous
segmentation using a beat tracker. Then, a self-similarity matrix is established
based on CHROMA features. By uniform moving average filtering, a time-lag matrix
is computed. Its maximum element is determined subject to constraints on the minimum
lag and the maximum occurrence of a section. An extension is presented in [ 144 ].
It permits modulated repetitions and an adapted measure to determine the chorus
sections. The authors of [ 145 ] suggest features based on harmonic information for
the creation of self-similarity matrices.
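
The time-lag idea described above can be sketched as follows. This is a simplified illustration and not the exact procedure of [ 120 ]: it assumes a square self-similarity matrix as input, smooths each lag row with a uniform moving average, and applies only a minimum-lag constraint (the limit on the maximum occurrence of a section is omitted). The function names, the `width` and `min_lag` parameters, and the toy label sequence are hypothetical.

```python
import numpy as np

def time_lag_matrix(S):
    """Re-index S so that L[lag, t] = S[t, t - lag] for t >= lag."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    L = np.zeros_like(S)
    for lag in range(n):
        # The lag-th sub-diagonal holds S[t, t - lag] for t = lag .. n-1.
        L[lag, lag:] = np.diagonal(S, offset=-lag)
    return L

def smooth_along_time(L, width):
    """Uniform moving average of every lag row over `width` frames."""
    kernel = np.ones(width) / width
    return np.array([np.convolve(row, kernel, mode="same") for row in L])

def strongest_repetition(S, width=4, min_lag=2):
    """Return (lag, time index) of the maximum smoothed time-lag entry."""
    L = smooth_along_time(time_lag_matrix(S), width)
    L[:min_lag, :] = -np.inf  # exclude trivial self-matches at small lags
    lag, t = np.unravel_index(np.argmax(L), L.shape)
    return int(lag), int(t)

# Toy example (made up): frames 0-3 are repeated at frames 8-11.
labels = [0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 8, 9, 10, 11]
X = np.eye(12)[labels]   # one-hot stand-in for real features
S = X @ X.T              # 1 where two frames carry the same label
lag, t = strongest_repetition(S)
```

In the time-lag representation a repeated section becomes a horizontal line at the row of its lag, which is why averaging along the time axis emphasises it.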
From the above, a number of findings can be distilled. In pre-processing, beat
synchrony seems advisable given robust beat detection. As for features, one should
model the musical properties of the signal, for example by PCPs or, more specifically,
CHROMA, as these tend to be better suited than MFCCs or similar types. Further,
temporal information should be modelled, for example by CENS features or similar [ 145 ].
As for the model, self-similarity matrices seem best suited. Given reliable beat
synchrony, dynamic modelling is not needed and might even degrade results. In
the remainder of this section, we consider a solution following these guidelines and
incorporating simple image processing methods for the processing of a self-similarity
matrix. We will also need to define evaluation measures, as none are yet settled for
this task. Exemplary results will be given on a full day of MP3-compressed recorded
music from multiple styles.
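
As an illustration of beat-synchronous pre-processing, frame-level features can be averaged between consecutive beat positions so that each beat interval yields one vector. The sketch below assumes that beat positions are already available as frame indices from some beat tracker; the function name `beat_synchronise` and the toy data are hypothetical, and plain averaging is only one possible way of aggregating.

```python
import numpy as np

def beat_synchronise(frames, beat_frames):
    """Average frame-level features between consecutive beat positions.

    frames: array-like of shape (n_frames, n_dims).
    beat_frames: increasing frame indices of detected beats
                 (assumed output of an external beat tracker).
    """
    frames = np.asarray(frames, dtype=float)
    bounds = [0] + [int(b) for b in beat_frames] + [len(frames)]
    # Skip empty intervals, e.g. when the first beat falls on frame 0.
    segments = [frames[s:e] for s, e in zip(bounds[:-1], bounds[1:]) if e > s]
    return np.array([seg.mean(axis=0) for seg in segments])

# Toy data (made up): six 2-dimensional frames with beats at frames 2 and 4.
frames = np.arange(12).reshape(6, 2)
beat_sync = beat_synchronise(frames, [2, 4])
```

A self-similarity matrix computed on such beat-level vectors is indexed by beats rather than frames, which removes tempo variation from the comparison.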
11.6.1 Methodology
As in the last section on chord progression analysis, CENS features (cf. Sect. 6.2.2.3 )
are used for the acoustic representation. These will be denoted from now on as