Applications in Intelligent Music Analysis - Intelligent Audio Analysis

calculation and analysis of self-similarity matrices (cf. Sect. 11.3 ) or segmentation

with a subsequent clustering or classification.

The authors of [ 135 ] propose a modulated complex transform that is logarithmised

and reduced by oriented PCA in order to cluster similar sequences. The clusters are

then classified into different parts of a piece by scaled Rényi entropy and spectral

flatness. In [ 136 ], MFCCs are clustered using a modified KL distance. Another

approach in the same article [ 136 ] uses ergodic HMMs for structure analysis. Ergodic

HMMs with three states are also used in [ 137 ]. The features describe the spectral

envelope by MFCCs, LPCCs and discrete cepstrum coefficients. The signal can be chunked

by a clustering algorithm in order to initialise the ergodic HMMs. GMMs initialised

by a clustering step are applied in [ 138 ]. In [ 139 ], the music signal is chunked by an

event detection function prior to dynamic time warping (DTW). Music visualisation

by a self-similarity matrix for structure analysis was first based on MFCCs using the

scalar product [ 140 ], and later using a normalised scalar product [ 141 ]. The authors

in [ 142 ] use dynamic features which maximise the mutual information (trans-information) for computation

of such a self-similarity matrix.
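
The two similarity measures mentioned above (plain and normalised scalar product) can be illustrated with a minimal sketch. It assumes that the features are already available as one vector per frame. The function names `dot` and `self_similarity` and the toy vectors are hypothetical and are not taken from [ 140 ] or [ 141 ]. Plain Python is used only to keep the sketch free of dependencies.

```python
import math


def dot(u, v):
    """Scalar product of two equally long feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def self_similarity(frames, normalise=True):
    """Return S with S[i][j] = similarity between frame i and frame j.

    normalise=False: plain scalar product.
    normalise=True:  scalar product divided by both vector norms (cosine).
    """
    norms = [math.sqrt(dot(f, f)) for f in frames]
    n = len(frames)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = dot(frames[i], frames[j])
            if normalise:
                denom = norms[i] * norms[j]
                # All-zero frames (silence) get similarity 0 by convention here.
                s = s / denom if denom > 0 else 0.0
            S[i][j] = S[j][i] = s  # the matrix is symmetric
    return S


# Toy "features": frames 0 and 2 point in the same direction but differ in energy.
frames = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
S = self_similarity(frames)
S_plain = self_similarity(frames, normalise=False)
```

In this toy case the normalised variant rates frames 0 and 2 as identical, whereas the plain scalar product scales with their energy. This is one plausible reason for preferring the normalised form.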

In [ 143 ] an unsupervised Bayesian clustering model is used. Its parameters are

estimated by a modified EM algorithm. The authors in [ 120 ] perform a beat-synchronous

segmentation using a beat tracker. Then, a self-similarity matrix is established

based on CHROMA features. A time-lag matrix is then computed from it by uniform moving

average filtering. Its maximum element is determined subject to limits on the minimum

lag and the maximum occurrence of a section. An extension is presented in [ 144 ].

It permits modulated repetitions and introduces an adapted measure to determine the chorus

sections. The authors of [ 145 ] suggest features based on harmonic information for

the creation of self-similarity matrices.
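
The time-lag idea described for [ 120 ] can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation. The self-similarity matrix is re-indexed by lag, each lag row is smoothed with a uniform moving average, and the maximum is searched for lags of at least `min_lag`. The names `time_lag`, `moving_average`, `best_repetition`, `width` and `min_lag` are hypothetical, and the constraint on the maximum occurrence of a section is left out.

```python
def time_lag(S):
    """Re-index a self-similarity matrix: T[lag][k] = S[lag + k][k]."""
    n = len(S)
    return [[S[t][t - lag] for t in range(lag, n)] for lag in range(n)]


def moving_average(row, width):
    """Uniform moving average over `width` consecutive values (valid part only)."""
    if len(row) < width:
        return []
    window = sum(row[:width])
    out = [window / width]
    for k in range(width, len(row)):
        window += row[k] - row[k - width]
        out.append(window / width)
    return out


def best_repetition(S, width, min_lag):
    """Return (score, lag, start) of the strongest smoothed time-lag entry.

    `start` is the first frame of the repeated segment; the earlier
    occurrence begins at frame `start - lag`.
    """
    T = time_lag(S)
    best = None
    for lag in range(min_lag, len(T)):
        for k, score in enumerate(moving_average(T[lag], width)):
            if best is None or score > best[0]:
                best = (score, lag, lag + k)
    return best


# Toy example: a four-frame pattern that is repeated once.
labels = [0, 1, 2, 3, 0, 1, 2, 3]
S = [[1.0 if a == b else 0.0 for b in labels] for a in labels]
best = best_repetition(S, width=3, min_lag=2)
```

On this toy matrix the search finds the repetition at a lag of four frames, starting at frame 4. On real data, `width` would roughly correspond to the shortest section of interest.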

From the above, a number of findings can be distilled: In pre-processing,

beat-synchrony seems advisable, provided that beat detection is robust. As for features,

one should model the musical properties of the signal, for example by PCPs or, more

specifically, CHROMA, as these tend to be better suited than MFCCs or similar types.

Further, temporal information should be modelled, for example by CENS features or similar [ 145 ].

As for the model, self-similarity matrices seem best suited. Given reliable

beat-synchrony, dynamic modelling is not needed and might even degrade results. In

the remainder of this section, we consider a solution following these guidelines and

incorporating simple image processing methods for the processing of a self-similarity

matrix. We will also need to define evaluation measures, as these are not yet settled for

this task. Exemplary results will be given on a full day of MP3-compressed recorded

music from multiple styles.
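
The beat-synchronous pre-processing recommended above can be illustrated by averaging frame-level features between consecutive beats. The following is a minimal sketch. It assumes that a beat tracker has already delivered beat positions as frame indices. The function name `beat_synchronous` and the toy data are hypothetical and do not reproduce the processing chain used in this section.

```python
def beat_synchronous(frames, beat_frames):
    """Average the feature frames between consecutive beat positions.

    frames:      list of feature vectors, one per analysis frame
    beat_frames: frame indices of detected beats (assumed to be given)
    Returns one averaged feature vector per inter-beat interval.
    """
    bounds = sorted(set(beat_frames))
    segments = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        chunk = frames[start:end]
        if not chunk:
            continue  # the beat lies beyond the end of the signal
        dim = len(chunk[0])
        segments.append([sum(f[d] for f in chunk) / len(chunk) for d in range(dim)])
    return segments


# Toy example: five frames with two feature dimensions, beats at frames 0, 2 and 5.
frames = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0], [0.0, 6.0]]
beat_features = beat_synchronous(frames, [0, 2, 5])
```

Each resulting vector then stands for one beat, so that a self-similarity matrix built on top of it is indexed by beats rather than by frames.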

11.6.1 Methodology

As in the last section on chord progression analysis, CENS features (cf. Sect. 6.2.2.3 )

are used for the acoustic representation. These will be denoted from now on as
