Tools for Music Information Retrieval and Playing - Intelligent Music Information Systems: Tools and Methodologies

potentialities of an integrated approach to music description. In order to solve the third point, all present approaches to score-to-audio synchronization proceed in two stages: in the first stage, suitable parameters are extracted from the score and audio data streams, making them comparable; in the second stage, an optimal alignment is computed by means of dynamic programming (DP) based on a suitable local distance measure.
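
The second stage can be illustrated with a minimal sketch. It assumes a recurrence in the style of dynamic time warping, which is one common way to realize a DP alignment and not necessarily the exact formulation used by any of the approaches discussed here. The function name `dp_align`, the three allowed step directions, and the toy distance are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def dp_align(
    xs: Sequence[A],
    ys: Sequence[B],
    dist: Callable[[A, B], float],
) -> Tuple[float, List[Tuple[int, int]]]:
    """Align two non-empty feature sequences; return (total cost, index path)."""
    n, m = len(xs), len(ys)
    inf = float("inf")
    # acc[i][j] = cheapest cost of aligning xs[:i] with ys[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumed step set: diagonal, vertical, horizontal.
            step = min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
            acc[i][j] = dist(xs[i - 1], ys[j - 1]) + step

    # Backtrack from the end to recover which elements were paired.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        )
    path.reverse()
    return acc[n][m], path


# Toy "features": plain numbers compared by absolute difference.
cost, path = dp_align([1, 2, 3], [1, 2, 2, 3], lambda a, b: abs(a - b))
print(cost, path)
```

In this toy run the repeated value in the second sequence is paired with the single matching value in the first, which is the kind of local time deviation the alignment is meant to absorb. Both the running time and the table size grow with the product of the two sequence lengths.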

Turetsky et al. [§7] first convert the score data (given in MIDI format) into an audio data stream using a synthesizer. Then, the two audio data streams are analyzed by means of a short-time Fourier transform (STFT), which in turn yields a sequence of suitable feature vectors.

Based on an adequate local distance measure permitting a pairwise comparison of these feature vectors, the best alignment is derived by means of dynamic time warping (DTW). The approach of Soulez, Rodet, and Schwarz (2003) is similar to that of Turetsky et al. [§7], with one fundamental difference. Turetsky et al. [§7] first convert the score data into the much more complex audio format, so the explicit knowledge of note parameters is not used in the actual synchronization step. In contrast, Soulez et al. (2003) explicitly use note parameters such as onset times and pitches to generate a sequence of attack, sustain, and silence models, which are used in the synchronization process. This results in an algorithm that is more robust with respect to local time deviations and small spectral variations. Since the STFT is used for the analysis of the audio data stream, both approaches have the following drawbacks:

Firstly, the STFT computes spectral coefficients that are linearly spread over the spectrum, resulting in poor low-frequency resolution. Therefore, one has to rely on the harmonics in the case of low notes. This is problematic in polyphonic music, where harmonics and fundamental frequencies of different notes often coincide. Secondly, in order to obtain a sufficient time resolution, one has to work with a relatively large number of feature vectors on the audio side. (For example, even with a rough time resolution of 46 ms as suggested in Turetsky et al. [§7], more than 20 feature vectors per second are required.) This leads to huge memory requirements as well as long running times in the DTW computation.
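
The scale of the second drawback can be checked with a short calculation. Only the 46 ms time resolution comes from the text; the five-minute duration and the 8 bytes per table cell are assumptions picked for illustration, as are the helper names.

```python
import math


def frames_per_second(hop_seconds: float) -> float:
    """Number of feature vectors produced per second of audio."""
    return 1.0 / hop_seconds


def dtw_matrix_cells(duration_a: float, duration_b: float, hop_seconds: float) -> int:
    """Cells in a full DTW cost table for two streams analyzed at the same hop."""
    n = math.ceil(duration_a / hop_seconds)
    m = math.ceil(duration_b / hop_seconds)
    return n * m


rate = frames_per_second(0.046)             # about 21.7 vectors per second
cells = dtw_matrix_cells(300, 300, 0.046)   # two assumed 5-minute streams
megabytes = cells * 8 / 1e6                 # assuming 8 bytes per cell
print(round(rate, 1), cells, round(megabytes))
```

Under these assumptions each stream yields 6522 feature vectors, so the full cost table has roughly 42.5 million cells, or about 340 MB, before any spectral data is stored. Halving the hop size to improve time resolution would quadruple that figure.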

In the approach of Arifi (2004), note parameters such as onset times and pitches are extracted from the audio data stream (piano music). The alignment process is then performed in the score-like domain by means of a suitably designed cost measure on the note level. Due to the expressiveness of such note parameters, only a small number of features is sufficient to solve the synchronization task, allowing for a more efficient alignment. One major drawback of this approach is that the extraction of score-like note parameters from the audio data, a kind of music transcription, constitutes a difficult and time-consuming problem, possibly leading to many wrongly extracted audio features. This makes the subsequent alignment step a delicate task.

Müller, Kurth, and Röder (2004) present an algorithm that solves the synchronization problem accurately and efficiently for complex, polyphonic piano music. In a first step, they extract from the audio data stream a set of highly expressive features encoding note onset candidates separately for all pitches. This makes computations efficient, since only a small number of such features is sufficient to solve the synchronization task. Based on a suitable matching model, the best match between the score and the feature parameters is computed by dynamic programming (DP). To further cut down the computational cost of the synchronization process, they introduce the concept of anchor matches, that is, matches which can be easily established. The DP-based technique is then applied locally between adjacent anchor matches.
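
The saving from applying DP only between adjacent anchor matches can be sketched as follows. How anchor matches are actually established is not described here, so the sketch simply takes them as given index pairs; the function names, the requirement that anchors increase in both coordinates, and the example sizes are all assumptions.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[range, range]


def segments_between_anchors(
    n: int, m: int, anchors: Sequence[Tuple[int, int]]
) -> List[Segment]:
    """Split index ranges [0, n) and [0, m) into the gaps between anchor pairs.

    Assumes the anchors are strictly increasing in both coordinates.
    """
    bounds = [(-1, -1)] + sorted(anchors) + [(n, m)]
    segments: List[Segment] = []
    for (i0, j0), (i1, j1) in zip(bounds, bounds[1:]):
        # Everything strictly between two neighbouring anchors is one local problem.
        segments.append((range(i0 + 1, i1), range(j0 + 1, j1)))
    return segments


def dp_cells(segments: Sequence[Segment]) -> int:
    """Total number of DP table cells over all local problems."""
    return sum(len(rows) * len(cols) for rows, cols in segments)


# Hypothetical example: 100 score events, 100 audio features, 3 anchors.
full = dp_cells(segments_between_anchors(100, 100, []))
local = dp_cells(segments_between_anchors(100, 100, [(24, 24), (49, 49), (74, 74)]))
print(full, local)
```

With three evenly spaced anchors the local problems together need 2353 table cells instead of 10000. Each segment could then be aligned independently by any DP routine that takes two sequences and a local cost.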
