Identifying Saxophonists from Their Playing Styles - Intelligent Music Information Systems: Tools and Methodologies


topic in music information retrieval. However,

this area of study is still in its infancy and per-

formance-based interpreter identification in a

general setting is an extremely difficult task.

The development of new signal processing tech-

niques would certainly increase the generality

of performance-based interpreter identification

systems. One important research direction is the

development of performance-based interpreter

identification systems capable of dealing with

polyphonic multi-instrument audio signals. We

are currently investigating multi-instrument (sax,

piano, double bass and drums) recordings by

famous musicians. In this case, the input audio

recordings have to be preprocessed in order to

extract the melody (i.e., separate the saxophone)

from the other instruments. Melody extraction is

in general a complex task. In the context of this

chapter, the melody extraction problem may be

somewhat simplified by taking into account some

conditions often present in Jazz recordings. In

many Jazz recordings, especially old recordings,

the melody (in our case played by a saxophone)

has higher loudness than the accompaniment and

often the accompaniment consists of piano, bass

and drums. Taking this into account, we have applied

a monophonic fundamental frequency estimator

to the audio recording in order to determine the

predominant instantaneous pitch. This pitch usu-

ally corresponds to the pitch of the saxophone,

because the sax is the predominant instrument

(the one with the highest loudness). We can consider

that drums can be partially discarded by using

a harmonic model, as will be explained later,

and the bass line can be removed considering

only fundamental frequency candidates higher

than a certain frequency. Discarding the piano

components is a more difficult problem, but

in the tested recordings the piano was usually

playing chords with a lower loudness than the

saxophone, so it normally does not mask the sax

pitch. Once the fundamental frequency estimation

was applied, we synthesized the harmonics that

are multiples of the detected frequency in order

to obtain a monophonic audio signal containing only

the saxophone melody. This process produces a

lower quality saxophone sound compared with

the original sound but still the resulting audio

preserves important characteristics useful for

distinguishing among different interpreters.
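
The resynthesis step described above can be illustrated with a minimal sketch. It assumes a frame-wise fundamental frequency track is already available (0 marks frames with no detected pitch). The function name, the sample rate, the number of harmonics, and the 1/h amplitude roll-off are all hypothetical choices made for illustration; the chapter does not specify them, and the original system presumably uses the amplitudes measured in the analysis stage.

```python
import math


def synthesize_harmonics(f0_track, hop=256, sample_rate=44100, num_harmonics=10):
    """Additive resynthesis of harmonics at integer multiples of a frame-wise f0.

    f0_track: one fundamental frequency (Hz) per frame; 0 means "no pitch".
    hop: number of output samples generated per frame.
    Amplitudes follow an assumed 1/h roll-off (illustrative only).
    """
    out = []
    phases = [0.0] * num_harmonics  # running phase of each harmonic oscillator
    nyquist = sample_rate / 2.0
    for f0 in f0_track:
        for _ in range(hop):
            sample = 0.0
            if f0 > 0.0:
                for h in range(1, num_harmonics + 1):
                    freq = h * f0
                    if freq >= nyquist:
                        break  # skip harmonics that would alias
                    # Accumulate phase so frequency changes do not cause clicks.
                    phases[h - 1] += 2.0 * math.pi * freq / sample_rate
                    sample += math.sin(phases[h - 1]) / h
            out.append(sample)
    return out
```

Because only harmonics of the detected pitch are generated, anything in the recording that is not a multiple of that pitch is absent from the output, which is why the result sounds poorer than the original while still following the saxophone line.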

We employ a Sinusoidal plus Noise model

which is able to decompose a sound into sinusoids

plus a spectral residual signal (Serra, 1990). The

analysis procedure detects partials by studying

the time-varying spectral characteristics of a

sound and represents them with time-varying

sinusoids. These partials are then subtracted from

the original sound and the remaining residual is

represented as a time-varying filtered white noise

component.
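
In the notation commonly used for this family of models, the decomposition can be written as follows. The symbols are introduced here for illustration and do not appear in the chapter itself: R is the number of sinusoids, A_r(t) and \theta_r(t) are the instantaneous amplitude and phase of the r-th sinusoid, and e(t) is the residual.

```latex
s(t) = \sum_{r=1}^{R} A_r(t)\,\cos\bigl[\theta_r(t)\bigr] + e(t)
```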

A short-time Fourier transform (STFT) is

computed and the prominent spectral peaks are

detected and incorporated into the existing par-

tial trajectories by means of a peak continuation

algorithm. A monophonic pitch detection step

improves the analysis by using the fundamental

frequency information in the peak continuation

algorithm. Thus, the algorithm estimates the

predominant fundamental frequency. The partials

which are multiples of the fundamental frequency

and have coherent trajectories are considered as

sinusoids. The remaining components are con-

sidered as the residual part. Finally, the sinusoid

components can be synthesized, resulting in a

monophonic signal. The STFT is performed using

a Hamming window with 2048 samples, shifting

it 256 samples before computing the next frame.
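
The framing and peak-detection stage described above can be sketched as follows. The window length (2048) and hop (256) come from the text; everything else is an assumption made for illustration: the function names, the plain radix-2 FFT, and the simple thresholded local-maximum rule for picking peaks. The peak continuation and pitch-guided tracking steps are not shown.

```python
import cmath
import math

WINDOW_SIZE = 2048  # samples per analysis frame (from the text)
HOP_SIZE = 256      # samples between successive frames (from the text)


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def stft_magnitudes(signal, window_size=WINDOW_SIZE, hop_size=HOP_SIZE):
    """Magnitude spectrum (bins 0..window_size/2) of each windowed frame."""
    window = hamming(window_size)
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop_size):
        frame = [signal[start + i] * window[i] for i in range(window_size)]
        spectrum = fft(frame)
        frames.append([abs(c) for c in spectrum[: window_size // 2 + 1]])
    return frames


def spectral_peaks(magnitudes, threshold):
    """Bin indices that are local maxima above an (assumed) threshold."""
    return [
        k
        for k in range(1, len(magnitudes) - 1)
        if magnitudes[k] > threshold
        and magnitudes[k] > magnitudes[k - 1]
        and magnitudes[k] >= magnitudes[k + 1]
    ]
```

The chapter does not state the sampling rate. Assuming 44.1 kHz, a 2048-sample window spans about 46 ms, a 256-sample hop advances about 5.8 ms per frame, and adjacent bins are about 21.5 Hz apart.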

By applying a harmonic analysis and taking

into account continuous trajectories, we are able

to remove most of the percussion components.

The fundamental frequency detector is applied

considering only the fundamental frequency

candidates within the range 200 Hz to 1000 Hz in

order to discard the bass components. Once we

have extracted the saxophone melody from the

polyphonic recording, we can compute the note-

level and the intranote-level descriptors in the

same manner as described previously. Currently
