topic in music information retrieval. However,
this area of study is still in its infancy and per-
formance-based interpreter identification in a
general setting is an extremely difficult task.
The development of new signal processing tech-
niques would certainly increase the generality
of performance-based interpreter identification
systems. One important research direction is the
development of performance-based interpreter
identification systems capable of dealing with
polyphonic multi-instrument audio signals. We
are currently investigating multi-instrument (sax,
piano, double bass and drums) recordings by
famous musicians. In this case, the input audio
recordings have to be preprocessed in order to
extract the melody (i.e., separate the saxophone)
from the other instruments. Melody extraction is
in general a complex task. In the context of this
chapter, the melody extraction problem may be
somewhat simplified by taking into account some
conditions often present in Jazz recordings. In
many Jazz recordings, especially old recordings,
the melody (in our case played by a saxophone)
has higher loudness than the accompaniment and
often the accompaniment consists of piano, bass
and drums. Taking this into account, we have applied
a monophonic fundamental frequency estimator
to the audio recording in order to determine the
predominant instantaneous pitch. This pitch usu-
ally corresponds to the pitch of the saxophone,
because the sax is the predominant instrument
(the one with the highest loudness). The drums
can be partially discarded by using a harmonic
model, as will be explained later, and the bass
line can be removed by considering
only fundamental frequency candidates higher
than a certain frequency. Discarding the piano
components is a more difficult problem, but
in the tested recordings the piano usually
played chords at a lower loudness than the
saxophone, so it did not normally mask the sax
pitch. Once the fundamental frequency estimation
was applied, we synthesized the harmonics that
are multiples of the detected frequency in order
to obtain a monophonic audio signal containing only
the saxophone melody. This process produces a
lower-quality saxophone sound than the original,
but the resulting audio still preserves important
characteristics useful for distinguishing among
different interpreters.
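The resynthesis step described above can be illustrated with a minimal additive-synthesis sketch. It is not the implementation used in the chapter: the function name `synthesize_harmonics`, the fixed per-harmonic amplitudes, the default sample rate, and the convention that a fundamental of 0 Hz marks a frame without a detected pitch are all assumptions made for illustration.

```python
import math

def synthesize_harmonics(f0_track, amps, sr=44100, hop=256):
    """Sum sinusoids at integer multiples of a frame-wise fundamental.

    f0_track: one fundamental frequency in Hz per frame (0 = no pitch found).
    amps:     amplitude of harmonic 1, 2, 3, ... (held constant here, which
              is a simplification; a real system would track them per frame).
    """
    out = []
    phases = [0.0] * len(amps)  # running phase per harmonic avoids clicks at frame edges
    for f0 in f0_track:
        for _ in range(hop):
            sample = 0.0
            for k, a in enumerate(amps, start=1):
                # Keep only harmonics of a detected pitch that lie below Nyquist.
                if 0 < k * f0 < sr / 2:
                    sample += a * math.sin(phases[k - 1])
                    phases[k - 1] += 2 * math.pi * k * f0 / sr
            out.append(sample)
    return out

# Hypothetical pitch track: two frames at 440 Hz followed by one unpitched frame.
audio = synthesize_harmonics([440.0, 440.0, 0.0], [1.0, 0.5, 0.25])
```

Because only multiples of the detected frequency are generated, anything that is not harmonically related to the predominant pitch is absent from the output, which is why the result sounds poorer than the original recording.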
We employ a Sinusoidal plus Noise model
which is able to decompose a sound into sinusoids
plus a spectral residual signal (Serra, 1990). The
analysis procedure detects partials by studying
the time-varying spectral characteristics of a
sound and represents them with time-varying
sinusoids. These partials are then subtracted from
the original sound and the remaining residual is
represented as a time-varying filtered white noise
component.
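The decomposition idea (fit sinusoids, subtract them, keep what is left as the residual) can be sketched on a single stationary frame. This is a simplified illustration rather than the model of Serra (1990): it assumes the partial frequencies are already known and constant within the frame, that each completes a whole number of cycles in the frame, and it does not model the residual as filtered noise. The names `fit_partial` and `split_sinusoids_and_residual` are hypothetical.

```python
import math
import random

def fit_partial(x, freq, sr):
    """Estimate cosine and sine amplitudes of one partial by projection."""
    n = len(x)
    w = 2.0 * math.pi * freq / sr
    c = 2.0 / n * sum(v * math.cos(w * i) for i, v in enumerate(x))
    s = 2.0 / n * sum(v * math.sin(w * i) for i, v in enumerate(x))
    return c, s

def split_sinusoids_and_residual(x, freqs, sr):
    """Return (sinusoidal part, residual) for a frame with known partials."""
    sinusoids = [0.0] * len(x)
    for f in freqs:
        c, s = fit_partial(x, f, sr)
        w = 2.0 * math.pi * f / sr
        for i in range(len(x)):
            sinusoids[i] += c * math.cos(w * i) + s * math.sin(w * i)
    # The residual is whatever the sinusoids do not explain.
    residual = [v - p for v, p in zip(x, sinusoids)]
    return sinusoids, residual

# Toy frame: two harmonic partials plus low-level noise (all values assumed).
rng = random.Random(0)
sr, n = 8000, 800
noise = [rng.uniform(-0.05, 0.05) for _ in range(n)]
x = [math.sin(2 * math.pi * 440 * i / sr)
     + 0.5 * math.sin(2 * math.pi * 880 * i / sr)
     + noise[i] for i in range(n)]
sinusoids, residual = split_sinusoids_and_residual(x, [440.0, 880.0], sr)
```

In this toy case the residual is close to the added noise, which mirrors the role of the residual component in the model.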
A short-time Fourier transform (STFT) is
computed and the prominent spectral peaks are
detected and incorporated into the existing par-
tial trajectories by means of a peak continuation
algorithm. A monophonic pitch detection step
improves the analysis by using the fundamental
frequency information in the peak continuation
algorithm. Thus, the algorithm estimates the
predominant fundamental frequency. The partials
that are multiples of the fundamental frequency
and have coherent trajectories are considered
sinusoids. The remaining components are con-
sidered the residual part. Finally, the sinusoidal
components can be synthesized, resulting in a
monophonic signal. The STFT is performed using
a Hamming window with 2048 samples, shifting
it 256 samples before computing the next frame.
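The framing parameters above (a 2048-sample Hamming window advanced by 256 samples) can be sketched as follows. The code is only an illustration under stated assumptions: picking the strongest DFT bin inside a search band is a crude stand-in for a real fundamental frequency estimator and for the peak continuation algorithm, and the helper names, the 44100 Hz sample rate, and the test signal are hypothetical.

```python
import cmath
import math

WIN = 2048  # window length in samples, as stated in the text
HOP = 256   # hop size in samples, as stated in the text

def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def frames(x, win=WIN, hop=HOP):
    """Yield successive Hamming-windowed frames of x."""
    w = hamming(win)
    for start in range(0, len(x) - win + 1, hop):
        yield [x[start + i] * w[i] for i in range(win)]

def bin_magnitude(frame, k):
    """Magnitude of DFT bin k, computed directly (slow but dependency-free)."""
    n = len(frame)
    return abs(sum(v * cmath.exp(-2j * math.pi * k * i / n)
                   for i, v in enumerate(frame)))

def strongest_bin(frame, sr, fmin, fmax):
    """Frequency in Hz of the largest spectral bin between fmin and fmax."""
    n = len(frame)
    k_lo = math.ceil(fmin * n / sr)
    k_hi = math.floor(fmax * n / sr)
    best = max(range(k_lo, k_hi + 1), key=lambda k: bin_magnitude(frame, k))
    return best * sr / n

# Assumed test signal: a loud 80 Hz "bass" tone plus a quieter 440 Hz tone.
sr = 44100
x = [math.sin(2 * math.pi * 80 * i / sr) + 0.5 * math.sin(2 * math.pi * 440 * i / sr)
     for i in range(WIN + 2 * HOP)]
first = next(frames(x))
estimate = strongest_bin(first, sr, 200.0, 1000.0)
```

With a 2048-sample window at this sample rate the bins are about 21.5 Hz apart, so the estimate is only accurate to that resolution; restricting the search band is what keeps the louder low-frequency tone from being selected.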
By applying a harmonic analysis and taking
into account continuous trajectories, we are able
to remove most of the percussion components.
The fundamental frequency detector is applied
considering only the fundamental frequency
candidates within the range 200 Hz to 1000 Hz in
order to discard the bass components. Once we
have extracted the saxophone melody from the
polyphonic recording, we can compute the note-
level and the intranote-level descriptors in the
same manner as described previously. Currently