D'Aguanno et al. (2006) present a template matching technique (based on the WSP only) to obtain a general-purpose tempo tracker for music with drums. Because the WSP is structured coherently with the drum line, it is possible to compare this pattern with a simple template: a vector filled with zeros, holding a 1 at every metronome beat position. Each element of this array represents one MP3 granule. This array is matched against the WSP found by the MP3 encoder, which requires an estimation function yielding the distance between the metronome template under examination and the actual MP3 window-switching pattern. In Figure 26 the metronome template is represented by the darkest line, with peaks at -1; each peak is a metronome beat. The figure makes clear that the WSP structure is coherent with the song's tempo even though the song has a very complex drum line.
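The matching procedure described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the granule duration, the +/-1 granule tolerance, the distance measure, and the coarse phase search are all assumptions, and the WSP is taken as an already-extracted binary array (1 where the encoder switched to short windows).

```python
# Sketch of WSP-based tempo tracking by template matching.
# Assumption: one MP3 granule covers 576 samples at 44.1 kHz.
GRANULE_SEC = 576 / 44100.0

def metronome_template(bpm, n_granules, phase=0):
    """0/1 vector, one element per granule, with a 1 at every beat."""
    template = [0] * n_granules
    step = (60.0 / bpm) / GRANULE_SEC   # granules per metronome beat
    pos = float(phase)
    while pos < n_granules:
        template[int(pos)] = 1
        pos += step
    return template

def distance(template, wsp):
    """Count template beats with no window switch within +/-1 granule
    (a simple mismatch score; lower means better alignment)."""
    misses = 0
    for i, t in enumerate(template):
        if t == 1:
            lo, hi = max(0, i - 1), min(len(wsp), i + 2)
            if not any(wsp[lo:hi]):
                misses += 1
    return misses

def estimate_bpm(wsp, bpm_range=range(60, 181)):
    """Return the BPM whose metronome template best matches the WSP."""
    best = min(
        ((distance(metronome_template(b, len(wsp), p), wsp), b)
         for b in bpm_range
         for p in range(0, 8)),        # coarse phase search
        key=lambda x: x[0],
    )
    return best[1]
```

A real tracker would also have to handle the octave ambiguity (a template at half or double the true BPM can match almost as well), which is one reason the WSP alone is not a full beat tracker.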
A first implementation achieved correct BPM recognition on 50% of the songs, and in another 30% the correct BPM could be estimated from the program's results. The algorithm failed on the remaining 20% of the songs. These values come from experiments designed to demonstrate the capabilities of the WSP. Obviously the WSP alone is not sufficient to act as a beat tracker, but it is adequate for a tempo-tracking task.
Score Extraction
Score extraction can be defined as the act of listening to a piece of music and writing down the score for the musical events that constitute the piece. This implies the extraction of specific features from a musical acoustic signal, resulting in a symbolic representation that comprises notes, pitches, timings, dynamics, timbre, and so on.
The score extraction task is not as simple and intuitive as beat tracking. People without musical education find it very difficult and are often unable to perform it. The automatic transcription of music is a well-understood problem only for monophonic music. To transcribe monophonic music, many algorithms have been proposed, including time-domain techniques based on zero-crossing and autocorrelation, as well as frequency-domain techniques based on the discrete Fourier transform and the cepstrum (Brown, 1992; Brown & Puckette, 1993; Brown & Zhang, 1991). These algorithms have proved to be reliable and commercially applicable. In polyphonic music transcription the situation is less positive: results are not as encouraging because of the increased complexity of the signals in question. It should be noted that score extraction is a composite task. In fact, we can subdivide the problem into a set of different tasks: pitch tracking to get information about the notes, beat tracking to identify the correct rhythmic figures, source separation to isolate a single instrument part from the others, timbre extraction to determine which instruments should appear in the score, and so on. Many algorithms have been proposed to solve the problem constrained to monotimbral music (i.e., a piano score with many simultaneous voices). These algorithms are very similar to the algorithm presented in the section Pitch Tracking, using the same low-level feature extractor but adding a second stage dedicated to interpreting the low-level results. The low-level features are often referred to as a "mid-level representation". A good mid-level representation for audio should be able to separate individual sources, be invertible in a perceptual sense, reduce the number of components, and reveal the most important attributes of the sound. Current methods for automatic music transcription are often based on modeling the music spectrum as a sum of harmonic sources and estimating the fundamental frequencies of these sources. This information constitutes an ad hoc mid-level representation. In order to successfully create a system for automatic music transcription, the information contained in the analyzed audio signal must be combined with knowledge of the structure of music (Klapuri, 2004).
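The "sum of harmonic sources" idea can be illustrated with a toy iterative estimator: pick the candidate fundamental whose harmonic sum over the magnitude spectrum is largest, cancel its partials, and repeat for the next source. This is a deliberately simplified sketch, not Klapuri's actual algorithm; the candidate grid, number of harmonics, and cancellation width are arbitrary assumptions.

```python
# Toy polyphonic F0 estimation by harmonic summation with iterative
# partial cancellation (an illustration of the "sum of harmonic
# sources" model, not a published algorithm).
import numpy as np

def estimate_f0s(signal, sr, n_sources=2, fmin=80.0, fmax=500.0, n_harm=4):
    L = len(signal)
    mag = np.abs(np.fft.rfft(signal * np.hanning(L)))
    candidates = np.arange(fmin, fmax, 0.5)
    f0s = []
    for _ in range(n_sources):
        # salience of a candidate f0 = sum of magnitudes at its harmonics
        saliences = []
        for f in candidates:
            s = 0.0
            for h in range(1, n_harm + 1):
                b = int(round(h * f * L / sr))   # FFT bin of harmonic h
                if b < len(mag):
                    s += mag[b]
            saliences.append(s)
        best = float(candidates[int(np.argmax(saliences))])
        f0s.append(best)
        # cancel the detected source's partials before the next pass
        for h in range(1, n_harm + 1):
            b = int(round(h * best * L / sr))
            mag[max(0, b - 3):min(len(mag), b + 4)] = 0.0
    return f0s
```

On a clean synthetic mixture of two harmonic tones this can recover both fundamentals; on real recordings, octave errors and partials shared between sources make the problem far harder, which is exactly why knowledge of musical structure has to be brought in.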