Figure 25. Histogram for 45 songs with drum-sounds (Wang, 2001)
between nearby note onsets to generate the initial
tempo hypotheses, which are fed into the second
stage; beat tracking then searches for sequences
of events that support a given tempo hypothesis.
Agents perform the search. Each agent represents
a hypothesized tempo and beat phase, and tries to
match its predictions to the incoming data. The
closeness of the match is used to evaluate the
quality of the agent's beat tracking, and the
discrepancies are used to update the agent's
hypotheses. When more than one course of action
is reasonable, new agents are created; agents are
destroyed when they duplicate each other's work
or are repeatedly unable to match their
predictions to the data. The agent with the
highest final score is selected, and its sequence
of beat times becomes the solution (Dixon, 2001).
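
The two stages described above can be sketched in Python as follows. This is a minimal illustration, not Dixon's implementation: the names cluster_iois, Agent and track_beats, as well as the tolerance, cluster width and update rate, are assumptions chosen for the example, and the creation of new agents at ambiguous decisions is omitted for brevity.

```python
from dataclasses import dataclass, field


def cluster_iois(onsets, max_ioi=2.0, width=0.025):
    """Stage 1 (tempo induction): cluster intervals between nearby onsets."""
    clusters = []  # each cluster is [sum_of_intervals, count]
    for i, t0 in enumerate(onsets):
        for t1 in onsets[i + 1:]:
            ioi = t1 - t0
            if ioi > max_ioi:
                break
            for c in clusters:
                if abs(c[0] / c[1] - ioi) <= width:
                    c[0] += ioi
                    c[1] += 1
                    break
            else:
                clusters.append([ioi, 1])
    # Most populated clusters first; each mean interval is a tempo hypothesis.
    clusters.sort(key=lambda c: c[1], reverse=True)
    return [c[0] / c[1] for c in clusters]


@dataclass
class Agent:
    period: float      # hypothesized beat interval in seconds (tempo)
    next_beat: float   # predicted time of the next beat (phase)
    beats: list = field(default_factory=list)
    score: float = 0.0
    misses: int = 0

    def consider(self, onset, tolerance=0.05, rate=0.2):
        # Predictions that passed without a matching onset count as misses
        # and are kept as interpolated beats.
        while onset > self.next_beat + tolerance:
            self.beats.append(self.next_beat)
            self.misses += 1
            self.next_beat += self.period
        error = onset - self.next_beat
        if abs(error) <= tolerance:
            self.beats.append(onset)
            self.score += 1.0 - abs(error) / tolerance  # closeness of match
            self.period += rate * error                 # update hypothesis
            self.next_beat = onset + self.period
            self.misses = 0


def track_beats(onsets, max_hypotheses=5, max_misses=4):
    """Stage 2 (beat tracking): let agents compete over the onset list."""
    periods = cluster_iois(onsets)[:max_hypotheses]
    agents = [Agent(p, t) for p in periods for t in onsets[:3]]
    for onset in onsets:
        for agent in agents:
            agent.consider(onset)
        # Destroy agents that keep failing to match their predictions.
        agents = [a for a in agents if a.misses <= max_misses]
        # Destroy duplicates, keeping the higher-scoring agent.
        unique = []
        for a in sorted(agents, key=lambda a: a.score, reverse=True):
            if not any(abs(a.period - u.period) < 0.01
                       and abs(a.next_beat - u.next_beat) < 0.01
                       for u in unique):
                unique.append(a)
        agents = unique
    if not agents:
        return []
    return max(agents, key=lambda a: a.score).beats
```

For a perfectly regular onset list such as [0.0, 0.5, 1.0, ...], the most populated interval cluster is 0.5 s, and the winning agent is the one whose period and phase match every onset.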
where pow_beat(n) represents the local maximum
power on the n-th beat and pow_other(n) represents
the local maximum power at positions between the
n-th beat and the (n + 1)-th beat. The
power-difference measure takes a value between 0
(easiest) and 1 (most difficult). For a regular
pulse sequence with a constant interval, for
example, this measure takes a value of 0 (Goto,
2001).
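
The defining equation is not reproduced in this excerpt. The Python sketch below uses one normalization that is consistent with the properties stated above (range 0 to 1, and 0 when all power falls on the beat). The formula and the function name power_difference are assumptions for illustration, not necessarily the exact form given by Goto (2001).

```python
def power_difference(pow_beat, pow_other):
    """Difficulty of one beat interval, from 0 (easiest) to 1 (hardest).

    pow_beat:  local maximum power on the n-th beat
    pow_other: local maximum power between the n-th and (n + 1)-th beats
    """
    peak = max(pow_beat, pow_other)
    if peak == 0:
        # Silence: the ratio is undefined, so return the midpoint.
        # This fallback is an arbitrary choice made for the sketch.
        return 0.5
    # Assumed normalization: maps (pow_other - pow_beat) / peak from
    # the range [-1, 1] onto [0, 1].
    return 0.5 * (pow_other - pow_beat) / peak + 0.5
```

With this form, a regular pulse sequence (all power on the beats, none in between) gives 0, equal power on and off the beat gives 0.5, and power only between beats gives 1.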
Beat Tracking in Compressed Domain
To make the rest of the section easier to follow,
we give a brief overview of the basic concepts of
the MP3 audio standard. We focus on the
window-switching pattern and its behaviour as an
onset detector.
Further information about the MPEG standards can
be found in ISO/IEC 11172-3, ISO/IEC 13818-3,
Noll (1997) and Pan (1995).
MP3 uses four different MDCT window types: long,
long-to-short, short and short-to-long, indexed
0, 1, 2 and 3 respectively. The long window allows
greater frequency resolution for audio signals
with stationary characteristics, while the short
window provides better time resolution for
transients (Pan, 1995). In short blocks there are
three sets of window values for a given frequency.
Each window covers 32 frequency subbands, each
further subdivided into 6 finer subbands by the
MDCT. The three short windows are grouped in one
granule, and the values are ordered by frequency,
then by window.
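
As a rough illustration of how the window type index could be used as an onset cue, the Python sketch below flags each granule where a run of short windows begins. It assumes the per-granule window types have already been parsed from the bitstream into a list of integers. The name onset_candidates and the rule "a new run of short windows marks a transient" are assumptions made for the example, based only on the statement that short windows are chosen for transients.

```python
# Window type indices as listed in the text.
WINDOW_TYPES = {0: "long", 1: "long-to-short", 2: "short", 3: "short-to-long"}
SHORT = 2


def onset_candidates(block_types):
    """Return indices of granules where a run of short windows begins."""
    candidates = []
    previous = None
    for i, block_type in enumerate(block_types):
        if block_type not in WINDOW_TYPES:
            raise ValueError(f"unknown window type {block_type} at granule {i}")
        # The encoder switches to short windows for transients, so the
        # first short granule of a run is treated as an onset candidate.
        if block_type == SHORT and previous != SHORT:
            candidates.append(i)
        previous = block_type
    return candidates
```

For example, the sequence [0, 0, 1, 2, 2, 3, 0, 1, 2, 3] contains two runs of short windows, starting at granules 3 and 8.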
The switch between long and short blocks is not
instantaneous. The two window types long-to-short
and short-to-long serve as transitions between
the long and short window types. Because
The Dixon Algorithm
Dixon (2001) describes an audio beat tracking
system using multiple identical agents, each of
which represents a hypothesis of the current tempo
and synchronization (phase) of the beat. The
system works well for pop music, where tempo
variations are minimal, but does not perform well
with larger tempo changes. Dixon and
Cambouropoulos (2000) extend this work to handle
the significant tempo variations found in
expressive performances of classical music. They
use the duration, amplitude and pitch information
available in MIDI data to estimate the relative
rhythmic salience (importance) of notes, and
prefer that beats coincide with the onsets of
strong notes. In this chapter, the salience
calculation is modified to ignore note durations
because they are not correctly recorded in the
data. Processing is performed in two stages: tempo
induction is performed by clustering the time
intervals