Generally speaking, 'rhythm' describes patterns of changes. In music, a 'beat'
corresponds to the perceived pulses that mark off equal durational units and is our
basis of comparison for measurements of rhythmic durations. The 'tempo' refers to
the beats' 'striking rate', whereas 'metre' represents the accent structure of the beats.
Considering 'metre', the metrical structure of a musical piece is composed of multiple
hierarchical levels [66]. There, the tempo on the lowest level, which is also referred
to as the 'tatum' level, is an integer multiple of the tempo on each higher level. When
we tap along with a song, we do this on the 'pulse' or 'beat' level, whose rate can be
referred to as the quarter-note tempo. The 'bar' or 'measure' level corresponds to the
unit of a bar in notated music. The relation between the measure and beat levels then
is the metre or 'time signature' of a musical piece.
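To make the relation between these levels concrete, the following minimal Python
sketch (the function name and all numerical values are purely illustrative
assumptions) derives the bar- and tatum-level tempi from a given quarter-note tempo,
time signature, and tatum subdivision:

def metrical_levels(beat_bpm, beats_per_bar, tatum_per_beat):
    # Derive the tempi of the remaining metrical levels from the beat tempo.
    return {
        'bar_bpm': beat_bpm / beats_per_bar,     # measure level (slowest)
        'beat_bpm': beat_bpm,                    # pulse/beat level
        'tatum_bpm': beat_bpm * tatum_per_beat,  # tatum level (fastest)
    }

# A waltz at 180 quarter notes per minute in 3/4 time with an eighth-note tatum:
print(metrical_levels(180.0, 3, 2))
# {'bar_bpm': 60.0, 'beat_bpm': 180.0, 'tatum_bpm': 360.0}

Note that the tatum-level tempo (360 BPM) is an integer multiple of both the
beat-level (180 BPM) and bar-level (60 BPM) tempi, as stated above.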
Current tempo detection algorithms are mostly based on periodicity detection:
autocorrelation, resonant filter banks, or onset time statistics (cf. Sect. 11.2) are
some examples, as summarised in [51]. Very few approaches, however, aim at a
synergistic, combined assessment of tempo together with related information such as
metre or beat tracking to provide a robust basis for higher-level tasks such as
ballroom dance style or genre recognition. Further, a few studies introduce data-
driven genre and metre recognition [67, 68]. Others [69-71] use rhythmic feature
information for specialised tasks such as audio identification.
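As an illustration of the periodicity detection principle, the following minimal
Python sketch (the function name, tempo range, and envelope representation are
assumptions, not taken from [51]) estimates the dominant tempo of an onset strength
envelope via autocorrelation:

import numpy as np

def tempo_by_autocorrelation(onset_env, frame_rate,
                             bpm_min=60.0, bpm_max=240.0):
    # Autocorrelate the mean-removed onset strength envelope and pick the
    # lag with maximum correlation inside a plausible tempo range.
    env = np.asarray(onset_env, dtype=float)
    env = env - env.mean()
    acf = np.correlate(env, env, mode='full')[len(env) - 1:]  # lags >= 0
    lag_min = int(round(frame_rate * 60.0 / bpm_max))  # fast tempo, short lag
    lag_max = int(round(frame_rate * 60.0 / bpm_min))  # slow tempo, long lag
    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return 60.0 * frame_rate / lag  # convert the best lag back to BPM

Note that a sub- or super-multiple of the true beat period may correlate almost as
strongly as the beat period itself, which is precisely the source of the 'octave'
errors addressed below.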
In this section, an approach for robust data-driven rhythm analysis is discussed.
To this end, LLDs modelling rhythmic information are presented that are tailored to
classify duple and triple metre as well as ballroom dance styles. Once these are
determined, the information is used to reliably assess the quarter-note tempo and to
avoid 'octave' errors, i.e., mistakenly doubling, tripling, or halving the tempo.
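A minimal sketch of such metre-informed 'octave' correction (the target tempo
range, candidate factors, and function name are illustrative assumptions, not the
approach evaluated in this section):

def fold_to_quarter_note(raw_bpm, metre=2, bpm_min=70.0, bpm_max=180.0):
    # Scale the raw estimate by powers of the metre-implied factor
    # (2 for duple, 3 for triple metre) and keep the candidate that lies
    # in, or failing that closest to, the plausible quarter-note range.
    candidates = [raw_bpm * metre ** k for k in range(-2, 3)]
    in_range = [c for c in candidates if bpm_min <= c <= bpm_max] or candidates
    centre = (bpm_min + bpm_max) / 2.0
    return min(in_range, key=lambda c: abs(c - centre))

print(fold_to_quarter_note(320.0, metre=2))  # halves a doubled estimate: 160.0
print(fold_to_quarter_note(50.0, metre=3))   # corrects a 'third' error: 150.0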
The determination of tempo, metre, and (on-)beat positions [25] can roughly be
divided into two major strategies:
The first strategy starts with the localisation of onsets in the audio (or in
symbolic notation such as MIDI), as was shown in the last section. The desired
determination tasks are then based on the analysis of the inter-onset intervals
(IOIs) [72-78]. To this end, histogram approaches are found most frequently
[13, 75]: the duration and weight of all possible IOIs are calculated, the IOIs are
binned by similarity clustering, and the clusters are arranged in a histogram. From
the weights and the centres of the clusters, the tempo of several metrical levels can
be estimated, as sketched below. Alternatively, rule-based approaches are employed
[13]. Or, exclusively the tatum pulse, i.e., the fastest pulse present in a piece, is
computed by choosing the cluster whose centre corresponds to the smallest IOI [75].
Then, features are extracted within a window around each tatum pulse, and the tatum
pulses are classified, e.g., by Bayesian methods, with respect to their perceived
accentuation. By that, the beat level is detected, based on the assumption that beats
are more accented than off-beat pulses.
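The histogram principle can be sketched as follows (in Python; the bin width, the
admissible IOI range, and the function name are illustrative assumptions rather than
the exact settings of [13, 75]):

import numpy as np

def ioi_tempo_candidates(onset_times, bin_width=0.025, n_candidates=3):
    # Collect all forward inter-onset intervals, not only those between
    # neighbouring onsets, so that several metrical levels appear as peaks.
    onsets = np.asarray(onset_times, dtype=float)
    diffs = onsets[None, :] - onsets[:, None]
    iois = diffs[np.triu_indices(len(onsets), k=1)]
    iois = iois[(iois >= 0.1) & (iois <= 2.0)]  # musically plausible range
    hist, edges = np.histogram(
        iois, bins=np.arange(0.1, 2.0 + bin_width, bin_width))
    centres = (edges[:-1] + edges[1:]) / 2.0
    heaviest = np.argsort(hist)[::-1][:n_candidates]  # heaviest clusters first
    return [60.0 / centres[i] for i in heaviest]      # IOI in s -> tempo in BPM

Each returned candidate typically corresponds to one metrical level, e.g., the beat,
bar, or tatum level.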
In the second strategy to determine tempo, metre, and (on-)beat positions, the
order is inverted, i.e., onset positions are retrieved after the analysis of tempo and
metrical structure. In this case, resonator methods or the related correlation
approaches are commonly used; a minimal sketch follows below. Onset localisation
then benefits from the knowledge gained
throughout tempo detection [5, 13, 14, 16, 19, 79]. This second strategy tends to
lead to more robust onset localisation, since the search for onsets is constrained
by the previously estimated tempo and metrical structure.
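The resonator principle can be sketched as follows (in Python; the feedback gain,
tempo grid, and function name are assumptions, loosely following the general comb
filter idea rather than a specific method from [5, 13, 14, 16, 19, 79]):

import numpy as np

def comb_filter_tempo(onset_env, frame_rate, bpm_grid=None):
    # Each candidate tempo defines a feedback comb filter whose delay equals
    # one beat period; the candidate whose filter accumulates the most output
    # energy, i.e., resonates most strongly with the envelope, is selected.
    if bpm_grid is None:
        bpm_grid = np.arange(60, 241, 2)
    alpha = 0.9  # feedback gain; controls resonance sharpness (illustrative)
    best_bpm, best_energy = None, -np.inf
    for bpm in bpm_grid:
        delay = max(1, int(round(frame_rate * 60.0 / bpm)))
        y = np.zeros(len(onset_env))
        for n in range(len(onset_env)):
            feedback = y[n - delay] if n >= delay else 0.0
            y[n] = (1.0 - alpha) * onset_env[n] + alpha * feedback
        energy = float(np.sum(y ** 2))
        if energy > best_energy:
            best_bpm, best_energy = bpm, energy
    return best_bpm

The output of the winning filter peaks near the beat positions, which illustrates
how the onset retrieval can profit from the preceding tempo analysis.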