Digital Signal Processing Reference
Early reduction functions frequently operated in the time domain. An example is the one in [55], which normalises the loudness of the signal before splitting it into multiple bands via bandpass filters. Onsets are then detected per band as peaks in the first-order difference of the logarithm of the amplitude envelope, and the band-wise onsets are combined to determine the final set of detected onsets. However, onset detection in the time domain has its shortcomings, as onsets are often masked in this domain by higher-energy signals. Today, many reduction functions therefore operate on a spectral representation of the audio signal. Typical solutions, all based on the short-time Fourier transform (STFT) of the signal, are:
Spectral difference: The spectral difference (SD) function is based on the bin-wise difference of two consecutive short-time magnitude spectra. Only positive differences (increases in magnitude) are kept, and these are then summed across bins. The differences can be aggregated using the L1-norm [52] or the L2-norm [56]. In the case of the L1-norm, the function is referred to as spectral flux (SF). These methods are among the best-performing detection functions to date.
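The rectify-and-sum step of SD/SF can be sketched as follows. This is a minimal illustration, assuming the STFT magnitudes have already been computed and are given as a list of frames `mags[n][k]`; the function names `half_wave` and `spectral_difference` are hypothetical and not taken from [52] or [56]. For the L2 variant the sketch returns the sum of squared rectified differences without a final square root, which is a common way of writing it; take the root if a true norm is wanted.

```python
from typing import List


def half_wave(x: float) -> float:
    """Half-wave rectifier H(x) = (x + |x|) / 2: keeps increases, zeroes decreases."""
    return (x + abs(x)) / 2.0


def spectral_difference(mags: List[List[float]], p: int = 1) -> List[float]:
    """Detection function from STFT magnitude frames mags[n][k] = |X(n, k)|.

    p = 1 gives spectral flux (SF); p = 2 sums squared rectified differences
    (the L2 variant, written here without a square root).
    """
    out = [0.0]  # the first frame has no predecessor
    for prev, cur in zip(mags, mags[1:]):
        # bin-wise difference to the previous frame, keeping only increases
        rect = [half_wave(c - q) for c, q in zip(cur, prev)]
        out.append(sum(r ** p for r in rect))
    return out
```

For example, `spectral_difference([[1.0, 1.0], [2.0, 0.5], [2.0, 3.0]])` yields `[0.0, 1.0, 2.5]`: the drop in the second bin of frame 1 is ignored, while the rises contribute. Onsets would then be picked as peaks of this curve.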
High frequency content: Percussive sounds tend to have a high amount of energy in upper frequency bands. This can be exploited by weighting each STFT bin proportionally to its frequency. The sum of the weighted bins is the high frequency content (HFC), which can be used as a detection function. The HFC method is well suited to percussive onsets, but less so to other types of onsets [56].
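A minimal sketch of the HFC weighting, assuming precomputed magnitude frames `mags[n][k]` as above; `high_frequency_content` is a hypothetical name. Using the bin index k as the weight makes it proportional to the bin's centre frequency. The sketch weights squared magnitudes (energy); this is an assumption, since some formulations weight the plain magnitude instead.

```python
from typing import List


def high_frequency_content(mags: List[List[float]]) -> List[float]:
    """HFC per frame: sum over bins of k * |X(n, k)|^2.

    The bin index k stands in for frequency (bin k lies at k * fs / N Hz),
    so energy in upper bins is emphasised and the DC bin gets weight zero.
    """
    return [sum(k * m * m for k, m in enumerate(frame)) for frame in mags]
```

With the same energy placed in the lowest or the highest of three bins, `high_frequency_content([[2.0, 0.0, 0.0], [0.0, 0.0, 2.0]])` gives `[0.0, 8.0]`, which illustrates why the function reacts strongly to broadband, percussive attacks.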
Phase deviation: The functions discussed so far are based on spectral magnitudes. The phase change in an STFT frequency bin can, however, serve as a rough estimate of its instantaneous frequency. A change in this frequency is likely to be caused by an onset [56]. Taking the mean phase change over all frequency bins helps to reduce 'deletions' of onsets caused by phase wrap-around. This method is known as the phase deviation (PD) detection function. An extension is the normalised weighted phase deviation (NWPD) [52], which first weights each frequency bin's contribution to the phase deviation by its magnitude and then normalises the result by the sum of the magnitudes.
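The following sketch shows one way PD and NWPD can be computed, assuming complex STFT frames `spec[n][k]` are already available. The phase deviation of a bin is taken here as the second difference of its phase across three frames, wrapped into [-pi, pi): for a stationary sinusoid the phase advances by a constant amount per hop, so this term stays near zero. The helper names `princarg` and `phase_deviation` are hypothetical, and the second-difference formulation is an assumption about the exact estimator used in [52] and [56].

```python
import cmath
import math
from typing import List, Tuple


def princarg(phi: float) -> float:
    """Wrap a phase value into the interval [-pi, pi)."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi


def phase_deviation(spec: List[List[complex]]) -> Tuple[List[float], List[float]]:
    """Return (PD, NWPD) per frame from complex STFT frames spec[n][k]."""
    pd, nwpd = [0.0, 0.0], [0.0, 0.0]  # the first two frames lack a phase history
    for n in range(2, len(spec)):
        devs, mags = [], []
        for x0, x1, x2 in zip(spec[n], spec[n - 1], spec[n - 2]):
            # second phase difference: ~0 when the instantaneous frequency is constant
            d2 = cmath.phase(x0) - 2.0 * cmath.phase(x1) + cmath.phase(x2)
            devs.append(abs(princarg(d2)))
            mags.append(abs(x0))
        pd.append(sum(devs) / len(devs))  # mean absolute deviation over bins
        total = sum(mags)
        weighted = sum(m * d for m, d in zip(mags, devs))  # magnitude-weighted deviations
        nwpd.append(weighted / total if total > 0.0 else 0.0)  # normalise by sum of magnitudes
    return pd, nwpd
```

The magnitude weighting in NWPD keeps low-energy bins, whose phase is essentially noise, from dominating the detection function.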
Complex domain: This method uses both magnitude and phase information. For the current frame, a target value is estimated from the two preceding frames under the assumption of constant amplitude and a constant rate of phase change. The sum over all frequency bins of the magnitudes of the complex differences between the actual and the estimated values is then used as the detection function [57]. The rectified complex domain (RCD) [52] modifies this algorithm by summing only over bins with positive amplitude changes. This is based on the observation that, for onset detection, increases in signal amplitude are generally more relevant than decreases.
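A minimal sketch of the complex-domain prediction and its rectified variant, assuming complex STFT frames `spec[n][k]` as input; `complex_domain` and its `rectified` flag are hypothetical names. The target for each bin keeps the magnitude of the previous frame (constant amplitude) and extrapolates the phase linearly from the two preceding frames (constant phase change rate). For RCD the sketch counts only bins whose magnitude did not decrease.

```python
import cmath
from typing import List


def complex_domain(spec: List[List[complex]], rectified: bool = False) -> List[float]:
    """Complex-domain (CD) or rectified complex-domain (RCD) detection function."""
    out = [0.0, 0.0]  # the first two frames lack the history needed for a prediction
    for n in range(2, len(spec)):
        total = 0.0
        for x0, x1, x2 in zip(spec[n], spec[n - 1], spec[n - 2]):
            if rectified and abs(x0) < abs(x1):
                continue  # RCD: bins with decreasing amplitude are ignored
            # constant phase change rate: phi_hat = phi(n-1) + (phi(n-1) - phi(n-2))
            phase_hat = 2.0 * cmath.phase(x1) - cmath.phase(x2)
            # constant amplitude: keep the magnitude of the previous frame
            target = abs(x1) * cmath.exp(1j * phase_hat)
            total += abs(x0 - target)
        out.append(total)
    return out
```

For a steady sinusoid the prediction is exact and the function stays near zero. A sudden doubling of a bin's magnitude contributes to both CD and RCD, whereas a sudden drop contributes only to CD.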
Pitch detection: Discontinuities and perturbations in the pitch contour can be taken as an indication of onsets [58]. Information on the location of these phenomena can also be combined with energy analysis [53].
Probabilistic models: The negative log-likelihood (NLL) method [59] defines two different statistical models for the signal. A sudden change in these models indicates a potential onset. This method is known to work well for soft onsets [56].
Automatic classification: A trained machine learning algorithm allows for the design of more general detection functions, such as the one in [60], which is based on a convolutional neural network and won the MIREX 2005 audio onset detection evaluation.