Applications in Intelligent Music Analysis - Intelligent Audio Analysis

Early reduction functions frequently operated in the time domain. An example is the
one in [55], which normalises the loudness of the signal before splitting it into
multiple bands via bandpass filters. Onsets are then detected per band as peaks in the
first-order difference of the logarithm of the amplitude envelope, and the band-wise
onsets are combined to determine the final set of detected onsets. Onset detection in
the time domain has its shortcomings, however, as onsets are often masked in this
domain by higher-energy signals. Today, many reduction functions thus operate on a
spectral representation of the audio signal. Typical solutions, all based on the
short-time Fourier transform (STFT) of the signal, are:

Spectral difference: The spectral difference (SD) function is the bin-wise difference
of two consecutive short-time spectra. Positive differences are then summed up across
bins. The function can be computed with the L1-norm [52] or the L2-norm [56]. In the
case of the L1-norm, the function is referred to as spectral flux (SF). These methods
are among the best-performing reduction functions so far.
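
The computation described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the implementation of [52] or [56]: the helper `stft`, the frame length and hop size, the Hann window, and the use of a sum of squares for the L2 variant are all choices made here for the example.

```python
import numpy as np

def stft(x, frame_len=1024, hop=512):
    """Complex STFT of a 1-D signal, shape (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)  # assumed window; not prescribed by the text
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def spectral_difference(mag, norm="l1"):
    """Detection function from an STFT magnitude array of shape (frames, bins)."""
    diff = np.diff(mag, axis=0)      # bin-wise difference of consecutive spectra
    pos = np.maximum(diff, 0.0)      # keep positive differences only
    if norm == "l1":
        return pos.sum(axis=1)       # L1 variant: spectral flux (SF)
    return (pos ** 2).sum(axis=1)    # assumed L2 variant: sum of squares

# Toy signal: silence followed by a sine tone that starts at sample 8192.
x = np.zeros(16384)
x[8192:] = np.sin(2 * np.pi * 0.05 * np.arange(8192))
sf = spectral_difference(np.abs(stft(x)))
print(int(np.argmax(sf)))  # index of the frame transition with the largest increase
```

Element `i` of the result describes the transition from frame `i` to frame `i + 1`, so the largest value is expected around the frames that overlap the start of the tone.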

High frequency content: Percussive sounds tend to have a high amount of energy in
the upper frequency bands. This fact can be exploited by weighting each STFT bin
proportionally to its frequency. The sum of the weighted bins is the high frequency
content (HFC), which can be used as a detection function. The HFC method is well
suited to percussive onsets, but less so to other types of onsets [56].
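
As a minimal sketch of this weighting, assuming the STFT magnitudes are available as an array of shape (frames, bins): the function name `high_frequency_content` is hypothetical, and weighting the magnitude (rather than, say, the squared magnitude) by the bin index is an assumption of this example.

```python
import numpy as np

def high_frequency_content(mag):
    """HFC per frame from an STFT magnitude array of shape (frames, bins)."""
    k = np.arange(mag.shape[1])      # bin index, proportional to bin frequency
    return (mag * k).sum(axis=1)     # frequency-weighted sum over bins

# Two hypothetical one-frame spectra with the same magnitude at bin 1 and at bin 6.
low = np.zeros((1, 8))
low[0, 1] = 1.0
high = np.zeros((1, 8))
high[0, 6] = 1.0
print(high_frequency_content(low), high_frequency_content(high))
```

The same magnitude contributes more to the HFC the higher its bin lies, which is why broadband, percussive attacks stand out in this function.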

Phase deviation: The functions described so far are based on the spectral magnitudes.
The phase change in an STFT frequency bin can serve as a rough estimate of its
instantaneous frequency. Should this frequency change, it is likely because of an
onset [56]. Taking the mean phase change over all frequency bins helps to reduce
missed onsets ('deletions') caused by phase wrap-around. This method is known as the
phase deviation (PD) detection function. An extension is the normalised weighted
phase deviation (NWPD) [52], which first weights each frequency bin's contribution to
the phase deviation by its magnitude and then normalises the result by the sum of the
magnitudes.
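
One common way to express this, sketched here under assumptions, is to take the second-order difference of the phase in each bin (the change of the frame-to-frame phase increment), wrap it to the principal interval, and average its absolute value over bins. The function names, the `eps` guard against division by zero, and the exact wrapping convention are choices made for this example and are not taken from [52] or [56].

```python
import numpy as np

def wrap_phase(phase):
    """Map phase values to the principal interval [-pi, pi)."""
    return (phase + np.pi) % (2 * np.pi) - np.pi

def phase_deviation(X, weighted=False, eps=1e-12):
    """PD (or NWPD if weighted) from a complex STFT array X of shape (frames, bins).

    Element j of the result describes frame j + 2, because each value needs
    the two preceding frames.
    """
    phi = np.angle(X)
    # Second-order phase difference: change of the per-bin phase increment.
    d2 = np.abs(wrap_phase(phi[2:] - 2 * phi[1:-1] + phi[:-2]))
    if not weighted:
        return d2.mean(axis=1)                      # plain mean over bins (PD)
    mag = np.abs(X[2:])
    # Magnitude-weighted sum, normalised by the sum of magnitudes (NWPD).
    return (mag * d2).sum(axis=1) / (mag.sum(axis=1) + eps)
```

For a stationary sinusoid the phase in the dominant bins advances by a constant amount per frame, so the second-order difference stays close to zero. The magnitude weighting keeps low-energy bins, whose phase is essentially noise, from dominating the result.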

Complex domain: This method uses both magnitude and phase information. For each
frequency bin, the expected value of the current frame is predicted from its two
predecessors under the assumption of constant amplitude and a constant rate of phase
change. The sum over all frequency bins of the magnitudes of the complex differences
between the actual and the predicted values is then used as the detection function
[57]. The rectified complex domain (RCD) [52] modifies this algorithm by summing only
over bins with positive amplitude changes. This is based on the observation that, for
onset detection, increases of the signal amplitude are generally more relevant than
decreases.
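
The prediction step can be sketched as follows, assuming a complex STFT array `X` of shape (frames, bins). The function name `complex_domain` and the `rectify` flag are hypothetical; the prediction keeps the previous frame's magnitude and extrapolates the phase linearly from the last two frames, which is one reading of the constant-amplitude, constant-phase-change assumption.

```python
import numpy as np

def complex_domain(X, rectify=False):
    """CD (or RCD if rectify) from a complex STFT array X of shape (frames, bins).

    Element j of the result describes frame j + 2, because the prediction
    needs the two preceding frames.
    """
    mag = np.abs(X)
    phi = np.angle(X)
    # Predicted value: previous magnitude, linearly extrapolated phase.
    target = mag[1:-1] * np.exp(1j * (2 * phi[1:-1] - phi[:-2]))
    dist = np.abs(X[2:] - target)    # magnitude of the complex difference
    if rectify:
        # RCD: only count bins whose magnitude did not decrease.
        dist = np.where(mag[2:] >= mag[1:-1], dist, 0.0)
    return dist.sum(axis=1)
```

With `rectify=True`, frames in which the magnitudes drop contribute nothing, so note offsets are suppressed while increases in amplitude still register.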

Pitch detection: Discontinuities and perturbations in the pitch contour can be taken
as an indication of onsets [58]. Information on the location of these phenomena can
also be used in combination with energy analysis [53].

Probabilistic models: The negative log-likelihood (NLL) method [59] defines two
different statistical models for the signal. A sudden change in these models indicates
a potential onset. This is known to work well for soft onsets [56].

Automatic classification: Employing a trained machine learning algorithm allows for
the design of more general detection functions, such as the one in [60], which is
based on a convolutional neural network and was the winner of the MIREX 2005 audio
onset detection evaluation.
