Digital Signal Processing Reference
14.2.2 Applications to the Detection of Overlapping Sound Events
NMF algorithms have been applied to various problems in computer vision, signal
processing, biomedical data analysis and text classification among others [ 25 ]. In the
context of sound processing, the matrix V is in general a time-frequency representa-
tion of the sound to analyze. The rows and columns represent respectively different
frequency bins and successive time-frames. The factorization $v_j \approx \sum_i h_{ij} w_i$ can
then be interpreted as follows: each basis vector $w_i$ contains a spectral template, and
the decomposition coefficients $h_{ij}$ represent the activations of the $i$-th template at
the $j$-th time-frame.
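The interpretation above can be illustrated with a minimal sketch in Python. It assumes NumPy, a toy matrix in place of a real time-frequency representation, the Euclidean cost, and the standard multiplicative updates; the function name `nmf_euclidean` and all numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np

def nmf_euclidean(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Factorize a non-negative matrix V (bins x frames) as W @ H.

    Sketch of the standard multiplicative updates for the Euclidean cost.
    Returns W (bins x rank), H (rank x frames) and the cost per iteration.
    """
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    W = rng.random((n_bins, rank)) + eps   # spectral templates (columns)
    H = rng.random((rank, n_frames)) + eps  # activations (rows)
    costs = []
    for _ in range(n_iter):
        # eps in the denominators only guards against division by zero
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        costs.append(0.5 * np.sum((V - W @ H) ** 2))
    return W, H, costs

# Toy "spectrogram" (assumed data): 4 frequency bins, 4 time-frames,
# built from two spectral templates that overlap in the second frame.
templates = np.array([[1.0, 0.0],
                      [0.5, 0.0],
                      [0.0, 1.0],
                      [0.0, 0.8]])
activations = np.array([[1.0, 1.0, 0.0, 0.5],
                        [0.0, 1.0, 1.0, 0.0]])
V = templates @ activations

W, H, costs = nmf_euclidean(V, rank=2)

# Column j of V is approximated by the sum over i of H[i, j] * W[:, i]
v1_hat = sum(H[i, 1] * W[:, i] for i in range(2))
```

Each column of `W` plays the role of a spectral template and each row of `H` gives the activation of that template over the time-frames. The learned factors are only defined up to scaling and permutation, so they need not match `templates` and `activations` entry by entry.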
Concerning the detection of overlapping sound events, NMF has been widely used
in off-line systems for polyphonic music transcription, where the sound events cor-
respond roughly to notes (e.g., see [ 26 , 27 ]). Several problem-dependent extensions
have been developed to provide controls on NMF in this context, such as a source-
filter model [ 28 ], a harmonic constraint [ 29 ], a selective sparsity regularization [ 30 ],
or a subspace model of basis instruments [ 31 ]. Most of these systems consider either
the standard Euclidean cost or the Kullback-Leibler divergence. Recent works have,
however, investigated the use of other cost functions such as the Itakura-Saito divergence
[ 32 - 35 ] or the more general parametric beta-divergence [ 17 ].
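To make the relation between these cost functions concrete, the following sketch evaluates the beta-divergence in its commonly used parametrization, which reduces to the Itakura-Saito divergence for beta = 0, to the (generalized) Kullback-Leibler divergence for beta = 1, and to half the squared Euclidean distance for beta = 2. The function name `beta_divergence` and the example vectors are hypothetical, and strictly positive inputs are assumed.

```python
import numpy as np

def beta_divergence(x, y, beta):
    """Sum of element-wise beta-divergences d_beta(x | y).

    Assumes strictly positive x and y (a simplification for this sketch).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if beta == 0:
        # Itakura-Saito divergence
        r = x / y
        return float(np.sum(r - np.log(r) - 1.0))
    if beta == 1:
        # Generalized Kullback-Leibler divergence
        return float(np.sum(x * np.log(x / y) - x + y))
    # General case; beta = 2 gives half the squared Euclidean distance
    num = x ** beta + (beta - 1) * y ** beta - beta * x * y ** (beta - 1)
    return float(np.sum(num / (beta * (beta - 1))))

# Assumed example data
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 2.0, 1.0])

d_is = beta_divergence(x, y, 0)   # Itakura-Saito
d_kl = beta_divergence(x, y, 1)   # Kullback-Leibler
d_eu = beta_divergence(x, y, 2)   # Euclidean (halved, squared)
```

One property worth noting is that the Itakura-Saito case is invariant to a common rescaling of `x` and `y`, whereas the Euclidean case weights high-energy components much more heavily.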
Some authors have also used non-negative decomposition for sound event detec-
tion. A real-time system to identify the presence and determine the pitch of one or
more voices is proposed in [ 4 ] and is adapted to sight-reading evaluation of a solo
instrument in [ 5 ]. Concerning automatic transcription, off-line systems are used in
[ 6 ] for drum transcription and in [ 7 ] for polyphonic music transcription. A real-
time system for polyphonic music transcription is also proposed in [ 8 ] and is further
developed in [ 9 ] for real-time coupled multiple-pitch and multiple-instrument recog-
nition. All these systems consider either the Euclidean or the Kullback-Leibler cost
function, and only the latter provides a control on the decomposition by enforcing
the solutions to have a fixed desired sparsity.
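The principle of non-negative decomposition, in which the templates are kept fixed and only the activations are estimated, can be sketched as follows. This is an illustration under stated assumptions, not the method of any of the cited systems: it uses the Kullback-Leibler multiplicative update for the activations only, includes no sparsity control, and relies on a hypothetical function name `decompose_kl`, a hypothetical dictionary of three templates, and a detection threshold of 0.5 chosen arbitrarily.

```python
import numpy as np

def decompose_kl(V, W, n_iter=200, eps=1e-12):
    """Estimate activations H such that V is close to W @ H, with W fixed.

    Sketch using the multiplicative update for the Kullback-Leibler cost,
    applied to H only. No sparsity control is included.
    """
    H = np.ones((W.shape[1], V.shape[1]))
    col_sums = W.sum(axis=0)[:, None] + eps  # equals W.T @ ones
    for _ in range(n_iter):
        # eps only guards against division by zero
        H *= (W.T @ (V / (W @ H + eps))) / col_sums
    return H

# Assumed dictionary: 4 frequency bins x 3 pre-learned templates
W = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.2, 0.0],
              [0.0, 1.0, 0.3],
              [0.0, 0.0, 1.0]])

# Assumed ground truth: which template is active in each of 3 frames
# (frames 2 and 3 contain two overlapping events)
H_true = np.array([[1.0, 1.0, 0.0],
                   [0.0, 1.0, 1.0],
                   [0.0, 0.0, 1.0]])
V = W @ H_true

H = decompose_kl(V, W)
detected = H > 0.5   # arbitrary threshold on the activations
```

Because the templates are fixed, each incoming frame can be decomposed independently of the others, which is what makes this scheme suitable for processing a stream frame by frame.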
Other approaches in the framework of probabilistic models with latent variables
also share common perspectives with NMF techniques [ 36 ]. In this framework, the
non-negative data are considered as a discrete distribution and are factorized into a
mixture model where each latent component represents a source. It can then be shown
that maximum likelihood estimation of the mixture parameters amounts to NMF with
the Kullback-Leibler divergence, and that the classical expectation-maximization
algorithm is equivalent to the multiplicative updates scheme. Considering the prob-
lem in a probabilistic framework is however convenient for enhancing the standard
model and adding regularization terms through priors and maximum a posteriori
estimation instead of maximum likelihood estimation. In particular, the framework
has been employed in polyphonic music transcription to include shift-invariance and
sparsity [ 37 ]. Recent works have extended the latter model to include a temporal
smoothing and a unimodal prior for the impulse distributions [ 38 ], a hierarchical