Digital Signal Processing Reference
row-wise concatenation of a sequence of short-time spectra (in the form of row
vectors):
\[
V' := \begin{bmatrix}
V_{:,1} & V_{:,2} & \cdots & V_{:,N-T+1} \\
\vdots & \vdots & \ddots & \vdots \\
V_{:,T} & V_{:,T+1} & \cdots & V_{:,N}
\end{bmatrix}
\tag{8.3}
\]
where T is the desired context length. That is, the columns of V' correspond to
overlapping sequences of spectra in V. If signal reconstruction in the time domain is
desired, the above-named spectrogram transformations, including Mel filtering and
transformation according to (8.3), can be reversed.
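The context stacking of (8.3) and one possible way to reverse it can be sketched in a few lines of NumPy. The function names `stack_context` and `unstack_context` are hypothetical, and averaging the overlapping copies of each frame is only one plausible choice for the inverse, assumed here for illustration; it recovers the input exactly when the stacked matrix has not been modified.

```python
import numpy as np

def stack_context(V, T):
    """Stack T consecutive columns of V (shape M x N) into supervectors.

    Returns an array of shape (M*T, N-T+1) whose j-th column is the
    concatenation of V[:, j], V[:, j+1], ..., V[:, j+T-1].
    """
    M, N = V.shape
    if not 1 <= T <= N:
        raise ValueError("context length T must satisfy 1 <= T <= N")
    # Block row t holds columns t .. t+N-T of V, as in Eq. (8.3).
    return np.vstack([V[:, t:t + N - T + 1] for t in range(T)])

def unstack_context(V_ctx, T):
    """Undo stack_context by averaging the overlapping copies of each frame.

    Averaging is an assumption of this sketch, not a method from the text.
    """
    MT, K = V_ctx.shape
    M = MT // T
    N = K + T - 1
    total = np.zeros((M, N))
    count = np.zeros(N)
    for t in range(T):
        total[:, t:t + K] += V_ctx[t * M:(t + 1) * M, :]
        count[t:t + K] += 1
    return total / count

# Toy "spectrogram" with M = 3 frequency bins and N = 4 frames.
V = np.arange(12, dtype=float).reshape(3, 4)
V_ctx = stack_context(V, T=2)
print(V_ctx.shape)  # (6, 3)
```

Each column of `V_ctx` covers T = 2 neighbouring frames, so adjacent columns share one frame; this overlap is what the averaging step resolves when the stacked matrix has been altered, for example by an NMF approximation.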
The basic NMF method as explained above is entirely unsupervised. In many
practical applications, such as speech or music separation, prior knowledge about
the problem structure can be exploited. A simple yet very effective method to integrate
a priori knowledge into NMF-based source separation is to perform supervised or
semi-supervised NMF. This means that parts of the first NMF factor are predefined
as a set of spectra characteristic of the sources to be separated, rather than
initialising both factors randomly. This can be useful in audio enhancement, e.g.,
in a 'cocktail party' situation with several simultaneous speakers [6, 17], or when
separating a speaker of interest from noise [18]. The initialisation spectra may themselves stem
from NMF decomposition of training material or can be based on simpler methods
such as median filtering or simply random sampling of training spectrograms. This
procedure is outlined in Fig. 8.1 as a flowchart. An alternative supervised NMF
method, depicted in Fig. 8.2, is to assign components computed by unsupervised
NMF to classes such as 'drums' and 'non-drums' by means of a classifier trained
in a supervised manner, as in [19]. This allows dealing with observations that cannot be described as
a linear combination of pre-defined spectra, but assumes that unsupervised NMF by
itself can extract meaningful units, such as notes of different instruments. Given an
assignment of NMF components to sources as described above, it is straightforward
to synthesise the audio signals of interest by overlaying component spectrograms.
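As a rough illustration of supervised NMF with a predefined first factor, the following NumPy sketch keeps the basis W fixed and estimates only the activations H, then overlays the component spectrograms belonging to each source. Several things are assumptions of this sketch rather than details taken from the text: the Euclidean cost with its standard multiplicative update, the function names `supervised_nmf` and `source_spectrograms`, the source labels, and the random toy data.

```python
import numpy as np

def supervised_nmf(V, W, n_iter=200, eps=1e-12, seed=0):
    """Estimate activations H >= 0 such that V ~= W @ H, with W held fixed.

    Uses the multiplicative update for the Euclidean cost (an assumption;
    other cost functions lead to different update rules).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps  # random non-negative start
    WtV = W.T @ V
    WtW = W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + eps)  # keeps H non-negative
    return H

def source_spectrograms(W, H, assignment):
    """Overlay (sum) the component spectrograms assigned to each source."""
    labels = np.asarray(assignment)
    return {s: W[:, labels == s] @ H[labels == s, :]
            for s in sorted(set(assignment))}

# Toy example with hypothetical data: two sources, two basis spectra each.
rng = np.random.default_rng(1)
W = rng.random((8, 4))                  # columns: predefined basis spectra
assignment = ["speech", "speech", "noise", "noise"]
H_true = rng.random((4, 20))
V = W @ H_true                          # mixture magnitude spectrogram

H = supervised_nmf(V, W)
parts = source_spectrograms(W, H, assignment)
V_hat = sum(parts.values())             # overlay of all sources equals W @ H
```

Here `parts["speech"]` and `parts["noise"]` are magnitude spectrogram estimates for the two sources. Turning them into audio would additionally require phase information and an inverse short-time transform, which this sketch leaves out.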
Fig. 8.1 Supervised NMF: A set of spectral components (which can themselves be computed by
NMF from training audio) serves as a constant basis for NMF; the activations can be exported as
features or be used to synthesise audio signals for the sources [12]
 