Audio Source Separation - Intelligent Audio Analysis - page 137

row-wise concatenation of a sequence of short-time spectra (in the form of row vectors):

\[
V' := \begin{bmatrix}
V_{:,1} & V_{:,2} & \cdots & V_{:,N-T+1} \\
\vdots & \vdots & \ddots & \vdots \\
V_{:,T} & V_{:,T+1} & \cdots & V_{:,N}
\end{bmatrix}, \tag{8.3}
\]

where $T$ is the desired context length. That is, the columns of $V'$ correspond to overlapping sequences of spectra in $V$. If signal reconstruction in the time domain is desired, the above-named spectrogram transformations, including Mel filtering and the transformation according to (8.3), can be reversed.
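The context transformation in (8.3) and one possible way of reversing it can be sketched in NumPy as follows. This is a minimal illustration, not the book's implementation: the function names `stack_context` and `unstack_context` are hypothetical, the spectrogram is assumed to be stored with one spectrum per column, and averaging the overlapping copies of each frame is an assumed choice for the inverse (it is exact when the stacked matrix has not been modified).

```python
import numpy as np

def stack_context(V, T):
    """Stack T consecutive spectra (columns of V) into one column, as in (8.3).

    V has shape (F, N); the result has shape (T * F, N - T + 1).
    """
    F, N = V.shape
    if not 1 <= T <= N:
        raise ValueError("context length T must satisfy 1 <= T <= N")
    # Block row t (0-based) holds columns t, ..., t + N - T of V.
    return np.vstack([V[:, t:t + N - T + 1] for t in range(T)])

def unstack_context(V_ctx, T):
    """Map a stacked matrix back to shape (F, N).

    Assumption: each original frame occurs in up to T columns of V_ctx,
    and all of its copies are averaged.
    """
    F = V_ctx.shape[0] // T
    M = V_ctx.shape[1]          # M = N - T + 1
    N = M + T - 1
    acc = np.zeros((F, N))
    cnt = np.zeros(N)
    for t in range(T):
        acc[:, t:t + M] += V_ctx[t * F:(t + 1) * F, :]
        cnt[t:t + M] += 1
    return acc / cnt
```

For a spectrogram with N = 4 frames and T = 2, `stack_context` yields three columns, the first of which is the first and second spectrum stacked on top of each other.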

The basic NMF method as explained above is entirely unsupervised. In many
practical applications, such as speech or music separation, prior knowledge about
the problem structure can be exploited. A simple yet very effective method to integrate
a priori knowledge into NMF-based source separation is to perform supervised or
semi-supervised NMF. This means that parts of the first NMF factor are predefined
as a set of spectra characteristic of the sources to be separated, rather than
initialising both factors randomly. This can be useful in audio enhancement, e.g.,
in a 'cocktail party' situation with several simultaneous speakers [6, 17], or when
separating noise from a speaker of interest [18]. The initialisation spectra may
themselves stem from an NMF decomposition of training material, or can be obtained by
simpler methods such as median filtering or random sampling of training spectrograms.
This procedure is outlined in Fig. 8.1 as a flowchart. An alternative supervised NMF
method, depicted in Fig. 8.2, is to assign components computed by unsupervised
NMF to classes such as 'drums' and 'non-drums' by means of a classifier trained in a
supervised manner, as in [19]. This allows dealing with observations that cannot be
described as a linear combination of predefined spectra, but assumes that unsupervised
NMF by itself can extract meaningful units, such as notes of different instruments.
Given an assignment of NMF components to sources as described above, it is
straightforward to synthesise the audio signals of interest by overlaying component
spectrograms.
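As a rough illustration of supervised and semi-supervised NMF and of overlaying component spectrograms, the sketch below keeps a predefined set of basis spectra constant and updates only the activations (plus any additional free basis columns). Several points are assumptions rather than the method of the text: the multiplicative updates for the Euclidean cost are used because the cost function is not stated in this section, and the names `supervised_nmf`, `source_spectrogram`, `W_fixed` and `n_free` are hypothetical.

```python
import numpy as np

def supervised_nmf(V, W_fixed, n_free=0, n_iter=200, eps=1e-12, seed=0):
    """Approximate V by W @ H with the first columns of W held constant.

    n_free = 0 gives supervised NMF (only H is updated); n_free > 0 adds
    randomly initialised basis columns that are also updated (semi-supervised).
    Assumption: multiplicative updates for the Euclidean cost.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    K_fixed = W_fixed.shape[1]
    W = np.hstack([W_fixed, rng.random((F, n_free)) + eps])
    H = rng.random((K_fixed + n_free, N)) + eps
    for _ in range(n_iter):
        # Activation update; W is treated as constant here.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        if n_free > 0:
            # Update only the free columns; the predefined spectra stay fixed.
            num = V @ H.T
            den = W @ H @ H.T + eps
            W[:, K_fixed:] *= num[:, K_fixed:] / den[:, K_fixed:]
    return W, H

def source_spectrogram(W, H, components):
    """Overlay (sum) the spectrograms of the components assigned to one source."""
    idx = list(components)
    return W[:, idx] @ H[idx, :]
```

If, say, components 0 and 1 were taken from training spectra of a speaker and component 2 from noise, `source_spectrogram(W, H, [0, 1])` would give the magnitude spectrogram estimate for the speaker. The same function applies when the component indices come from a classifier instead of from predefined spectra.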

Fig. 8.1 Supervised NMF: A set of spectral components (which can themselves be computed by NMF from training audio) serves as a constant basis for NMF; the activations can be exported as features or be used to synthesise audio signals for the sources [12]
