Digital Signal Processing Reference
Fig. 8.2 Unsupervised NMF followed by supervised component classification, as in musical
instrument separation: a classifier is built from labelled separated components. Steps required
to train the classifier are shaded gray [12]
Besides using source separation as pre-processing for Intelligent Audio Analysis, the
activations computed by NMF can be used directly for classification, as indicated by
the flowchart in Fig. 8.1. This approach will be presented in more detail in Sect. 8.3.
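As a rough illustration of how NMF activations might serve directly as classification features, the following minimal sketch factorises a toy magnitude spectrogram and summarises the activations as one fixed-length vector. It makes several assumptions that are not taken from the text: the generalised Kullback-Leibler cost with standard multiplicative updates is used, the function names nmf_kl and activation_features are hypothetical, and the toy data is random rather than real audio.

```python
import numpy as np


def nmf_kl(V, n_components, n_iter=200, seed=0, eps=1e-12):
    """Approximate a non-negative matrix V (frequency x frames) as W @ H.

    Assumption: multiplicative updates for the generalised KL divergence;
    the text does not prescribe this particular cost function.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_components)) + eps    # spectral bases
    H = rng.random((n_components, n_frames)) + eps  # activations
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H


def activation_features(H):
    """Hypothetical feature: mean activation per component, scaled to sum to 1."""
    f = H.mean(axis=1)
    return f / (f.sum() + 1e-12)


# Toy "spectrogram" built from two known non-negative spectra (illustrative data only).
rng = np.random.default_rng(1)
W_true = rng.random((20, 2))
H_true = rng.random((2, 50))
V = W_true @ H_true

W, H = nmf_kl(V, n_components=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
features = activation_features(H)  # one vector per signal, usable by any classifier
```

In a supervised setting, the bases W would typically be fixed in advance and only H estimated for a new signal; the sketch above learns both, purely to keep the example short.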
8.2 Performance
To get an idea of the separation performance of basic NMF in a challenging task, let
us consider the separation of two simultaneous speakers from a monaural
signal in the following. Fig. 8.3 visualises the separation quality in terms of
source-to-distortion ratio (SDR) depending on the targeted RTF. SDR, as introduced by [20], can
be considered as the most popular objective metric for the evaluation of audio source
separation as of today. In the considered scenario of speaker separation, it takes into
account the suppression of the interfering speaker but also penalizes the introduction
of artifacts due to signal separation, i.e., information loss in the target speech. Note
that perfect interference reduction can be trivially achieved by outputting a zero
signal. These experiments are based on the procedure proposed in [6] and the results
correspond to those reported in [12]. NMF is used rather than NMD, based on the finding
in [6] of no significant difference in separation quality between the two.
The effect of using different numbers of iterations, DFT window sizes and NMF
cost functions is assessed; the influence of these parameters on separation quality
and computational complexity has been pointed out in [6, 12]. Twelve pairs of male
and female speakers (ensuring that the speech spectra do not fully overlap) were
selected randomly from the TIMIT database (cf. also Sect. 10.4.3). Per pair, two
 