Digital Signal Processing Reference
Fig. 8.4 Exemplary block diagram for the extraction of NMF activation features for discrimination of
C classes in N input signals [15]. Matrices denoted by V are spectrograms. The matrix W consists
of spectra computed from training data for supervised NMF. Activation features are the resulting
H (activation) matrices. ||·||_2 indicates that the Euclidean norm of each matrix row is computed,
and "= 1" indicates a normalisation such that the components of each vector a_i sum up to 1.
associated with word likelihoods, the index of the most likely word per frame can be
computed from the frame-wise activations of each spectrogram patch and used as a
discrete feature. In this calculation of NMF activation features, W N is pre-computed
from training noise samples. Table 8.1 shows the word accuracies (WAs) on the 35 keywords
by SNR and on average for four systems: a baseline HMM recogniser adapted to noisy speech
features, the same recogniser with NMD speech separation as pre-processing, HMM decoding
with NMF activation features, and the combination of both. The results show that the two
methods are complementary: the combination outperforms either method alone at every SNR.
The interested reader is referred to [21] for a more in-depth discussion.
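The activation-feature computation described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the implementation used in [15] or [21]: the multiplicative update for the Euclidean cost is one common choice for estimating H with W held fixed, and the function names, the iteration count, and the `component_to_word` lookup are hypothetical.

```python
import numpy as np


def nmf_activations(V, W, n_iter=200, eps=1e-12, seed=0):
    """Estimate non-negative H with V ~ W @ H while the dictionary W stays fixed.

    Assumption: multiplicative updates for the Euclidean cost are used here;
    the source does not state which cost function or update rule was applied.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps  # strictly positive start
    WtV = W.T @ V  # constant because W is fixed (supervised NMF)
    WtW = W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + eps)  # update keeps H non-negative
    return H


def activation_features(H, eps=1e-12):
    """Euclidean norm of each row of H, normalised so the entries sum to 1."""
    a = np.linalg.norm(H, axis=1)  # one value per dictionary component
    return a / (a.sum() + eps)


def most_likely_word_per_frame(H, component_to_word):
    """Discrete feature: word index of the strongest component in each frame.

    `component_to_word` is a hypothetical lookup array that maps each
    dictionary component (row of H) to a word index.
    """
    return np.asarray(component_to_word)[np.argmax(H, axis=0)]
```

Given a magnitude spectrogram `V` of shape (frequency bins, frames) and a trained dictionary `W` of shape (frequency bins, components), `activation_features(nmf_activations(V, W))` yields one feature vector per input signal, while `most_likely_word_per_frame` yields one discrete label per frame.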
Table 8.1 Effect of NMD speech separation and NMF activation features on speech recognition
results (WA) reported in [21] on the Computational Hearing in Multisource Environments (CHiME)
task [22]

WA [%]                                   SNR [dB]
                              -6     -3      0      3      6      9   Average
Baseline                    54.5   61.1   72.8   81.7   86.8   91.3      74.7
NMD speech separation       75.6   79.2   84.1   87.7   88.3   90.6      84.2
NMF activation features     67.2   75.1   85.0   89.8   92.0   93.4      83.7
Combination                 79.1   82.8   88.7   91.2   92.7   93.5      88.0
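To make the complementarity visible, the short sketch below recomputes the absolute WA gain of each system over the baseline from the values in Table 8.1. The SNR levels are assumed to be the CHiME levels of -6 to 9 dB, and the dictionary layout and helper name are illustrative choices only.

```python
# WA [%] per SNR, copied from Table 8.1.
SNR_DB = [-6, -3, 0, 3, 6, 9]  # assumed CHiME SNR levels
WA = {
    "Baseline": [54.5, 61.1, 72.8, 81.7, 86.8, 91.3],
    "NMD speech separation": [75.6, 79.2, 84.1, 87.7, 88.3, 90.6],
    "NMF activation features": [67.2, 75.1, 85.0, 89.8, 92.0, 93.4],
    "Combination": [79.1, 82.8, 88.7, 91.2, 92.7, 93.5],
}


def absolute_gain(system, reference="Baseline"):
    """Per-SNR WA difference (percentage points) between two systems."""
    return [round(s - r, 1) for s, r in zip(WA[system], WA[reference])]


if __name__ == "__main__":
    for name in WA:
        if name != "Baseline":
            gains = dict(zip(SNR_DB, absolute_gain(name)))
            print(f"{name}: {gains}")
```

The gains shrink as the SNR rises, and NMD speech separation falls slightly below the baseline at 9 dB, whereas the combination stays above both individual methods at every SNR.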
 