Digital Signal Processing Reference
In-Depth Information
Table 10.4 Mean isolated digit recognition rates (in [%] WA / UA) of an HMM recogniser without
feature enhancement for different noise types and training strategies: matched conditions training
(MC), mismatched conditions training (MMC), and training with clean data
Training      Clean    CAR      BAB      WGN
Clean data    99.92    75.09    88.37    63.67
MMC           79.42    96.86    98.74    68.51
MC            99.92    99.69    99.73    99.22
Table 10.4 summarises the WA for an HMM recogniser without feature
enhancement for three different training strategies: training on clean data,
mismatched conditions training, and matched conditions training. In these experiments,
mismatched conditions training denotes training and testing with the same noise type
but at unequal noise conditions (SNR levels and driving conditions, respectively).
Matched conditions training stands for exactly identical noise types and noise
conditions. If the test sequence is disturbed by noise, mismatched conditions training
outperforms a recogniser that has been trained on clean data. However, for clean test
sequences, mismatched conditions training significantly degrades recognition
rates (79.42 % compared with 99.92 % for training on clean data), as the noise pattern
learnt during training is missing when testing the recogniser. The results for matched
conditions training serve as an upper benchmark for noisy speech recognition
performance, because this strategy assumes perfect knowledge of the noise properties.
Note that, since one model was trained for every noise condition in the matched
conditions experiment, this strategy requires knowledge of the noise characteristics
and more memory, as more than one model has to be stored.
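The comparison described above can be checked directly against the figures in Table 10.4. The following is a minimal sketch, not part of the original evaluation: the dictionary layout and the helper name gain_over_clean_training are illustrative assumptions, and the values are simply transcribed from the table.

```python
# WA values [%] transcribed from Table 10.4.
# Keys: training strategy; list order follows TEST_CONDITIONS.
TEST_CONDITIONS = ["Clean", "CAR", "BAB", "WGN"]
WA_TABLE_10_4 = {
    "Clean data": [99.92, 75.09, 88.37, 63.67],
    "MMC": [79.42, 96.86, 98.74, 68.51],
    "MC": [99.92, 99.69, 99.73, 99.22],
}


def gain_over_clean_training(strategy):
    """Absolute WA difference (percentage points) relative to training on clean data.

    Hypothetical helper for illustration only.
    """
    baseline = WA_TABLE_10_4["Clean data"]
    return {
        cond: round(wa - base, 2)
        for cond, wa, base in zip(TEST_CONDITIONS, WA_TABLE_10_4[strategy], baseline)
    }


for strategy in ("MMC", "MC"):
    print(strategy, gain_over_clean_training(strategy))
```

For MMC the difference is positive for all three noisy test conditions and negative for the clean test condition, which is the trade-off discussed in the text.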
The best MFCC feature enhancement methods were further applied in the spelling
recognition task, as shown in Table 10.5. Again, for noisy test data, the SLDM performs
better than more 'conventional' techniques such as HEQ.
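To make the ranking on noisy test data explicit, the WA values from Table 10.5 can be averaged over the three noise types. This is a minimal sketch under stated assumptions: the unweighted mean over CAR, BAB, and WGN is an illustrative summary chosen here, not a figure reported in the source, and the names WA_TABLE_10_5 and mean_noisy_wa are hypothetical.

```python
# WA values [%] transcribed from Table 10.5 (MFCC features, training on clean data).
NOISE_TYPES = ["CAR", "BAB", "WGN"]
WA_TABLE_10_5 = {
    "SLDM": {"Clean": 92.73, "CAR": 82.98, "BAB": 81.59, "WGN": 64.23},
    "HEQ": {"Clean": 91.85, "CAR": 70.19, "BAB": 69.40, "WGN": 48.20},
    "CMS": {"Clean": 93.09, "CAR": 73.79, "BAB": 69.78, "WGN": 47.06},
    "none": {"Clean": 91.04, "CAR": 58.82, "BAB": 66.92, "WGN": 44.30},
}


def mean_noisy_wa(strategy):
    """Unweighted mean WA over the three noise types (an assumed summary measure)."""
    row = WA_TABLE_10_5[strategy]
    return round(sum(row[noise] for noise in NOISE_TYPES) / len(NOISE_TYPES), 2)


# Strategies ordered from best to worst mean WA on noisy test data.
ranking = sorted(WA_TABLE_10_5, key=mean_noisy_wa, reverse=True)
for strategy in ranking:
    print(f"{strategy:5s} {mean_noisy_wa(strategy):6.2f}")

# Strategy with the highest WA on clean test data, for comparison.
best_on_clean = max(WA_TABLE_10_5, key=lambda s: WA_TABLE_10_5[s]["Clean"])
print("best on clean test data:", best_on_clean)
```

Under this summary the SLDM ranks first on noisy data, while CMS gives the highest WA on clean test data, so the advantage stated in the text applies to the noisy conditions specifically.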
10.1.3 Summary
In this section, evaluation results were presented for the techniques introduced in
Chap. 9 to improve the performance of ASR in noisy surroundings, using the noisy
isolated digit and spelling recognition tasks. These techniques affect
feature extraction, feature enhancement, speech decoding, and speech modelling.
Table 10.5 Mean spelling recognition rates for different noise types and noise
compensation strategies, training on clean data

                        WA [%]
Strategy    Features    Clean    CAR      BAB      WGN
SLDM        MFCC        92.73    82.98    81.59    64.23
HEQ         MFCC        91.85    70.19    69.40    48.20
CMS         MFCC        93.09    73.79    69.78    47.06
none        MFCC        91.04    58.82    66.92    44.30
 
 