Digital Signal Processing Reference
classification error type by a term which is conditioned on the importance that
error type plays in the subsequent speech application employing
Environmental Sniffing. In [32], we specialized the formulation of CPR to the
specific case where the Environmental Sniffing framework is used for model
selection within an ASR system. Here the framework determines the initial
acoustic model to be used according to the environmental knowledge it
extracts; in this context, that knowledge consists of the acoustic condition
types with their time tags. For this task, we can formulate the Critical
Performance Rate as:
where denotes the transposed error matrix for noise classification, and C is
the normalized cost matrix. Since some noise conditions occur more
frequently than others, each noise condition has an a priori probability,
denoted a. Each cost value is proportional to the WER difference between
the matched and the mismatched case, that is, to the performance
degradation the ASR engine suffers when it decodes with the wrong acoustic
model instead of the correct one. In terms of performance, the goal is
therefore to optimize the critical performance rate rather than the plain
noise classification rate, since it is more important to detect and
classify correctly those noise conditions that have a more significant
impact on ASR performance.
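As an illustration only, the sketch below assumes one plausible form of a cost-weighted rate; it is not necessarily the equation used in [32]. It assumes a table wer[i, j] giving the WER when data from noise condition i is decoded with the model for condition j, builds C from the WER increase over the matched case (clipped at zero and normalized by its maximum, both assumptions), and defines CPR as 100 times one minus the prior-weighted expected cost. The expected cost is written as trace(E^T C), which equals the sum over i, j of E[i, j] C[i, j]; this is one way a transposed error matrix can combine with a cost matrix. The names cost_matrix and critical_performance_rate and all numbers in the toy example are hypothetical.

```python
import numpy as np


def cost_matrix(wer):
    """Hypothetical helper: normalized cost matrix from a WER table.

    wer[i, j] is the WER (%) obtained when test data from noise condition i
    is decoded with the acoustic model trained for condition j, so the
    diagonal wer[i, i] holds the matched case.
    """
    wer = np.asarray(wer, dtype=float)
    # Cost of choosing model j for condition i: WER increase over the matched model.
    delta = wer - np.diag(wer)[:, None]
    # Assumption: a mismatched model that happens to do better costs nothing.
    delta = np.clip(delta, 0.0, None)
    # Assumption: normalize by the largest cost so that all entries lie in [0, 1].
    peak = delta.max()
    return delta / peak if peak > 0 else delta


def critical_performance_rate(confusion, cost, priors):
    """Assumed form: 100 * (1 - prior-weighted expected normalized cost).

    confusion[i, j] = P(sniffer outputs condition j | true condition i)
    priors[i]       = a priori probability a_i of condition i
    """
    confusion = np.asarray(confusion, dtype=float)
    cost = np.asarray(cost, dtype=float)
    priors = np.asarray(priors, dtype=float)
    # Weight each row of the error matrix by the prior of its true condition.
    weighted = priors[:, None] * confusion
    # trace(E^T C) equals sum_ij E[i, j] * C[i, j]: the expected cost.
    expected_cost = np.trace(weighted.T @ cost)
    return 100.0 * (1.0 - expected_cost)


# Toy example with three conditions (all numbers are made up).
wer = [[5.0, 12.0, 20.0], [9.0, 6.0, 15.0], [14.0, 10.0, 7.0]]
confusion = [[0.90, 0.10, 0.00], [0.05, 0.85, 0.10], [0.00, 0.20, 0.80]]
priors = [0.5, 0.3, 0.2]
print(round(critical_performance_rate(confusion, cost_matrix(wer), priors), 2))  # 94.77
```

In this toy case the prior-weighted classification accuracy is only 86.5%, yet the assumed CPR is about 94.77 because most confusions are cheap in WER terms; a perfect sniffer (identity confusion matrix) always scores 100. This mirrors the argument above: errors on conditions that barely affect recognition should count for less.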
In our evaluations, we degraded the TI-DIGIT database with 8 different
in-vehicle noise conditions from the noise database of [24], at SNR values
chosen randomly between -5 dB and +5 dB (i.e., -5, -3, -1, +1, +3, and
+5 dB). A 2.5-hour noise data set was used to degrade the training set of
4000 utterances, and a separate 0.5-hour noise set was used to degrade the
test set of 500 utterances (i.e., an open noise condition, in which the test
noise is not seen in training). Each digit utterance was degraded with only
one acoustic noise condition.
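The degradation protocol can be sketched as follows. This is a minimal sketch under stated assumptions: waveforms are float arrays at a common sampling rate, SNR is defined globally as the ratio of mean speech power to mean noise power over the whole utterance (the text does not say which SNR definition was used), and each utterance draws one noise condition, one random segment of that condition's noise, and one SNR from the listed grid. The names add_noise_at_snr, degrade_utterance, and noise_pool are hypothetical; for the open-noise test set, noise_pool would hold only the separate 0.5-hour noise recordings.

```python
import numpy as np

SNR_LEVELS_DB = (-5, -3, -1, 1, 3, 5)  # SNR grid listed in the text


def add_noise_at_snr(speech, noise, snr_db):
    """Scale noise so that 10*log10(P_speech / P_noise) equals snr_db, then add it."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(speech)]
    if len(noise) < len(speech):
        raise ValueError("noise segment is shorter than the utterance")
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Gain that brings the noise power down (or up) to p_speech / 10^(snr/10).
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def degrade_utterance(speech, noise_pool, rng):
    """Hypothetical helper: one random noise condition and one random SNR per utterance.

    noise_pool[k] is a long noise recording for condition k (assumed to be
    at least as long as the utterance).
    """
    condition = rng.integers(len(noise_pool))
    noise = noise_pool[condition]
    # Random start so that different utterances see different noise segments.
    start = rng.integers(len(noise) - len(speech) + 1)
    snr_db = rng.choice(SNR_LEVELS_DB)
    noisy = add_noise_at_snr(speech, noise[start:start + len(speech)], snr_db)
    return noisy, int(condition), float(snr_db)


# Usage: rng = np.random.default_rng(0)
#        noisy, cond, snr = degrade_utterance(utterance, noise_pool, rng)
```

Returning the condition index alongside the noisy signal gives the per-utterance ground-truth label that a noise classifier would later be scored against.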
Using the sniffing framework presented in Figure 2-6, each utterance was
assigned to an acoustic condition. Since each utterance contained only one
acoustic condition, the Environmental Sniffing framework was not allowed to
hypothesize noise transitions within an utterance. A noise classification
rate of 82% was obtained. Acoustic models specific to each environmental
condition were trained and used during the recognition tests. The cost
matrix C is calculated by testing different acoustic conditions using different