Feature Compensation Employing Variational Model Composition for Robust Speech Recognition in In-Vehicle Environment - Digital Signal Processing for In-Vehicle Systems and Safety

Fig. 11.3 Block diagram of the PCGMM method employing the proposed variational model composition

where $r_{e,k}$ is a constant bias term from the $k$th Gaussian component of the $e$th environment model, and $p(k \mid G_e, y_t)$ is the posterior probability for environment $G_e$.

The variational noise models obtained by the variational model composition method proposed in this study are used to generate the environmental models $\{G_e\}$, which are estimated through the model combination procedure using the clean speech GMM and the obtained variational noise models. A uniform prior probability is set on all obtained noise models in this study. Figure 11.3 shows the resulting block diagram of PCGMM-based feature compensation employing the proposed variational model composition method.
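As a rough illustration of how the bias terms $r_{e,k}$ and the posteriors $p(k \mid G_e, y_t)$ could enter the compensation step, here is a minimal NumPy sketch. It assumes the clean feature estimate is the noisy feature minus a posterior-weighted sum of biases, that all GMMs have diagonal covariances, and that the environment posterior is computed from the whole-utterance likelihood under the uniform prior mentioned above. The function names and the dictionary layout (`weights`, `means`, `vars`, `bias`) are hypothetical and are not taken from the chapter.

```python
import numpy as np

def log_gauss_diag(y, means, variances):
    """Log density of frame y (D,) under K diagonal Gaussians; returns (K,)."""
    diff = y - means
    return -0.5 * np.sum(np.log(2.0 * np.pi * variances) + diff ** 2 / variances, axis=1)

def logsumexp(a):
    """Numerically stable log(sum(exp(a))) for a 1-D array."""
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

def compensate(Y, envs):
    """Posterior-weighted bias removal over several environment GMMs.

    Y    : (T, D) noisy feature frames.
    envs : list of dicts (hypothetical layout), one per environment G_e, with
           'weights' (K,), 'means' (K, D), 'vars' (K, D), 'bias' (K, D).
    Returns the compensated frames (T, D) and the environment posteriors (E,).
    """
    Y = np.asarray(Y, dtype=float)
    T = Y.shape[0]
    utt_loglik = np.zeros(len(envs))
    comp_posts = []
    for e, g in enumerate(envs):
        post = np.zeros((T, len(g["weights"])))
        for t in range(T):
            # log w_k + log N(y_t; mu_k, var_k) for every component k
            log_joint = np.log(g["weights"]) + log_gauss_diag(Y[t], g["means"], g["vars"])
            frame_ll = logsumexp(log_joint)
            post[t] = np.exp(log_joint - frame_ll)   # p(k | G_e, y_t)
            utt_loglik[e] += frame_ll
        comp_posts.append(post)
    # Assumption: uniform prior, so p(G_e | Y) is proportional to the likelihood.
    env_post = np.exp(utt_loglik - logsumexp(utt_loglik))
    X_hat = Y.copy()
    for e, g in enumerate(envs):
        # Subtract sum_k r_{e,k} p(k | G_e, y_t), weighted by p(G_e | Y).
        X_hat -= (env_post[e] * comp_posts[e]) @ g["bias"]
    return X_hat, env_post
```

Calling `compensate(Y, envs)` with one dictionary per environment model returns the compensated frames together with the environment posteriors, which makes it easy to check which noise model dominated for a given utterance.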

11.5 Experimental Results

As test data for performance evaluation, the connected single-digit portions of the CU-Move corpus were selected. We established an experimental setup identical to the Aurora2 evaluation framework [16]. The task is connected English digit recognition with a vocabulary of 11 words. Each whole word is represented by a continuous-density HMM with 16 states and three mixtures per state. In addition to the digits, two silence models (i.e., normal silence and short pause) are used.
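To make the size of the acoustic model concrete, the following sketch counts the Gaussians in the word models and builds one possible transition matrix. The state and mixture counts come from the text; the strict left-to-right topology without skips, the self-loop probability, and the absorbing final state are assumptions made only for illustration, and the silence models are left out because their sizes are not given here.

```python
import numpy as np

N_WORDS = 11    # vocabulary size stated in the text
N_STATES = 16   # states per whole-word HMM
N_MIX = 3       # Gaussian mixtures per state

def left_to_right_transitions(n_states, self_loop=0.6):
    """Transition matrix for a left-to-right HMM without skips.

    self_loop is a hypothetical initial value; in practice these
    probabilities would be re-estimated during training.
    """
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i] = self_loop             # stay in the same state
        A[i, i + 1] = 1.0 - self_loop   # advance to the next state
    A[-1, -1] = 1.0                     # simplification: final state absorbs
    return A

A = left_to_right_transitions(N_STATES)

# Gaussians across all word models (silence models excluded).
word_gaussians = N_WORDS * N_STATES * N_MIX
print(word_gaussians)  # 528
```

The count of 528 covers the digit models only; the two silence models would add a few more components depending on how they are configured.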

The feature extraction algorithm suggested by the European Telecommunications Standards Institute (ETSI) was employed for the experiments [17]. The zeroth cepstral coefficient was used instead of log energy for convenience in implementing model combination. After extracting the 13-dimensional cepstrum, the first- and second-order time derivatives are appended for decoding, giving a 39-dimensional feature vector in total. The HMM parameters were estimated using the 8,840 clean speech training samples included in Aurora2, and performance was evaluated on the selected test set of the CU-Move corpus. The test set consists of 464 utterances (50 min in length) spoken by ten different speakers (five males
