Digital Signal Processing Reference
factor. This is typically true for in-vehicle speech systems, which must recognize
speech robustly across a range of severe, changing background noise conditions.
To minimize this mismatch, extensive research has been conducted in recent
decades, producing many types of speech/feature enhancement methods and model
adaptation techniques that target slowly changing background noise [1-10].
However, these methods remain ineffective in time-varying background noise
conditions, where the noise characteristics must be estimated effectively as
time elapses. Recently, missing-feature methods, which use no prior knowledge
of the background noise [13], have shown promising results [11, 12].
Unfortunately, they depend heavily on accurate estimation of the reliable
components, so performance still degrades in time-varying noise conditions.
In this study, a novel model composition method is proposed to improve speech
recognition in time-varying background noise such as that of in-vehicle
environments. Our motivation is that each order of the cepstral coefficients
represents a different rate of variation across frequency in the log-spectrum
envelope [14]. In the proposed method, variational noise models are generated by
selectively applying perturbation factors to a basis model in the cepstral domain
to obtain various types of spectral patterns. The variational model composition
method is used to generate multiple environmental models for our previously
proposed feature compensation method [9, 10]. The method is evaluated on the
CU-Move corpus, which contains a range of acoustic signals expected during
real-life car driving.
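The idea of perturbing a basis model in the cepstral domain can be sketched as follows. This is only an illustration under stated assumptions, not the procedure detailed in Sect. 11.3: the basis model is reduced to a single mean vector, the perturbation is assumed to be an additive offset on one cepstral order at a time, and the names `dct2`, `idct2`, `compose_variational_means`, and all numeric values are hypothetical.

```python
import math


def dct2(log_spectrum):
    """Unnormalized DCT-II: log-spectrum bins -> cepstral coefficients."""
    n_bins = len(log_spectrum)
    return [
        sum(x * math.cos(math.pi * k * (n + 0.5) / n_bins)
            for n, x in enumerate(log_spectrum))
        for k in range(n_bins)
    ]


def idct2(cepstrum):
    """Exact inverse of dct2: cepstral coefficients -> log-spectrum bins."""
    n_bins = len(cepstrum)
    return [
        cepstrum[0] / n_bins
        + (2.0 / n_bins) * sum(
            cepstrum[k] * math.cos(math.pi * k * (n + 0.5) / n_bins)
            for k in range(1, n_bins))
        for n in range(n_bins)
    ]


def compose_variational_means(basis_cepstrum, orders, factors):
    """Return one perturbed copy of the basis mean per (order, factor) pair.

    ASSUMPTION: the perturbation is an additive offset applied to a single
    cepstral order; the chapter's actual perturbation rule is not shown here.
    """
    variants = []
    for order in orders:
        for factor in factors:
            perturbed = list(basis_cepstrum)
            perturbed[order] += factor
            variants.append(perturbed)
    return variants


# Hypothetical basis noise model: mean log-spectrum over 8 frequency bins.
basis_log_spectrum = [3.0, 2.6, 2.1, 1.7, 1.4, 1.2, 1.1, 1.0]
basis_cepstrum = dct2(basis_log_spectrum)

# Perturb only the three lowest orders, each in both directions.
variants = compose_variational_means(
    basis_cepstrum, orders=[0, 1, 2], factors=[-1.0, 1.0])

# Map each variant back to the log-spectral domain to inspect its shape.
for variant in variants:
    print([round(v, 2) for v in idct2(variant)])
```

In this sketch, an offset on order 0 shifts the whole log-spectrum by a constant, an offset on order 1 tilts it between low and high frequencies, and higher orders add progressively finer ripple, which is why selecting different orders yields different spectral patterns from one basis model.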
This chapter is organized as follows. We first review the CU-Move corpus [15]
used for this study in Sect. 11.2. In Sect. 11.3, the motivation for the proposed
variational model composition method is presented and the detailed procedure is
described. Sect. 11.4 presents a multiple-model-based feature compensation method,
developed in our previous study, as an application of the proposed method.
Representative experimental results are presented and discussed in Sect. 11.5.
Finally, Sect. 11.6 concludes our work.
11.2 CU-Move Corpus
The CU-Move project [15] was designed to develop reliable car navigation systems
employing a mixed-initiative dialog. This requires robust speech recognition across
changing acoustic conditions. The CU-Move database consists of five parts: (1)
command and control words, (2) digit strings of telephone and credit card numbers,
(3) street names and addresses, (4) phonetically balanced sentences, and (5) Wizard
of Oz interactive navigation conversations. A total of 500 speakers, balanced across
gender and age, produced over 600 GB of data during a 6-month collection effort
across the United States. The database and noise conditions are discussed in detail
in [15]. The noise conditions change with time and differ considerably in SNR,
stationarity, and spectral structure. The challenge in