include speaker changes (task stress, emotion, Lombard effect, etc.)[16,31] as
well as the acoustic environment (road/wind noise from windows, air
conditioning, engine noise, exterior traffic, etc.).
Recent approaches to speech recognition in car environments have
included combinations of basic HMM recognizers with front-end noise
suppression[2,4], environmental noise adaptation, and multi-channel
concepts. Many early approaches to speech recognition in the car focused on
isolated commands. One study considered a command word scenario in car
environments where an HMM was compared to a hidden Neural Network
based recognizer[5]. Another method reduced the computational requirements of front-end signal-subspace enhancement by using a DCT in place of a KLT to better map speech features, with recognition rates increasing by 3-5% depending on driving conditions[6]. Another
study[7] considered experiments to determine the impact of mismatch
between recognizer training and testing using clean data, clean data with car
noise added, and actual noisy car data. The results showed that, when starting from models trained on a simulated noisy environment, about twice as much adaptation material is needed compared with starting from clean reference models. The
work was later extended[8] to consider unsupervised online adaptation using
previously formulated MLLR and MAP techniques. Endpoint detection of
phrases for speech recognition in car environments has also been
considered[9]. Preliminary speech/noise detection combined with front-end speech enhancement for noise suppression has also shown promise for robust speech recognition[2,4,10,11]. Recent work has also been
devoted to speech data collection in car environments including
SpeechDat.Car[12], and others[13]. These corpora concentrate primarily on isolated command words, city names, digits, etc., and typically do not include spontaneous speech for truly interactive dialogue systems. While speech
recognition efforts in car environments generally focus on isolated word systems for command and control, there has been some work on developing more spontaneous speech-based systems for car navigation[14,15]. However, these studies use head-worn and ceiling-mounted microphones for speech collection and limit the degree of naturalness (i.e., the level of scripting) of the navigation information exchange.
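Several of the approaches surveyed above apply noise suppression as a front end before recognition. As an illustration only (not the method of any cited study), the following is a minimal sketch of magnitude spectral subtraction, one of the simplest such front ends; all function and parameter names here are hypothetical.

```python
import numpy as np

def spectral_subtract(noisy, noise_mag, frame_len=256, hop=128,
                      alpha=2.0, floor=0.05):
    """Magnitude spectral subtraction with Hann-windowed overlap-add.

    noise_mag: average magnitude spectrum of a noise-only stretch
               (length frame_len // 2 + 1). A sketch, not a tuned system.
    """
    win = np.hanning(frame_len)
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame_len + 1, hop):
        frame = noisy[start:start + frame_len] * win
        spec = np.fft.rfft(frame)
        mag, phase = np.abs(spec), np.angle(spec)
        # Over-subtract the noise estimate (alpha), then apply a spectral
        # floor to limit musical-noise artifacts.
        clean_mag = np.maximum(mag - alpha * noise_mag, floor * mag)
        frame_out = np.fft.irfft(clean_mag * np.exp(1j * phase), frame_len)
        out[start:start + frame_len] += frame_out * win
        norm[start:start + frame_len] += win ** 2
    norm[norm == 0] = 1.0  # avoid division by zero at uncovered edge samples
    return out / norm
```

A practical in-car front end would track the noise spectrum adaptively, since engine and road noise are non-stationary, rather than relying on a fixed noise estimate as this sketch does.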
In developing CU-Move, there are a number of research challenges which
must be addressed to achieve reliable and natural voice interaction within the
car environment. Since the speaker is performing a task (driving the vehicle),
a measured level of user task stress will be experienced by the driver and