include speaker changes (task stress, emotion, Lombard effect, etc.)[16,31] as
well as the acoustic environment (road/wind noise from windows, air
conditioning, engine noise, exterior traffic, etc.).
Recent approaches to speech recognition in car environments have
included combinations of basic HMM recognizers with front-end noise
suppression[2,4], environmental noise adaptation, and multi-channel
concepts. Many early approaches to speech recognition in the car focused on
isolated commands. One study considered a command word scenario in car
environments where an HMM was compared to a hidden Neural Network
based recognizer[5]. Another method reduced the computational requirements of front-end signal-subspace enhancement by using a DCT in place of a KLT to better map speech features, with recognition rates increasing by 3-5% depending on driving conditions[6]. Another
study[7] considered experiments to determine the impact of mismatch
between recognizer training and testing using clean data, clean data with car
noise added, and actual noisy car data. The results showed that, when starting from models trained on a simulated noisy environment, about twice as much adaptation material is needed compared with starting from clean reference models. The
work was later extended[8] to consider unsupervised online adaptation using
previously formulated MLLR and MAP techniques. Endpoint detection of
phrases for speech recognition in car environments has also been
considered[9]. Preliminary speech/noise detection combined with front-end speech enhancement for noise suppression has also shown promise for robust speech recognition[2,4,10,11]. Recent work has also been
devoted to speech data collection in car environments including
SpeechDat.Car[12], and others[13]. These corpora concentrate primarily on isolated command words, city names, digits, etc., and typically do not include spontaneous speech for truly interactive dialogue systems. While speech
recognition efforts in car environments generally focus on isolated word systems for command and control, there has been some work on developing more spontaneous speech-based systems for car navigation[14,15]. However, these studies use head-worn and ceiling-mounted microphones for speech collection and limit the degree of naturalness (i.e., the level of scripting) of the navigation information exchange.
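Several of the approaches surveyed above apply noise suppression as a front end before recognition. As an illustration only (not the method of any cited study), the following is a minimal sketch of magnitude spectral subtraction, one of the simplest such front ends; all function and parameter names here are hypothetical.

```python
import numpy as np

def spectral_subtract(noisy, noise_mag, frame_len=256, hop=128,
                      alpha=2.0, floor=0.05):
    """Magnitude spectral subtraction with Hann-windowed overlap-add.

    noise_mag: average magnitude spectrum of a noise-only stretch
               (length frame_len // 2 + 1). A sketch, not a tuned system.
    """
    win = np.hanning(frame_len)
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame_len + 1, hop):
        frame = noisy[start:start + frame_len] * win
        spec = np.fft.rfft(frame)
        mag, phase = np.abs(spec), np.angle(spec)
        # Over-subtract the noise estimate (alpha), then apply a spectral
        # floor to limit musical-noise artifacts.
        clean_mag = np.maximum(mag - alpha * noise_mag, floor * mag)
        frame_out = np.fft.irfft(clean_mag * np.exp(1j * phase), frame_len)
        out[start:start + frame_len] += frame_out * win
        norm[start:start + frame_len] += win ** 2
    norm[norm == 0] = 1.0  # avoid division by zero at uncovered edge samples
    return out / norm
```

A practical in-car front end would track the noise spectrum adaptively, since engine and road noise are non-stationary, rather than relying on a fixed noise estimate as this sketch does.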
In developing CU-Move, there are a number of research challenges which
must be addressed to achieve reliable and natural voice interaction within the
car environment. Since the speaker is performing a task (driving the vehicle),
a measured level of user task stress will be experienced by the driver and