1. INTRODUCTION
Spoken dialog management in car telematics is a challenging topic for
speech and language technology research. The various challenges include
efficient creation of dialog scenarios, accurate analysis of the user's
utterances, and management of the communication between the client and the
server. Because of strict limitations on computational resources and
communication bandwidth, the client and the server systems need to divide
their tasks in an appropriate manner. VoiceXML[l] provides a useful basis
for the design of a system architecture where the server system provides the
minimal information necessary to guide the dialog, and the client system
transmits the minimal information necessary to describe the user's input.
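For illustration, a minimal VoiceXML document of the kind the server might send can be built programmatically; the field name, prompt text, and grammar file below are hypothetical examples, not taken from the system described here:

```python
# Sketch: build a minimal single-field VoiceXML document of the kind
# a server might send to the client. Element names follow VoiceXML 2.0;
# the field name, prompt, and grammar source are invented examples.
import xml.etree.ElementTree as ET

def build_vxml(field_name, prompt_text, grammar_src):
    vxml = ET.Element("vxml", version="2.0")
    form = ET.SubElement(vxml, "form", id="main")
    field = ET.SubElement(form, "field", name=field_name)
    prompt = ET.SubElement(field, "prompt")
    prompt.text = prompt_text
    ET.SubElement(field, "grammar", src=grammar_src)
    return ET.tostring(vxml, encoding="unicode")

doc = build_vxml("destination", "Where would you like to go?", "cities.grxml")
print(doc)
```

The client interprets the document locally and need only transmit the filled field value back to the server, which keeps the exchanged payloads small.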
Carpenter et al.[2] have proposed a framework for server-side dialog
management; since VoiceXML does not directly support the modeling of
dialogs as state-transition networks, their framework assumes that the dialog
manager[3] controls the entire flow of the dialog, and sends small segments of
VoiceXML (representing single dialog turns) to the client. However, for
mobile applications such as car telematics systems, the communication
channel is narrow and unstable, and we therefore prefer to send the client a
single VoiceXML document that includes several dialog turns. In previous
work, we have proposed two extensions to VoiceXML: DialogXML and
ScenarioXML[4]. DialogXML supports a higher-level model of dialog flow
using states and transitions; ScenarioXML provides a systematic mechanism
for smooth transition between multiple active dialogs, along with access to
external databases and information sources. These extensions are essential for
dialog management in car telematics systems. The ScenarioXML dialogs
written by the developer are compiled into VoiceXML documents that can be
interpreted by the client. On the server side, following the work reported in
[2] and [4], Java Server Pages (JSP)[5] are used by the dialog manager to
create VoiceXML documents dynamically, so that a particular application
can also incorporate information accessed in real time from external
databases.
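As a rough sketch of the compilation step, a state-transition dialog in the spirit of DialogXML can be flattened into a single multi-form VoiceXML document, so that several dialog turns travel to the client in one download. The state names, prompts, and transitions below are invented for illustration; the real DialogXML/ScenarioXML compiler is considerably richer:

```python
# Sketch: flatten a state-transition dialog description into one
# multi-form VoiceXML document. Each state becomes a <form>, each
# transition a <goto> to the next form. All names are hypothetical.
STATES = {
    "ask_city":   {"prompt": "Which city?",   "next": "ask_street"},
    "ask_street": {"prompt": "Which street?", "next": "confirm"},
    "confirm":    {"prompt": "Shall I start guidance?", "next": None},
}

def compile_dialog(states):
    forms = []
    for name, spec in states.items():
        body = f'    <block><prompt>{spec["prompt"]}</prompt>'
        if spec["next"]:
            body += f'<goto next="#{spec["next"]}"/>'
        body += "</block>"
        forms.append(f'  <form id="{name}">\n{body}\n  </form>')
    return '<vxml version="2.0">\n' + "\n".join(forms) + "\n</vxml>"

print(compile_dialog(STATES))
```

Because the transitions are resolved at compile time, the client can run through several turns without contacting the server, which matters when the communication channel is narrow and unstable.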
Another challenge for an in-vehicle dialog system is accurate analysis of
the user's utterances. In a system that has rich computational and/or
communication resources, such as a telephony gateway, a large-vocabulary
continuous speech recognition (LVCSR) system (e.g., SPHINX[6]) and a
large scale natural language processing (NLP) system (e.g., KANTOO[7])
can be integrated. However, in a client system with limited resources, the
analysis algorithms must be kept simple. Our system uses a simple speech
recognizer with a regular grammar, together with a set of small grammars
and lexicons for NLP. A (grammar, lexicon) pair
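As a sketch of how a small grammar and lexicon might drive utterance analysis on a resource-limited client, the pattern and word classes below are invented for illustration and do not reproduce the system's actual grammars:

```python
# Sketch: slot filling with a tiny regular grammar plus lexicon, the
# style of lightweight NLP a resource-limited client might run. The
# grammar pattern and lexicon entries are invented examples.
import re

LEXICON = {"city": ["boston", "pittsburgh", "detroit"]}

def make_grammar(lexicon):
    # Expand each word class into an alternation, yielding one
    # regular expression that covers the whole (small) language.
    city = "|".join(lexicon["city"])
    return re.compile(rf"(go to|navigate to) (?P<city>{city})$")

def analyze(utterance, grammar):
    m = grammar.match(utterance.lower())
    return {"city": m.group("city")} if m else None

grammar = make_grammar(LEXICON)
print(analyze("Navigate to Pittsburgh", grammar))  # {'city': 'pittsburgh'}
```

Keeping the language regular means the same word lists can compile into both the recognizer's grammar and the analyzer's patterns, which is what makes the approach feasible on a small client.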