1. INTRODUCTION
Spoken dialog management in car telematics is a challenging topic for
speech and language technology research. The various challenges include
efficient creation of dialog scenarios, accurate analysis of the user's
utterances, and management of the communication between the client and the
server. Because of strict limitations on computational resources and
communication bandwidth, the client and the server systems need to divide
their tasks in an appropriate manner. VoiceXML[l] provides a useful basis
for the design of a system architecture where the server system provides the
minimal information necessary to guide the dialog, and the client system
transmits the minimal information necessary to describe the user's input.
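For illustration, a minimal VoiceXML document of the kind the server might send can be built programmatically; the field name, prompt text, and grammar file below are hypothetical examples, not taken from the system described here:

```python
# Sketch: build a minimal single-field VoiceXML document of the kind
# a server might send to the client. Element names follow VoiceXML 2.0;
# the field name, prompt, and grammar source are invented examples.
import xml.etree.ElementTree as ET

def build_vxml(field_name, prompt_text, grammar_src):
    vxml = ET.Element("vxml", version="2.0")
    form = ET.SubElement(vxml, "form", id="main")
    field = ET.SubElement(form, "field", name=field_name)
    prompt = ET.SubElement(field, "prompt")
    prompt.text = prompt_text
    ET.SubElement(field, "grammar", src=grammar_src)
    return ET.tostring(vxml, encoding="unicode")

doc = build_vxml("destination", "Where would you like to go?", "cities.grxml")
print(doc)
```

The client interprets the document locally and need only transmit the filled field value back to the server, which keeps the exchanged payloads small.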
Carpenter et al.[2] have proposed a framework for server-side dialog
management; since VoiceXML does not directly support the modeling of
dialogs as state-transition networks, their framework assumes that the dialog
manager[3] controls the entire flow of the dialog, and sends small segments of
VoiceXML (representing single dialog turns) to the client. However, for
mobile applications such as car telematics systems, the communication
channel is narrow and unstable, and we therefore prefer to send the client a
single VoiceXML document that includes several dialog turns. In previous
work, we have proposed two extensions to VoiceXML: DialogXML and
ScenarioXML[4]. DialogXML supports a higher-level model of dialog flow
using states and transitions; ScenarioXML provides a systematic mechanism
for smooth transition between multiple active dialogs, along with access to
external databases and information sources. These extensions are essential for
dialog management in car telematics systems. The ScenarioXML dialogs
written by the developer are compiled into VoiceXML documents that can be
interpreted by the client. On the server side, following the work reported in
[2] and [4], Java Server Pages (JSP)[5] are used by the dialog manager to
create VoiceXML documents dynamically, so that a particular application
can also incorporate information accessed in real time from external
databases.
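As a rough sketch of the compilation step, a state-transition dialog in the spirit of DialogXML can be flattened into a single multi-form VoiceXML document, so that several dialog turns travel to the client in one download. The state names, prompts, and transitions below are invented for illustration; the real DialogXML/ScenarioXML compiler is considerably richer:

```python
# Sketch: flatten a state-transition dialog description into one
# multi-form VoiceXML document. Each state becomes a <form>, each
# transition a <goto> to the next form. All names are hypothetical.
STATES = {
    "ask_city":   {"prompt": "Which city?",   "next": "ask_street"},
    "ask_street": {"prompt": "Which street?", "next": "confirm"},
    "confirm":    {"prompt": "Shall I start guidance?", "next": None},
}

def compile_dialog(states):
    forms = []
    for name, spec in states.items():
        body = f'    <block><prompt>{spec["prompt"]}</prompt>'
        if spec["next"]:
            body += f'<goto next="#{spec["next"]}"/>'
        body += "</block>"
        forms.append(f'  <form id="{name}">\n{body}\n  </form>')
    return '<vxml version="2.0">\n' + "\n".join(forms) + "\n</vxml>"

print(compile_dialog(STATES))
```

Because the transitions are resolved at compile time, the client can run through several turns without contacting the server, which matters when the communication channel is narrow and unstable.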
Another challenge for an in-vehicle dialog system is accurate analysis of
the user's utterances. In a system that has rich computational and/or
communication resources, such as a telephony gateway, a large-vocabulary
continuous speech recognition (LVCSR) system (e.g., SPHINX[6]) and a
large scale natural language processing (NLP) system (e.g., KANTOO[7])
can be integrated. However, in a client system with limited resources, the
analysis algorithms must be kept simple. Our system uses a simple speech
recognizer with a regular grammar, together with a set of small grammars
and lexicons for NLP. A (grammar, lexicon) pair
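As a sketch of how a small grammar and lexicon might drive utterance analysis on a resource-limited client, the pattern and word classes below are invented for illustration and do not reproduce the system's actual grammars:

```python
# Sketch: slot filling with a tiny regular grammar plus lexicon, the
# style of lightweight NLP a resource-limited client might run. The
# grammar pattern and lexicon entries are invented examples.
import re

LEXICON = {"city": ["boston", "pittsburgh", "detroit"]}

def make_grammar(lexicon):
    # Expand each word class into an alternation, yielding one
    # regular expression that covers the whole (small) language.
    city = "|".join(lexicon["city"])
    return re.compile(rf"(go to|navigate to) (?P<city>{city})$")

def analyze(utterance, grammar):
    m = grammar.match(utterance.lower())
    return {"city": m.group("city")} if m else None

grammar = make_grammar(LEXICON)
print(analyze("Navigate to Pittsburgh", grammar))  # {'city': 'pittsburgh'}
```

Keeping the language regular means the same word lists can compile into both the recognizer's grammar and the analyzer's patterns, which is what makes the approach feasible on a small client.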