A SPOKEN DIALOG CORPUS FOR CAR TELEMATICS SERVICES - DSP for In-Vehicle and Mobile Systems

1. INTRODUCTION

The term telematics refers to the emerging industry of communication, information, and entertainment services delivered to motor vehicles via wireless network technology. A telematics system must provide a human-machine interface (HMI) that allows drivers to operate the device, system or service easily and without compromising traffic safety. A spoken dialog system is considered to be the most suitable HMI for telematics, since it allows the driver to keep “hands on the wheel, eyes on the road”.

The Conversational Agent for Multimedia Mobile Information Access (CAMMIA) provides a framework for client-server implementation of spoken dialog systems in mobile, hands-free environments [1][5]. The goal of CAMMIA is to realize large-scale speech dialog systems that can handle a variety of information retrieval tasks. CAMMIA is based on VoiceXML, a markup language for speech dialog systems which has been proposed as a standard by the W3C [7]. The client is an in-vehicle terminal with an automatic speech recognition (ASR) system, a VoiceXML interpreter, and a text-to-speech (TTS) system; the server is a separate computer which runs a Dialog Manager (DM) module [5]. The client recognizes the driver's utterances according to the VoiceXML dialog scenarios, and transmits the recognition results in the form of requests to the server. The server then searches its database and transforms the search results into VoiceXML files which are transmitted to the client as a response.
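
The request-response cycle described above can be illustrated with a minimal sketch of the server side. This is not CAMMIA's implementation: the request format, the `RESTAURANTS` table, and the `handle_request` function are all hypothetical names chosen for illustration, and the only thing taken from the VoiceXML standard is the basic `vxml`/`form`/`block`/`prompt` element structure. The sketch assumes the client has already reduced the driver's utterance to a small set of key-value pairs.

```python
from xml.sax.saxutils import escape

# Hypothetical stand-in for the server's database; the chapter does not
# describe the real schema or contents.
RESTAURANTS = {
    "italian": ["Trattoria Uno", "Pasta Bar"],
    "sushi": ["Sushi Ichi"],
}


def handle_request(request: dict) -> str:
    """Turn a recognition result into a VoiceXML response (illustrative only).

    `request` is an assumed format: a dict of slots filled by the client's
    ASR, e.g. {"cuisine": "sushi"}.
    """
    cuisine = request.get("cuisine", "")
    hits = RESTAURANTS.get(cuisine, [])

    if hits:
        text = "I found: " + ", ".join(hits) + "."
    else:
        text = "Sorry, I found no matching restaurants."

    # Wrap the search result in a minimal VoiceXML document that the
    # client's interpreter could hand to its TTS system.
    return (
        '<?xml version="1.0"?>\n'
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
        "  <form>\n"
        "    <block>\n"
        f"      <prompt>{escape(text)}</prompt>\n"
        "    </block>\n"
        "  </form>\n"
        "</vxml>\n"
    )
```

Calling `handle_request({"cuisine": "sushi"})` returns a document whose prompt reads "I found: Sushi Ichi."; an unknown cuisine yields the apology prompt instead. A real Dialog Manager would presumably also emit grammars and follow-up fields so that the dialog can continue, which this sketch leaves out.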

One novel aspect of CAMMIA is the natural conversational interaction between the user and the system, supported by a DM module that allows the user to change dialog tasks flexibly. Many of the system requirements associated with natural spoken dialog can be ascertained by studying human behavior as observed in large collections of spoken or written data. Specifically, the analysis includes defining a lexicon and grammar for ASR, as well as designing suitable dialog scenarios for use by the DM.

Human-computer dialog differs from human-human dialog in various respects, including linguistic complexity [2]. However, the examination of human-human dialogs is a natural first step in the process of modeling human dialog behavior [3]. The modeling approach requires very large quantities of task-oriented linguistic data. To meet this requirement, we collected a spoken dialog corpus for car telematics services. In this chapter, Section 2 outlines the system architecture of CAMMIA. Section 3 explains the spoken dialog corpus collection. Section 4 describes the analysis of the corpus, followed by conclusions.
