EXPERIENCES OF MULTI-SPEAKER DIALOGUE SYSTEM FOR VEHICULAR INFORMATION RETRIEVAL - DSP for In-Vehicle and Mobile Systems


1. INTRODUCTION: FROM SINGLE-SPEAKER TO MULTI-SPEAKER DIALOGUE SYSTEM

It has been several decades since the first spoken dialogue system (SDS) was developed and released. Although SDSs can provide a convenient human-computer interface (HCI) and many useful functions, most current SDSs handle only the interaction between the system and a single speaker. In some situations, it is natural and necessary to handle the interaction between multiple speakers and the system. For example, if several passengers in a car are deciding where to go for lunch, a traditional SDS would need to be extended to deal with the multi-speaker interaction. This motivates the present study of multi-speaker dialogue systems (MSDS).

Many factors must be considered when multiple parties are engaged in an HCI system. Studies of HCI systems that involve multiple users are at an early stage, and published work on the subject is very limited. Among the reported studies, Young developed a discourse structure for multi-speaker spoken dialogues based on a stochastic model [1]. Bull and Aylett [2] analyzed the timing of turn-taking in dialogues, and cross-speaker anaphora was reported by Poesio [3]. These works were based on theoretical studies or on analyses of tagged, text-based multi-speaker interactions; similar work can be found in [4,5,6,7]. Beyond these theoretical studies, Matsusaka et al. [8] built a robot that could communicate with multiple users through a multi-modal interface. The robot was equipped with several workstations and cameras to track and process the speakers' input. In summary, previous multi-speaker research either focused on the theoretical discussion of dialogues [1,2,3,4,5,6,7] or required additional, expensive heterogeneous hardware for multi-modal input [8,9,10,11,12].

What previous research has not analyzed is the interaction between a dialogue system and multiple speakers. This chapter focuses on the analysis of such interactions and proposes an algorithm that enables the dialogue manager to handle the various interactions occurring in an MSDS. Two kinds of interaction may occur in a multi-speaker dialogue: the interaction between a speaker and the system (referred to as inter-action), and the interaction between speakers (intra-action). This chapter discusses only the former.

Observation of many multi-speaker interactions leads to the conclusion that, during a dialogue, one speaker may either interrupt another speaker's utterance or wait until that utterance is finished. That is, the speakers either make simultaneous inputs or utter their inputs in turn. If an MSDS can handle simultaneous speech inputs, we call it a simultaneous MSDS.
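The distinction between interrupting and waiting can be made concrete by checking whether utterances from different speakers overlap in time. The following is a minimal sketch, not the chapter's algorithm: it assumes each input has already been tagged with a speaker label and start/end times (for example, by a voice activity detector), and the `Utterance` type, its fields, and the labels "simultaneous" and "in-turn" are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List


@dataclass
class Utterance:
    """Hypothetical record of one speaker's input; times are in seconds."""
    speaker: str
    start: float
    end: float


def overlaps(a: Utterance, b: Utterance) -> bool:
    """True if the two time intervals intersect (touching endpoints do not count)."""
    return a.start < b.end and b.start < a.end


def classify_input(utterances: List[Utterance]) -> str:
    """Label a set of inputs 'simultaneous' if two different speakers overlap,
    otherwise 'in-turn' (each speaker waited for the previous one to finish)."""
    for a, b in combinations(utterances, 2):
        if a.speaker != b.speaker and overlaps(a, b):
            return "simultaneous"
    return "in-turn"


# Usage: the passenger waits in the first case and interrupts in the second.
waiting = [Utterance("driver", 0.0, 2.5), Utterance("passenger", 3.0, 5.0)]
interrupting = [Utterance("driver", 0.0, 2.5), Utterance("passenger", 1.8, 4.0)]
print(classify_input(waiting))       # in-turn
print(classify_input(interrupting))  # simultaneous
```

Under this sketch, a system that only ever sees "in-turn" inputs could process one utterance at a time, whereas a simultaneous MSDS must be prepared to receive and manage overlapping inputs.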
