1. INTRODUCTION: FROM SINGLE-SPEAKER TO MULTI-SPEAKER DIALOGUE SYSTEMS
It has been several decades since the first spoken dialogue system (SDS) was developed and released. Although SDSs can provide a convenient human-computer interface (HCI) and many useful functions, most current SDSs address only the interaction between the system and a single speaker. In some situations, however, it is natural and necessary to handle the interaction between the system and multiple speakers. For example, if several passengers in a car are deciding where to go for lunch, a traditional SDS would need to be extended to deal with the multi-speaker interaction. This motivates the present study of multi-speaker dialogue systems (MSDSs).
Many factors must be considered when multiple parties are engaged in an HCI system. Research on HCI systems that involve multiple users is still at an early stage, and published work on the subject is very limited. Among the reported studies, Young developed a discourse structure for multi-speaker spoken dialogues based on a stochastic model [1], Bull and Aylett [2] analyzed the timing of turn-taking in dialogues, and Poesio [3] reported on cross-speaker anaphora. These studies were based on theoretical work or on analyses of tagged, text-based multi-speaker interactions; similar work can be found in [4,5,6,7]. Beyond such theoretical studies, Matsusaka et al. [8] built a robot that could communicate with multiple users through a multi-modal interface. The robot was equipped with several workstations and cameras to track and process the speakers' input. In summary, previous multi-speaker research either focused on the theoretical discussion of dialogues [1,2,3,4,5,6,7] or required additional, expensive heterogeneous hardware for multi-modal input [8,9,10,11,12].
What previous research has not analyzed is the interaction between a dialogue system and its speakers. This chapter focuses on the analysis of such interactions and proposes an algorithm that enables the dialogue manager to handle the various interactions that occur in an MSDS. Two kinds of interaction may occur in a multi-speaker dialogue. The first is the interaction between a speaker and the system (referred to as inter-action), and the second is the interaction between speakers (intra-action). This chapter discusses only the former.
Observation of many multi-speaker interactions leads to the conclusion that, during a dialogue, a speaker may either interrupt another speaker's utterance or wait until that input is finished. That is, the speakers either make simultaneous inputs or utter their inputs in turn. If an MSDS can handle simultaneous speech inputs, we call it a simultaneous MSDS