EXPERIENCES OF MULTI-SPEAKER DIALOGUE SYSTEM FOR VEHICULAR INFORMATION RETRIEVAL - DSP for In-Vehicle and Mobile Systems

In an MSDS, the interactions between the speakers and the system must be handled carefully to keep the dialogue going smoothly. This task is often accomplished by the dialogue manager, which is the major issue discussed in this chapter. The chapter is organized as follows: Section 2 describes the major components of an MSDS; Section 3 illustrates the algorithm of a multi-speaker dialogue manager, together with several examples; Section 4 shows the experimental results; and Section 5 gives the concluding remarks.

2. FUNDAMENTALS OF MSDS

According to the model provided by Huang et al. [13], a traditional single-speaker SDS can be modeled as a pattern recognition problem. Given a speech input X, the objective of the system is to arrive at actions A (including a response message and necessary operations) so that the probability of choosing A is maximized. The optimal solution, i.e., the maximum a posteriori (MAP) estimate, can be expressed by the following equation:

where F denotes the semantic interpretation of X and the discourse semantics for the nth dialogue turn. Note that Eq. (1) shows the model-based decomposition of an SDS. The probabilistic model of an SDS can be found in the work of Young [14, 15].
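
As a rough illustration of a model-based decomposition of this kind, the sketch below selects the MAP action from small discrete probability tables. It assumes, for illustration only, that the posterior over actions factors into a semantic model P(F | X, S_prev), a discourse model P(S | F, S_prev), and an action model P(A | S). The symbol S for the discourse semantics, the function name map_action, and every table entry are hypothetical and are not taken from [13].

```python
from collections import defaultdict
from typing import Dict, Tuple


def map_action(
    p_f: Dict[str, float],
    p_s_given_f: Dict[str, Dict[str, float]],
    p_a_given_s: Dict[str, Dict[str, float]],
) -> Tuple[str, float]:
    """Return the action with the highest posterior under the assumed factorization.

    p_f[f]            ~ P(F = f | X, S_prev)      (semantic model, assumed)
    p_s_given_f[f][s] ~ P(S = s | F = f, S_prev)  (discourse model, assumed)
    p_a_given_s[s][a] ~ P(A = a | S = s)          (action model, assumed)
    """
    # Marginalize over semantic interpretations F to get P(S | X, S_prev).
    p_s = defaultdict(float)
    for f, pf in p_f.items():
        for s, ps in p_s_given_f[f].items():
            p_s[s] += ps * pf

    # Marginalize over discourse semantics S to get P(A | X, S_prev).
    p_a = defaultdict(float)
    for s, ps in p_s.items():
        for a, pa in p_a_given_s[s].items():
            p_a[a] += pa * ps

    # MAP decision: the action with the largest posterior probability.
    best = max(p_a, key=p_a.get)
    return best, p_a[best]


# Hypothetical toy tables for a single in-vehicle utterance.
P_F = {"ask_route": 0.7, "ask_weather": 0.3}
P_S_GIVEN_F = {
    "ask_route": {"route_query": 0.9, "weather_query": 0.1},
    "ask_weather": {"route_query": 0.2, "weather_query": 0.8},
}
P_A_GIVEN_S = {
    "route_query": {"show_route": 0.8, "show_weather": 0.2},
    "weather_query": {"show_route": 0.1, "show_weather": 0.9},
}

best_action, best_score = map_action(P_F, P_S_GIVEN_F, P_A_GIVEN_S)
print(best_action, round(best_score, 3))
```

With these made-up tables the route-related action wins, because most of the probability mass flows through the route interpretation. A real system would obtain each table from trained models rather than hand-set values.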

For a multi-speaker dialogue system, assuming that only single-thread speech input is allowed and that speech is input from multiple microphone channels, Eq. (1) can be extended to the formulation below.

where the integrated discourse semantics combine the m discourse semantics for the nth dialogue turn and contain all the information in them; m is the number of speakers. The discourse semantics can be derived using Eq. (3), shown below:
