Digital Signal Processing Reference
In an MSDS, the interactions between the speakers and the system must be
handled carefully to keep the dialogue running smoothly. This task is usually
performed by the dialogue manager, which is the main topic of this chapter.
The chapter is organized as follows: Section 2 describes the major components
of an MSDS; Section 3 presents the algorithm of a multi-speaker dialogue
manager, together with several examples; Section 4 reports the experimental
results; and Section 5 gives the concluding remarks.
2. FUNDAMENTALS OF MSDS
According to the model provided by Huang et al. [13], a traditional
single-speaker SDS can be modeled as a pattern recognition problem. Given a
speech input X, the objective of the system is to arrive at actions A (including
a response message and any necessary operations) so that the probability of
choosing A is maximized. The optimal solution, i.e., the maximum a posteriori
(MAP) estimate, can be expressed by the following equation:
where F denotes the semantic interpretation of X and S_n the discourse
semantics for the nth dialogue turn. Note that Eq. (1) shows the model-based
decomposition of an SDS. A probabilistic model of an SDS can be found in
the work of Young [14, 15].
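To make the MAP criterion concrete, the following is a minimal sketch rather than the chapter's own formulation. It assumes a decomposition in the spirit of Huang et al., in which the action posterior is obtained by summing over the semantic interpretation F and the discourse semantics S_n: P(A | X, S_{n-1}) = sum over F and S_n of P(A | S_n) P(S_n | F, S_{n-1}) P(F | X, S_{n-1}). The symbol S_n, the three probability tables, and all labels and numbers are hypothetical and chosen only for illustration.

```python
# Toy illustration of MAP action selection in a single-speaker SDS.
# All tables, labels, and numbers are hypothetical (not from the chapter).

# Semantic model P(F | X, S_prev) for one fixed utterance X and previous state.
p_f = {"ask_weather": 0.7, "ask_news": 0.3}

# Discourse model P(S_n | F, S_prev), indexed by F for the same fixed S_prev.
p_s_given_f = {
    "ask_weather": {"need_city": 0.8, "ready": 0.2},
    "ask_news": {"need_city": 0.1, "ready": 0.9},
}

# Dialogue-manager model P(A | S_n).
p_a_given_s = {
    "need_city": {"prompt_city": 0.9, "answer": 0.1},
    "ready": {"prompt_city": 0.2, "answer": 0.8},
}


def action_posterior(p_f, p_s_given_f, p_a_given_s):
    """Return P(A | X, S_prev) by marginalizing over F and S_n."""
    posterior = {}
    for f, pf in p_f.items():
        for s, ps in p_s_given_f[f].items():
            for a, pa in p_a_given_s[s].items():
                posterior[a] = posterior.get(a, 0.0) + pf * ps * pa
    return posterior


def map_action(posterior):
    """Return the action with the highest posterior probability."""
    return max(posterior, key=posterior.get)


posterior = action_posterior(p_f, p_s_given_f, p_a_given_s)
best = map_action(posterior)
print(best, posterior)
```

With these toy tables the posterior mass is about 0.613 for prompt_city and 0.387 for answer, so the MAP action is prompt_city. A real system would replace the lookup tables with trained semantic, discourse, and dialogue-manager models.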
For a multi-speaker dialogue system, assuming that only single-thread
speech input is allowed and that speech is captured from multiple microphone
channels, Eq. (1) can be extended to the formulation below.
where denotes the integration of the m discourse semantics for the nth
dialogue turn; it contains all the information in
and m is the number of
speakers. The discourse semantics
can be derived using Eq. (3), shown
below: