Digital Signal Processing Reference
where U denotes the input from the multiple microphones and i is the
speaker index. Based on Eq. (3), an MSDS can be decomposed into five
components, as described below:
1. Active speaker determination: deciding the active speaker i and their
speech input using the active-speaker model. To aid in determining the
active speaker from the multiple microphone inputs, the matched filter
is a useful technique. The output of the matched filter from each
microphone is compared with a predetermined threshold to decide the
primary channel, i.e., the active speaker. The signals from the secondary
channels are used to estimate the noise with an adaptive filter. The
enhanced signal (i.e., the target speech) is obtained by subtracting the
estimated noise from the primary channel signal.
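The channel-selection and subtraction steps above can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the templates and threshold are assumed given, and a precomputed noise estimate stands in for the adaptive filter.

```python
import numpy as np

def detect_active_speaker(mic_signals, templates, threshold):
    """Pick the primary channel (active speaker) by matched filtering.

    mic_signals : list of 1-D arrays, one per microphone channel
    templates   : list of 1-D arrays, matched-filter templates (assumed known)
    threshold   : minimum peak response required to accept a channel as active
    """
    scores = []
    for x, h in zip(mic_signals, templates):
        # A matched filter is correlation with the time-reversed template.
        response = np.convolve(x, h[::-1], mode="valid")
        scores.append(np.max(np.abs(response)))
    best = int(np.argmax(scores))
    # Below threshold, no channel is declared active.
    return best if scores[best] >= threshold else None

def enhance_primary(primary, noise_estimate):
    """Target speech = primary channel minus the noise estimate derived
    from the secondary channels (here passed in directly, standing in
    for the adaptive filter of the text)."""
    n = min(len(primary), len(noise_estimate))
    return primary[:n] - noise_estimate[:n]
```

The channel whose matched-filter peak is largest and above threshold becomes the primary channel; the remaining channels feed the noise estimate.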
2. Individual semantic parser: performing, for each speaker, the same
parsing process as in a traditional SDS. The semantic model parses each
sentence into semantic objects. This component is often divided into
individual target speech recognition and sentence parsing. The speech
recognizer translates each speaker's utterance into a word/keyword
lattice. Current keyword spotters can detect thousands of keywords and
yield acceptable results for SDS applications, so a keyword spotter is
well suited to detecting the meaningful parts of a speaker's utterance
in an MSDS. Our proposed MSDS uses the technique developed by Wu and
Chen [16]. Furthermore, we adopt a partial parser, which concentrates on
describing the structure of the meaningful clauses and sentences
embedded in the spoken utterance.
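The keyword-spotting and partial-parsing idea can be illustrated as below. This is a simplified sketch over a 1-best word list rather than a full lattice, and the keyword table and slot names are invented for the example; it does not reproduce the method of Wu and Chen [16].

```python
# Hypothetical keyword-to-slot table for a travel-domain dialogue;
# real systems would spot thousands of keywords in a lattice.
KEYWORDS = {
    "taipei": ("destination", "Taipei"),
    "tomorrow": ("date", "tomorrow"),
    "train": ("transport", "train"),
}

def spot_keywords(words):
    """Keep only the words that carry meaning for the application."""
    return [w for w in words if w.lower() in KEYWORDS]

def partial_parse(words):
    """Map spotted keywords to semantic (slot, value) objects,
    ignoring the non-meaningful parts of the utterance."""
    frame = {}
    for w in spot_keywords(words):
        slot, value = KEYWORDS[w.lower()]
        frame[slot] = value
    return frame
```

Fillers such as "uh" or "I want a" simply never match the keyword table, which is what lets the partial parser describe only the meaningful clauses.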
3. Individual discourse analysis: the discourse model is used to derive
the new dialogue context. This process is also performed for each
speaker.
4. Multiple discourse integration: the discourse semantics of all
speakers are integrated by the discourse integration model. Together
with the individual discourse analysis model, it combines each
speaker's dialogue semantics, and the result of discourse integration
is sent to the multi-speaker dialogue manager.
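One way to picture the integration step is merging the per-speaker semantic frames into a shared context. The merge policy here (first value wins, disagreements collected for the dialogue manager) is an assumption made for the sketch, not the chapter's model.

```python
def integrate_discourses(speaker_frames):
    """Combine per-speaker dialogue semantics into one shared context.

    speaker_frames : dict mapping speaker id -> semantic frame (slot -> value).
    Conflicting slot values are collected separately so the downstream
    dialogue manager can decide how to resolve them.
    """
    shared, conflicts = {}, {}
    for speaker, frame in speaker_frames.items():
        for slot, value in frame.items():
            if slot in shared and shared[slot] != value:
                # Disagreement between speakers: record it for the manager.
                conflicts.setdefault(slot, {})[speaker] = value
            else:
                shared[slot] = value
    return shared, conflicts
```

The shared frame plus the conflict set is what gets handed to the multi-speaker dialogue manager in the next component.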
5. Multi-speaker dialogue manager: determining the most suitable action
using the dialogue management model. After the multi-speaker speech
input has been handled by the preceding modules, the dialogue manager
is responsible for maintaining the