terms of possible word sequences. Within an MMD system framework,
building this model requires a careful choice of the corpus used for the
statistical calculations: it is not necessarily appropriate to keep only
transcriptions of oral dialogue, but having dialogues close to those the
system expects is a definite advantage, even if it means managing several
language models.
Moreover, the alternation of user and system turns imposes an additional
constraint: the probabilities for a user's utterance depend on what the
system has just said. The language models thus have to take the state of
the dialogue into account, and become increasingly difficult to manage.
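As a rough illustration of a language model that depends on the dialogue state, the following minimal sketch keeps one set of bigram counts per state. The class name, the state labels and the add-one smoothing are assumptions made for this example, not the method described in the text.

```python
import math
from collections import Counter, defaultdict


class StateConditionedBigramLM:
    """Toy bigram model holding separate counts for each dialogue state."""

    def __init__(self):
        # (state, previous_word) -> Counter of the words that followed
        self.counts = defaultdict(Counter)
        self.vocab = set()

    @staticmethod
    def _tokens(utterance):
        # Add sentence boundary markers around the lowercased words.
        return ["<s>"] + utterance.lower().split() + ["</s>"]

    def train(self, state, utterance):
        words = self._tokens(utterance)
        self.vocab.update(words)
        for prev, word in zip(words, words[1:]):
            self.counts[(state, prev)][word] += 1

    def log_prob(self, state, utterance):
        """Add-one smoothed log-probability of an utterance in a given state."""
        words = self._tokens(utterance)
        vocab_size = max(len(self.vocab), 1)
        total = 0.0
        for prev, word in zip(words, words[1:]):
            following = self.counts.get((state, prev), Counter())
            numerator = following[word] + 1
            denominator = sum(following.values()) + vocab_size
            total += math.log(numerator / denominator)
        return total


# Hypothetical states, named after what the system has just said.
lm = StateConditionedBigramLM()
lm.train("asked_destination", "to paris please")
lm.train("asked_destination", "to lyon")
lm.train("asked_confirmation", "yes please")
lm.train("asked_confirmation", "no thanks")

# The same user utterance is scored differently in each state.
after_confirmation = lm.log_prob("asked_confirmation", "yes please")
after_destination = lm.log_prob("asked_destination", "yes please")
```

With this toy corpus, "yes please" scores higher after a confirmation request than after a destination question. The sketch also shows the management cost mentioned above: every additional dialogue state needs its own counts and enough training data.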
There is an additional difference from voice dictation: whereas the result
of dictation is a written text matching what has been said, the output of
the speech recognition module in an MMD system can be much richer. First,
it can include several recognition hypotheses, so that the downstream
modules can choose among them according to their own expectations. When an
utterance includes an
unknown word, i.e. a sequence of phonemes that does not match any of the
words in the lexicon, the recognition module has a choice between two
solutions: either map it to one of the words of the lexicon, even if the
pronunciations are vastly different, or try to transcribe the sequence of
phonemes with a plausible spelling for the language concerned. Both
solutions may be acceptable for voice dictation (the second, for example,
is well suited to transcribing surnames that the system does not
recognize), but not for MMD: not only does the recognition module have to
indicate that the word is unknown, it also has to transmit a code
describing the word's pronunciation, so that the system can add it to its
vocabulary and pronounce it in turn, if only to ask the user what it means.
Finally, each recognized word is given a confidence score, and the
syntactic or semantic analyzer uses these confidence scores and its own
preferences to find (rather than have imposed on it) the most plausible
transcription of the utterance.
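The kind of result described above (several hypotheses, a confidence score per word, and a pronunciation code for unknown words) could be represented along the following lines. This is only a sketch: the type names, the phoneme notation, the scoring formula and the `prefers_route` preference function are all hypothetical and stand in for whatever the recognizer and analyzer of a real system provide.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RecognizedWord:
    text: str                       # spelling, or a placeholder for an unknown word
    confidence: float               # recognizer confidence, assumed to lie in [0, 1]
    is_unknown: bool = False        # True when no lexicon entry matches
    phonemes: Optional[str] = None  # pronunciation code kept for unknown words


@dataclass
class Hypothesis:
    words: List[RecognizedWord]

    def mean_confidence(self) -> float:
        if not self.words:
            return 0.0
        return sum(w.confidence for w in self.words) / len(self.words)


def choose_hypothesis(hypotheses: List[Hypothesis],
                      preference: Callable[[Hypothesis], float],
                      weight: float = 0.5) -> Hypothesis:
    """Pick the hypothesis with the best mix of confidence and analyzer preference."""
    def score(h: Hypothesis) -> float:
        return (1 - weight) * h.mean_confidence() + weight * preference(h)
    return max(hypotheses, key=score)


def words(pairs):
    # Helper for the example: build known words from (text, confidence) pairs.
    return [RecognizedWord(text, conf) for text, conf in pairs]


common = [("show", 0.9), ("me", 0.9), ("the", 0.9)]

# Hypothesis A flags the last word as unknown and keeps its pronunciation.
hyp_a = Hypothesis(words(common + [("route", 0.8), ("to", 0.9)]) + [
    RecognizedWord("<unknown>", 0.6, is_unknown=True, phonemes="p l u g a s t E l")
])
# Hypothesis B forces the same sounds onto a word of the lexicon.
hyp_b = Hypothesis(words(common + [("root", 0.95), ("to", 0.9), ("plaster", 0.5)]))


def prefers_route(h: Hypothesis) -> float:
    # Stand-in for an analyzer that expects a request about a route.
    return 1.0 if any(w.text == "route" for w in h.words) else 0.0


best = choose_hypothesis([hyp_a, hyp_b], prefers_route)
unknown_words = [w for w in best.words if w.is_unknown]
```

In this example the recognizer's confidence alone slightly favors hypothesis B, but the analyzer's preference tips the choice to hypothesis A. The unknown word then remains available with its phoneme code, so the system could pronounce it back when asking the user what it means.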
Another aspect that affects the nature of the result transmitted to the
other modules of the MMD system is prosody. Whether this is the role of the
recognition module or of another, dedicated module, it is useful for the
written transcription of the utterance to be accompanied by a coded
transcription of its prosody. We will see in Chapters 5, 6 and 7 that
prosody helps in semantic analysis (by providing focalization clues), in
resolving references when a gesture is used jointly with a referential
expression, and in identifying speech acts, by providing a tone