In the 1980s, architectures often followed the order of processing that
we have already mentioned: speech recognition, syntactic analysis, semantic
analysis, dialogue management, automatic generation and text-to-speech.
Variations mostly concerned the way data were managed. Thus, Pierrel
[PIE 87] distinguishes a set of static data from a set of dynamic data. The
first covers a subset of the models that we listed at the end of Chapter 3. The
second can be reduced to the dialogue history and the user model, the latter
essentially oriented toward settings useful for speech recognition: individual
acoustic models, settings on the way words are linked together in
pronunciation, or on prosodic contours. For some tasks, the user model also
includes access and control rights over the application objects.
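By way of illustration, the following sketch shows this fixed order of processing together with the static/dynamic data split, in the spirit of Pierrel [PIE 87]; all names and data fields are hypothetical, and the stages are reduced to stubs rather than real modules.

# Stub stages standing in for the real modules of the 1980s pipeline.
def speech_recognition(audio, user_model): return ["hello"]
def syntactic_analysis(words, grammar): return ("S", words)
def semantic_analysis(tree, task_model): return {"act": "greet"}
def dialogue_management(meaning, history): history.append(meaning); return {"act": "greet_back"}
def generation(reply): return "Hello."
def text_to_speech(text): return text.encode()

# Static data: models fixed for a given system (cf. end of Chapter 3).
STATIC_DATA = {"lexicon": {}, "grammar": {}, "task_model": {}}

# Dynamic data: the dialogue history and a user model oriented toward
# speech recognition settings (acoustic models, word links, prosody).
dynamic_data = {
    "dialogue_history": [],
    "user_model": {
        "acoustic_models": {},
        "word_link_settings": {},
        "prosodic_contours": {},
        "access_rights": set(),   # present for some tasks only
    },
}

def process_turn(audio: bytes) -> bytes:
    """Each stage feeds the next, in the fixed order listed above."""
    words = speech_recognition(audio, dynamic_data["user_model"])
    tree = syntactic_analysis(words, STATIC_DATA["grammar"])
    meaning = semantic_analysis(tree, STATIC_DATA["task_model"])
    reply = dialogue_management(meaning, dynamic_data["dialogue_history"])
    text = generation(reply)
    return text_to_speech(text)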
Today, most systems include some equivalent of these modules, along with
additional modules devoted to a specific modality or to the management of a
specific device. But, as Cole [COL 98, p. 198] states, the dialogue manager
remains at the heart of the system. The dialogue manager maintains the
dialogue history, which records the succession of speech turns as the dialogue
progresses, the utterances pronounced and their linguistic characteristics,
especially the referential expressions used and the referents mentioned, so
that the necessary information can be retrieved when resolving a new
reference, a new anaphora, or a nominal or verbal ellipsis. The dialogue
history also stores the state of the task, the stage reached in the current
dialogue strategy, and a description of communicative successes and failures.
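To make the contents of such a history concrete, here is a minimal sketch of the kind of records it can keep; the class and field names are our own and purely illustrative, not taken from any cited system.

from dataclasses import dataclass, field

@dataclass
class Turn:
    speaker: str                      # "user" or "system"
    utterance: str                    # the utterance as pronounced
    referential_expressions: list[str] = field(default_factory=list)
    referents: list[str] = field(default_factory=list)  # entities mentioned

@dataclass
class DialogueHistory:
    turns: list[Turn] = field(default_factory=list)
    task_state: str = "initial"       # stage reached in the task
    strategy_stage: str = "opening"   # stage in the dialogue strategy
    failures: list[str] = field(default_factory=list)  # communication failures

    def record(self, turn: Turn) -> None:
        self.turns.append(turn)

    def candidate_referents(self) -> list[str]:
        """Collect referents from past turns, e.g. to resolve a new
        reference, an anaphora or a nominal/verbal ellipsis."""
        found = []
        for turn in reversed(self.turns):   # most recent turns first
            found.extend(turn.referents)
        return found

history = DialogueHistory()
history.record(Turn("user", "move the red triangle",
                    referential_expressions=["the red triangle"],
                    referents=["triangle_1"]))
history.record(Turn("user", "put it next to the square",
                    referential_expressions=["it", "the square"],
                    referents=["triangle_1", "square_2"]))
print(history.candidate_referents())

A reference resolver would typically favor the most recent mentions, which is why candidate_referents walks the turns in reverse order.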
More than a resource assigned to a single module, as was the case in the
1980s, the dialogue history can now be a fully-fledged module, with its own
access and storage procedures. Indeed, any kind of process, such as syntactic
analysis, can in principle call upon it. Our comments in section 2.1.1 on how
to apply a forgetting process to MMD go in this direction. This principle of
access and storage at any point is generalized in current systems, especially
when trying to get closer to real-time operation, that is, with analyses carried
out while the user is still speaking. Thus, the module in charge of recording
the audio signal stores it in real time in an utterance resource, and, still in
real time, the speech recognition and prosodic analysis modules update this
resource by adding one or more transcription hypotheses and by marking
with a dedicated label each moment at which the sentence can be considered
self-sufficient. The module in charge of detecting the end of the utterance
also draws on this utterance resource in real time,
and indicates that the system can start talking when the prosodic and syntactic
analyses agree that the user's utterance is complete.
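The following sketch illustrates such a shared utterance resource; the names, the lock-based synchronization and the silence threshold are assumptions of ours, not features of any cited system.

import threading

class UtteranceResource:
    """Shared resource that several modules read and update in real
    time, while the user is still speaking."""

    def __init__(self):
        self._lock = threading.Lock()
        self.audio = bytearray()        # raw signal, appended as recorded
        self.hypotheses = []            # transcription hypotheses (ASR)
        self.self_sufficient_at = []    # labels: moments at which the
                                        # sentence is already self-sufficient

    def append_audio(self, chunk: bytes) -> None:
        with self._lock:                # written by the recording module
            self.audio.extend(chunk)

    def add_hypothesis(self, text: str) -> None:
        with self._lock:                # written by speech recognition
            self.hypotheses.append(text)

    def mark_self_sufficient(self, t: float) -> None:
        with self._lock:                # written by prosodic/syntactic analysis
            self.self_sufficient_at.append(t)

    def end_of_utterance(self, silence_since: float) -> bool:
        """End-of-utterance detector: the system may start talking once
        a self-sufficiency label exists and silence has followed it
        (0.5 s is an arbitrary threshold for this sketch)."""
        with self._lock:
            return bool(self.self_sufficient_at) and silence_since > 0.5

Any module can thus read or store at any point, which is precisely the principle of generalized access to the dialogue history and its associated resources described above.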