This is carried out through multimodal fusion at the pragmatic level: the
gesture act and the speech act are compared and unified to obtain a single act
characterizing the complete multimodal utterance. First, when the two acts are
of the same type, for example two assertions or two questions, multimodal
fusion essentially consists of checking that their semantic contents are
compatible. In the example given previously, wide-open eyes carry no specific
semantic content when the gesture occurs at the same time as an oral
utterance, so fusion is immediate when the oral utterance is of the “asking”
type. The same holds for a sudden gesture that illustrates, in an injunctive
way, an order also conveyed by a simultaneous oral utterance. Fusion can be
less immediate with two assertions, because the gesture then carries semantic
content of its own, for example a specific quasi-linguistic aspect. Either
this content is compatible with that of the assertion expressed through
speech, and multimodal fusion yields a single assertion act with a unified
semantic content, or the two semantic contents are incompatible, and the
system is faced with two different acts, together known as a multimodal
composite act.
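
The same-type case amounts to a small decision procedure. The following is a minimal Python sketch under stated assumptions, not the MMD system's actual implementation: the `Act` and `CompositeAct` classes and the `fuse_same_type` and `compatible` functions are hypothetical, a semantic content is assumed to be a dictionary of attribute-value pairs, and compatibility is approximated as the absence of conflicting values for shared attributes.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Act:
    """Hypothetical dialogue act: a type plus an optional semantic content."""
    act_type: str                   # e.g. "assert", "ask", "order"
    content: Optional[dict] = None  # None: no specific semantic content
    modality: str = "speech"        # "speech", "gesture" or "multimodal"


@dataclass
class CompositeAct:
    """Hypothetical container for two acts whose contents could not be unified."""
    acts: tuple


def compatible(a: Optional[dict], b: Optional[dict]) -> bool:
    """Assumed test: no attribute shared by both contents has conflicting values."""
    if a is None or b is None:
        return True  # an act without content (e.g. wide-open eyes) never conflicts
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def fuse_same_type(speech: Act, gesture: Act) -> Union[Act, CompositeAct]:
    """Fuse a speech act and a gesture act of the same type."""
    if speech.act_type != gesture.act_type:
        raise ValueError("fuse_same_type expects two acts of the same type")
    if compatible(speech.content, gesture.content):
        # Compatible contents: a single act carrying the unified content.
        merged = {**(gesture.content or {}), **(speech.content or {})}
        return Act(speech.act_type, merged or None, "multimodal")
    # Incompatible contents: keep both acts as a multimodal composite act.
    return CompositeAct((speech, gesture))


# Usage: a spoken question with a content-free questioning gesture fuses at once,
# while two assertions with conflicting values yield a composite act.
question = fuse_same_type(Act("ask", {"object": "itinerary"}), Act("ask", None, "gesture"))
composite = fuse_same_type(
    Act("assert", {"length": "short"}), Act("assert", {"length": "long"}, "gesture")
)
```

Under this representation, a wide-open-eyes gesture is an `ask` act with no content, so it never blocks fusion; only a gesture that brings its own content can lead to a composite act.
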
Finally, when the two acts are of different types, for example a querying
gesture accompanied by an oral utterance of the “saying that” type, several
cases are possible. In the first case, the semantic contents can fuse, and the
MMD system can then put forward the hypothesis of a multimodal indirect act:
the linguistic utterance looks like an assertion, but taking the gesture into
account calls this interpretation into question and suggests the deeper
“asking” act. If the semantic content lends itself to this, the hypothesis is
kept, and the gesture then plays exactly the same role as a questioning
intonation contour. In the second case, the semantic contents do not fuse, and
we are faced either with two distinct acts or with a composite act comprising
a query with its semantic content and an assertion with its own semantic
content. A ranking can then be established between the two: the linguistic act
takes precedence over the gesture act, if only because a dialogue is first and
foremost linguistic.

As with the example “how long with this itinerary which seems shorter?”, the
system will have to decide how to react, taking into account the three
possibilities it is presented with: react to the first act (here, the
linguistic assertion), react to the second act (here, the querying gesture), or
react to both. This choice is part of the dialogue strategy, which is the focus
of the next chapter.
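
The different-type case, the ranking, and the three possible reactions can be sketched in the same hedged way. This Python sketch is hypothetical throughout: `Act`, `compatible`, `fuse_different_types` and `reaction_options` are illustrative names, contents are assumed to be attribute-value dictionaries, and it assumes that when the contents fuse, the deeper act takes the gesture's type (a querying gesture turning a spoken assertion into an “asking” act). It does not distinguish two distinct acts from a composite act; both are returned as a list ranked with the linguistic act first.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Act:
    """Hypothetical dialogue act: a type plus an optional semantic content."""
    act_type: str                   # e.g. "assert", "ask"
    content: Optional[dict] = None  # assumed attribute-value representation
    modality: str = "speech"        # "speech", "gesture" or "multimodal"
    indirect: bool = False          # True for a hypothesized indirect act


def compatible(a: Optional[dict], b: Optional[dict]) -> bool:
    """Assumed test: no attribute shared by both contents has conflicting values."""
    if a is None or b is None:
        return True
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def fuse_different_types(speech: Act, gesture: Act) -> List[Act]:
    """Return [indirect act] if the contents fuse, else both acts ranked speech first."""
    if compatible(speech.content, gesture.content):
        merged = {**(gesture.content or {}), **(speech.content or {})} or None
        # Indirect-act hypothesis: the gesture plays the role of a questioning
        # intonation contour, so the surface act is reinterpreted with the
        # gesture's type (e.g. an assertion read as an "ask" act).
        return [Act(gesture.act_type, merged, "multimodal", indirect=True)]
    # No fusion: keep both acts and rank the linguistic act first. Sorting on
    # `modality != "speech"` puts the speech act (key False) before the gesture.
    return sorted([gesture, speech], key=lambda act: act.modality != "speech")


def reaction_options(ranked: List[Act]) -> List[List[Act]]:
    """List candidate reactions: to the first act, to the second, or to both."""
    if len(ranked) == 1:
        return [ranked]
    first, second = ranked
    return [[first], [second], [first, second]]


# Usage: a querying gesture over a spoken assertion with clashing contents.
ranked = fuse_different_types(
    Act("assert", {"route": "A"}), Act("ask", {"route": "B"}, "gesture")
)
options = reaction_options(ranked)
```

Choosing among the options returned by `reaction_options` is deliberately left out, since the text assigns that choice to the dialogue strategy.
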