- Q = “follow the street on the right?” (question addressed to the system
after the utterance D and meant to assess the proper understanding of D);
- R = “yes” (system's answer showing that the anaphora was correctly
understood and making the assessment positive).
The authors specified seven levels characterizing the scope of the questions
asked. Let us revisit these levels here, indicating in each case how to extend
the paradigm so that it can be used in a multimodal dialogue.
- Level 1 = explicit information. This is the marking of an explicit piece
of information in the utterance; the point is to test proper understanding
of the literal utterance, given the great variability of spontaneous language.
The examples given by the authors limit themselves to taking up part of the
utterance and asking the user to confirm this part has been understood: D
= “you will take a right after the white buildings with blue shutters” then
Q = “white shutters?” or “blue shutters?” The extension of this principle to
multimodality consists of asking questions about elements of the multimodal
utterance. With D = “put that there” + gesture at (x1, y1) + gesture at (x2, y2),
we can test the system's multimodal tracking abilities by asking the following
questions Q: “that?” + gesture at (x1, y1); “put there?” + gesture at (x2, y2);
“put that?” + gesture at (x2, y2); “put that there?” + gesture at (x2, y2) +
gesture at (x1, y1), etc. The process may appear naive, but it provides a simple
way to test whether the system properly matches gestures with referring expressions,
which is an important part of multimodal fusion. Specific attention must be paid
to the temporal synchronization between the words pronounced and the gestures
produced. Thus, a temporal delay between “that” and the gesture in question Q
could lead, depending on the system, either to a positive answer, which would reflect
its robustness in multimodal matching even when the production conditions
deviate, or, on the contrary, to a negative answer reflecting the system's
inability to go beyond a certain temporal gap.
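To make this matching criterion concrete, here is a minimal sketch in Python. It is purely illustrative: the Gesture and Deictic structures, the match_deictics function and the 0.5 s tolerance are assumptions of ours, not part of the DQR specification. It pairs each deictic word with the temporally closest pointing gesture and rejects the pair when the gap exceeds the tolerance, which mirrors the positive/negative behavior just described.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Gesture:
        x: float
        y: float
        time: float  # timestamp of the pointing gesture, in seconds

    @dataclass
    class Deictic:
        word: str    # e.g. "that", "there"
        time: float  # timestamp at which the word is pronounced

    def match_deictics(deictics: list[Deictic],
                       gestures: list[Gesture],
                       tolerance: float = 0.5) -> dict[str, Optional[Gesture]]:
        """Pair each deictic word with the closest free gesture in time,
        provided the temporal gap stays below `tolerance` (in seconds)."""
        pairing: dict[str, Optional[Gesture]] = {}
        free = list(gestures)
        for d in sorted(deictics, key=lambda d: d.time):
            best = min(free, key=lambda g: abs(g.time - d.time), default=None)
            if best is not None and abs(best.time - d.time) <= tolerance:
                pairing[d.word] = best
                free.remove(best)
            else:
                # gap too large: the corresponding question Q gets a negative answer
                pairing[d.word] = None
        return pairing

    # D = "put that there" + gesture at (x1, y1) + gesture at (x2, y2),
    # with the gesture accompanying "there" deliberately delayed.
    pairs = match_deictics(
        [Deictic("that", 0.8), Deictic("there", 1.6)],
        [Gesture(120, 45, 0.9), Gesture(310, 210, 2.5)],
    )
    print(pairs)  # "that" is matched (0.1 s gap); "there" is not (0.9 s > 0.5 s)

In this toy run, the gesture accompanying “there” arrives 0.9 s late, so the corresponding question Q would receive a negative answer; widening the tolerance would model a system that is more robust to deviating production conditions.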
- Level 2 = implicit information. This level concerns the resolution of
anaphora, ellipses, gaps and other implicit information that can be recovered
at the syntactic and semantic levels. An example would involve: D = “give me
a ticket for Paris and one for Lyon too” and Q = “ticket for Lyon?” Reference
resolution is one of the main aspects of spontaneous multimodality,
so a multimodal DQR will obviously have to account for it. Thus, if we take
D to be the universal primitive of multimodality, “put that there” with two
pointing gestures, the questions Q could introduce further precision about the referents,
starting, for example, from the mention of their category and going so far as to