communication situations gives it even more weight in this case. As an
example, object references can take on a variety of linguistic forms, including
the various types of pronouns and noun phrases. In multimodal MMD, each
of these forms exists and can be matched with a pointing gesture. The variety
of pointing gestures thus has to be taken into account, as well as that of visual
contexts in which the gestures were generated (see Chapter 6). The number of
possible combinations is thus greater, and the coverage of reference situations
should scale up accordingly.
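
To make this increase in scale concrete, here is a minimal sketch in Python that enumerates reference situations as combinations of linguistic form, gesture and visual context, then measures how much of that space a test corpus covers. The three inventories and the `coverage` function are hypothetical placeholders chosen for illustration; they are not taken from the text, and a real inventory would be larger.

```python
from itertools import product

# Hypothetical inventories, for illustration only.
LINGUISTIC_FORMS = [
    "personal pronoun",
    "demonstrative pronoun",
    "definite noun phrase",
    "demonstrative noun phrase",
]
GESTURES = ["no gesture", "precise pointing", "circling pointing"]
VISUAL_CONTEXTS = ["isolated object", "object in a group", "overlapping objects"]


def reference_situations():
    """Return every (form, gesture, context) combination as a set of tuples."""
    return set(product(LINGUISTIC_FORMS, GESTURES, VISUAL_CONTEXTS))


def coverage(test_corpus):
    """Return the covered fraction of situations and the sorted list of gaps."""
    all_situations = reference_situations()
    covered = set(test_corpus) & all_situations
    missing = sorted(all_situations - covered)
    return len(covered) / len(all_situations), missing


# A toy test corpus: each item is the situation exercised by one test dialogue.
corpus = [
    ("demonstrative pronoun", "precise pointing", "isolated object"),
    ("demonstrative noun phrase", "circling pointing", "object in a group"),
    ("definite noun phrase", "no gesture", "isolated object"),
]

ratio, missing = coverage(corpus)
print(f"{len(reference_situations())} situations, {ratio:.1%} covered")
```

With only four forms, three gestures and three contexts there are already 36 situations, so each new dimension multiplies the size of the test corpus needed for the same coverage.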
Segmented assessment considers the system to be a transparent box and
is interested in the internal functionalities and representations. The assessment
then focuses on the inputs and outputs of each module. Within an oral dialogue,
we can often limit ourselves to the semantic module output and compare
the semantic representations obtained with reference representations. The
application of this method to multimodal systems raises a few issues: some
were already present in oral systems but are exacerbated, and others are specific
to the introduction of multimodality. Thus, if we focus on the assessment of
multimodal understanding, i.e. on the fusion of information recorded at input
(the assessment of multimodal generation raises similar questions), two
approaches are possible:
- we can proceed as for oral dialogue and focus on the multimodal semantic
representations obtained at the output of the module managing the global
semantic analysis, i.e. the module responsible for multimodal fusion.
Depending on the system (e.g. if the multimodality covers natural language
and conversational gestures or, on the contrary, brings together emotion
detection on the user's face with lip reading and natural language analysis),
these multimodal semantic representations can vary a lot and cover a wide
variety of phenomena. The main issue is then to determine the reference
multimodal representations that several systems will have in common.
Specifying exhaustive representations is almost impossible, especially since
the technology evolves and makes any specification quickly obsolete;
- we can, on the contrary, consider a multimodal system to be a process of
fusion as well as a set of monomodal systems, each characterized by a type of
semantic representation. The assessment then concerns, on the one hand, the
fusion process and, on the other hand, each monomodal system, with reference
representations each time. We will thus have to first specify the reference
semantic representations for gesture trajectories, emotion interpretation, etc.
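
The second approach can be sketched as follows, assuming that every module output is a flat slot/value frame that can be compared to a reference frame by strict equality. The `TestCase` structure, the toy modules and the frame contents are all hypothetical and only stand in for real components; an actual assessment would likely need a softer comparison than exact match.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Frame = Dict[str, Any]  # hypothetical flat semantic representation


@dataclass
class TestCase:
    module: str        # name of the module under test
    module_input: Any  # input fed to that module alone
    reference: Frame   # reference representation expected at its output


def toy_speech(utterance: str) -> Frame:
    """Stand-in for a natural language analysis module."""
    obj = "deictic" if "this" in utterance.split() else "unknown"
    return {"act": "request", "object": obj}


def toy_gesture(trajectory: list) -> Frame:
    """Stand-in for a gesture module: the target is chosen from the last point."""
    x, _y = trajectory[-1]
    return {"type": "pointing", "target": "obj1" if x < 50 else "obj2"}


def toy_fusion(pair: tuple) -> Frame:
    """Stand-in for the fusion process: resolve a deictic with the gesture."""
    speech, gesture = pair
    fused = dict(speech)
    if speech.get("object") == "deictic":
        fused["object"] = gesture["target"]
    return fused


def evaluate(modules: Dict[str, Callable[[Any], Frame]],
             cases: List[TestCase]) -> Dict[str, float]:
    """Return, per module, the share of outputs equal to their reference."""
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for case in cases:
        output = modules[case.module](case.module_input)
        totals[case.module] = totals.get(case.module, 0) + 1
        hits[case.module] = hits.get(case.module, 0) + int(output == case.reference)
    return {name: hits[name] / totals[name] for name in totals}


modules = {"speech": toy_speech, "gesture": toy_gesture, "fusion": toy_fusion}
cases = [
    TestCase("speech", "delete this", {"act": "request", "object": "deictic"}),
    TestCase("speech", "delete the red circle",
             {"act": "request", "object": "red circle"}),
    TestCase("gesture", [(10, 10), (20, 30)],
             {"type": "pointing", "target": "obj1"}),
    TestCase("fusion",
             ({"act": "request", "object": "deictic"},
              {"type": "pointing", "target": "obj2"}),
             {"act": "request", "object": "obj2"}),
]

results = evaluate(modules, cases)
print(results)
```

Because the fusion process is fed reference monomodal frames rather than the actual outputs of the upstream modules, an error in speech or gesture analysis does not count against fusion, which is the point of assessing each component separately.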