ambiguity. With our example U2, which involves the referring expression "this
itinerary" and a pointing gesture toward one of the itineraries displayed on the
screen, we are in the simplest possible configuration: the underspecified
reference domain imposes a focalization within a domain that covers the
different itineraries to Paris, the gesture provides a hypothesis about the
itinerary pointed at, and the matching leads the system to place the
focalization on this hypothesis, and thus to a resolved reference. In other,
more complex cases, we may need to determine the type of access to the
referents through a fine-grained analysis of the combinations of access type
and determiner type [LAN 06]. In any case, a formalism such as feature
structures allows the system to implement such a model, the matching being
carried out by a unification operation. The challenge for MMD is thus mainly
to identify all the types of reference and to write a module that deduces, from
the linguistic forms, the formalized constraints on the reference domains,
constraints which will then direct the feature structure unification.
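The matching by unification described above can be sketched as follows. This is a minimal illustration, not the system discussed in the text: the `unify` function, the feature names and the example values are all assumptions made for the sake of the example.

```python
# Minimal sketch of reference resolution by feature-structure
# unification. The function name, feature names and values are
# illustrative assumptions, not the formalism of [LAN 06].

def unify(fs1, fs2):
    """Unify two feature structures (nested dicts); return None on a clash."""
    result = dict(fs1)
    for feat, val in fs2.items():
        if feat not in result:
            result[feat] = val
        elif isinstance(result[feat], dict) and isinstance(val, dict):
            sub = unify(result[feat], val)
            if sub is None:
                return None
            result[feat] = sub
        elif result[feat] != val:
            return None  # feature clash: unification fails
    return result

# Underspecified reference domain for "this itinerary": a set of
# itineraries to Paris, with the focalization left unspecified.
domain_constraints = {"type": "itinerary", "destination": "Paris"}

# Hypothesis contributed by the pointing gesture.
gesture_hypothesis = {"type": "itinerary", "id": "itinerary-2"}

# Successful unification focalizes the domain on the pointed-at
# itinerary, which resolves the reference.
resolved = unify(domain_constraints, gesture_hypothesis)
```

When the gesture hypothesis is incompatible with the constraints deduced from the linguistic form (for instance a pointed-at object of the wrong type), unification fails and the system must fall back on another candidate.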
Moreover, utterances that contain more than one reference can raise
problems linked to multimodal fusion. In an example such as "is this
itinerary longer than these and these?", three referring expressions can each
be the focus of a pointing gesture, or even of several pointing gestures in the
case of "these". If the system receives five pointing gestures, an in-depth
analysis of the temporal synchronization and of the possible matchings
between gestures and expressions is necessary to determine which gestures
are linked to which expressions. The only constraint that follows from the
natural use of language and gesture is that the successive order of the
gestures follows the successive order of the expressions. In the extreme cases
observed for tasks involving numerous references [LAN 04, p. 45], the
combination can become so complex that heuristics are required. These
phenomena lead us in particular to
distinguish between various levels of multimodal fusion. Whereas many
signal-oriented approaches match gestures and expressions solely on the
basis of temporal synchronization, i.e. they perform a physical multimodal
fusion, other approaches, such as those based on reference domains, highlight
another level of multimodal fusion: the semantic level [MAR 06, LÓP 05].
Chapter 7 will present a third, pragmatic, level related to dialogue acts.
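The combinatorics of the five-gestures example, and the difference between a purely physical and a semantic fusion, can be sketched as follows. This is an illustration under stated assumptions, not the method of the cited works: the data structures, the `number` and `type` labels, and the semantic test are all hypothetical.

```python
# Illustrative sketch: enumerate the candidate matchings of five
# pointing gestures to the three referring expressions of "is this
# itinerary longer than these and these?", under the single order
# constraint stated in the text, then filter them at the semantic
# level. All data structures and labels are assumptions.
from itertools import combinations

def order_preserving_matchings(gestures, n_expressions):
    """Assign the gestures, in temporal order, to consecutive
    non-empty groups, one group per expression: choose the cut
    points among the gaps between successive gestures."""
    n = len(gestures)
    for cuts in combinations(range(1, n), n_expressions - 1):
        bounds = (0,) + cuts + (n,)
        yield [gestures[bounds[i]:bounds[i + 1]]
               for i in range(n_expressions)]

def semantically_compatible(expression, group):
    """Semantic fusion: a singular expression takes exactly one
    gesture, and every pointed-at object must have the type the
    expression expects."""
    if expression["number"] == "singular" and len(group) != 1:
        return False
    return all(g["target_type"] == expression["type"] for g in group)

expressions = [
    {"form": "this itinerary", "number": "singular", "type": "itinerary"},
    {"form": "these", "number": "plural", "type": "itinerary"},
    {"form": "these", "number": "plural", "type": "itinerary"},
]
gestures = [{"id": i, "target_type": "itinerary"} for i in range(5)]

# Physical level: every order-preserving matching is a candidate.
candidates = list(order_preserving_matchings(gestures, len(expressions)))

# Semantic level: keep only the matchings compatible with the
# constraints deduced from the linguistic forms.
valid = [m for m in candidates
         if all(semantically_compatible(e, g)
                for e, g in zip(expressions, m))]
```

Even in this small example the order constraint alone leaves several candidates, which is why temporal synchronization, semantic compatibility and, when the combinatorics explode, heuristics all come into play.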
Reference can also be applied to entities other than concrete objects such
as pyramids and itineraries. In the classic example “put that there” [BOL 80],
there is a first multimodal reference that does concern a concrete object,