ambiguity. With our example U2, which involves the referring expression "this
itinerary" and a pointing gesture toward one of the itineraries displayed on the
screen, we are in the simplest possible configuration: the underspecified
reference domain imposes a focalization within a domain that covers the
different itineraries to Paris, the gesture provides a hypothesis about the
itinerary pointed at, and the matching leads the system to place the
focalization on this hypothesis, and thus to a resolved reference. In other,
more complex cases, we may need to determine the type of access to the
referents through a fine-grained analysis of the combinations of access type
and determiner type [LAN 06]. In any case, a formalism such as feature
structures allows the system to implement such a model, the matching being
carried out by a unification operation. The challenge for MMD is thus mainly
to identify all the types of reference and to write a module that deduces, from
the linguistic forms, the formalized constraints on the reference domains,
constraints which will then direct the feature structure unification.
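The matching by unification described above can be sketched as follows. This is a minimal illustration, not the system discussed in the text: the `unify` function, the feature names and the example values are all assumptions made for the sake of the example.

```python
# Minimal sketch of reference resolution by feature-structure
# unification. The function name, feature names and values are
# illustrative assumptions, not the formalism of [LAN 06].

def unify(fs1, fs2):
    """Unify two feature structures (nested dicts); return None on a clash."""
    result = dict(fs1)
    for feat, val in fs2.items():
        if feat not in result:
            result[feat] = val
        elif isinstance(result[feat], dict) and isinstance(val, dict):
            sub = unify(result[feat], val)
            if sub is None:
                return None
            result[feat] = sub
        elif result[feat] != val:
            return None  # feature clash: unification fails
    return result

# Underspecified reference domain for "this itinerary": a set of
# itineraries to Paris, with the focalization left unspecified.
domain_constraints = {"type": "itinerary", "destination": "Paris"}

# Hypothesis contributed by the pointing gesture.
gesture_hypothesis = {"type": "itinerary", "id": "itinerary-2"}

# Successful unification focalizes the domain on the pointed-at
# itinerary, which resolves the reference.
resolved = unify(domain_constraints, gesture_hypothesis)
```

When the gesture hypothesis is incompatible with the constraints deduced from the linguistic form (for instance a pointed-at object of the wrong type), unification fails and the system must fall back on another candidate.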
Moreover, utterances that contain more than one reference can raise
problems linked to multimodal fusion. In an example such as "is this
itinerary longer than these and these?", three referring expressions can each
be the focus of a pointing gesture, or even of several pointing gestures in the
case of "these". If the system receives five pointing gestures, an in-depth
analysis of the temporal synchronization and of the possible matchings
between gestures and expressions is necessary to determine which gestures
are linked to which expressions. The only constraint that follows from the
natural use of language and gesture is that the successive order of the
gestures follows the successive order of the expressions. In the extreme cases
observed for tasks involving numerous references [LAN 04, p. 45], the
combination can become so complex that heuristics are required. These
phenomena lead us in particular to
distinguish between various levels of multimodal fusion. Whereas many
signal-oriented approaches match gestures and expressions solely on the
basis of temporal synchronization, i.e. they perform a physical multimodal
fusion, other approaches, such as those based on reference domains, highlight
another level of multimodal fusion: the semantic level [MAR 06, LÓP 05].
Chapter 7 will present a third, pragmatic, level related to dialogue acts.
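The combinatorics of the five-gestures example, and the difference between a purely physical and a semantic fusion, can be sketched as follows. This is an illustration under stated assumptions, not the method of the cited works: the data structures, the `number` and `type` labels, and the semantic test are all hypothetical.

```python
# Illustrative sketch: enumerate the candidate matchings of five
# pointing gestures to the three referring expressions of "is this
# itinerary longer than these and these?", under the single order
# constraint stated in the text, then filter them at the semantic
# level. All data structures and labels are assumptions.
from itertools import combinations

def order_preserving_matchings(gestures, n_expressions):
    """Assign the gestures, in temporal order, to consecutive
    non-empty groups, one group per expression: choose the cut
    points among the gaps between successive gestures."""
    n = len(gestures)
    for cuts in combinations(range(1, n), n_expressions - 1):
        bounds = (0,) + cuts + (n,)
        yield [gestures[bounds[i]:bounds[i + 1]]
               for i in range(n_expressions)]

def semantically_compatible(expression, group):
    """Semantic fusion: a singular expression takes exactly one
    gesture, and every pointed-at object must have the type the
    expression expects."""
    if expression["number"] == "singular" and len(group) != 1:
        return False
    return all(g["target_type"] == expression["type"] for g in group)

expressions = [
    {"form": "this itinerary", "number": "singular", "type": "itinerary"},
    {"form": "these", "number": "plural", "type": "itinerary"},
    {"form": "these", "number": "plural", "type": "itinerary"},
]
gestures = [{"id": i, "target_type": "itinerary"} for i in range(5)]

# Physical level: every order-preserving matching is a candidate.
candidates = list(order_preserving_matchings(gestures, len(expressions)))

# Semantic level: keep only the matchings compatible with the
# constraints deduced from the linguistic forms.
valid = [m for m in candidates
         if all(semantically_compatible(e, g)
                for e, g in zip(expressions, m))]
```

Even in this small example the order constraint alone leaves several candidates, which is why temporal synchronization, semantic compatibility and, when the combinatorics explode, heuristics all come into play.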
Reference can also be applied to entities other than concrete objects such
as pyramids and itineraries. In the classic example “put that there” [BOL 80],
there is a first multimodal reference that does concern a concrete object,