6.1.3. Pointing gesture analysis
In a multimodal interaction with a touch screen, the user can make gestures,
for example pointing or circling, to refer to the objects displayed. If
the gesture trajectory matches the targeted objects exactly, the MMD system
can resolve the reference without much difficulty. If the trajectory is
approximate, for example if it misses an object or involuntarily includes one
that is not part of the intended reference, then the system faces
undecidable cases. This is where the notion of reference domain can provide
useful clarification.
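
As a rough illustration of how an approximate circling trajectory produces undecidable cases, the following Python sketch estimates what fraction of each displayed object lies inside the closed gesture, then sorts the objects into clearly selected, undecided and clearly excluded. It is a minimal sketch under stated assumptions, not the method of any particular MMD system: the bounding-box representation of objects, the names `Box`, `point_in_polygon`, `coverage` and `classify`, and the 0.1 and 0.9 thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a displayed object (assumed representation)."""
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: is p inside the closed polygon?"""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def coverage(box: Box, polygon: List[Point], samples: int = 10) -> float:
    """Fraction of a samples x samples grid of box points lying inside the gesture."""
    hits = 0
    for i in range(samples):
        for j in range(samples):
            px = box.x_min + (i + 0.5) * (box.x_max - box.x_min) / samples
            py = box.y_min + (j + 0.5) * (box.y_max - box.y_min) / samples
            if point_in_polygon((px, py), polygon):
                hits += 1
    return hits / (samples * samples)


def classify(boxes: List[Box], polygon: List[Point],
             low: float = 0.1, high: float = 0.9) -> Dict[str, List[str]]:
    """Sort objects by coverage; the thresholds are illustrative assumptions."""
    result: Dict[str, List[str]] = {"selected": [], "undecided": [], "excluded": []}
    for box in boxes:
        c = coverage(box, polygon)
        if c >= high:
            result["selected"].append(box.name)
        elif c > low:
            result["undecided"].append(box.name)
        else:
            result["excluded"].append(box.name)
    return result


# Hypothetical scene: a square "circling" gesture and four objects.
gesture = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
scene = [
    Box("A", 1, 1, 3, 3),      # fully inside the gesture
    Box("B", 4, 4, 6, 6),      # fully inside the gesture
    Box("C", 9, 4, 11, 6),     # straddles the trajectory
    Box("D", 20, 20, 22, 22),  # far outside the gesture
]
print(classify(scene, gesture))
```

Geometry alone leaves object C undecided here, which is the kind of case that additional knowledge such as a reference domain is meant to settle.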
In general, processing a gesture can reveal an ambiguity about the intent
behind it: the same gesture, with the same shape and the same trajectory,
can stem from different intentions. A hand movement tracked
by a camera can, for example, be a paraverbal gesture that emphasizes a
specific word without referring to anything, or it can point at a specific
object and thus carry a reference. The presence of a referring expression in
the linguistic utterance, as well as machine learning techniques applied to
the recognition of paraverbal gestures, can help resolve this type of
ambiguity. Once the system has established that the gesture is deictic, the
analysis of the gesture trajectory can itself reveal an ambiguity. In the
case of an interaction with a touch screen, one example is a gesture that
surrounds three objects but also partially includes a fourth and ends very
close to a fifth.
Based on a structural analysis of the circling shape, i.e. the detection of
remarkable features of the trajectory such as inflection points, crossings,
areas of constant curvature or closure areas [BEL 96], on an analysis of the
visual scene in terms of perceptive groups (section 6.1.2), and potentially
on geometric indices such as degree of coverage or relative distances, the
system can then discard the fourth object, for example, because it does not
belong to the same perceptive group as the other objects in question and
because the gesture trajectory shows a slight avoidance movement when it
passes over the fourth
object. However, it can decide to keep the fifth object as a potential candidate
inasmuch as it belongs to the same perceptive group as the three
objects that are clearly circled. If the dialogue history had contained a
reference domain separating this object from the three circled ones, the
decision would have been the opposite. In any case, we now have two
hypotheses: one with the three circled objects and one with four objects,
i.e. these three plus the fifth. This is a pre-analysis of the gesture in its
visual context, and it will be compared with the
semantic analysis of the simultaneous oral utterance: thus, either a referring