6.1.2. Visual scene analysis
A first source for determining focus spaces or contextual subsets that match
reference domains is visual perception. We saw in section 2.1.1 that the MMD
system knows the nature and spatial location of all the objects displayed in
the visual scene: in our example, the train journeys that are highlighted
graphically and, in the case of a task such as the one carried out by
SHRDLU, the geometric shapes that make up the micro-physical world of the task.
In this context, the user can focus on a subset, for example the shapes placed
on the left or all the hollow shapes. This visual subset is determined using
criteria such as those proposed by Gestalt theory: spatial proximity
between objects, similarity, continuity, etc. (see a ranking formalization in
[LAN 04] and [LAN 06]). It only becomes a reference domain from the
moment the user refers to a perceptive group (the group
of shapes on the left), to a spatially isolated element, or to an element
identified through its intrinsic properties such as size and color. This
reference domain then allows the system to account for attentional
focalization phenomena by allowing, for example,
the interpretation of “the green block” not as the only object in the visual
scene that is block shaped and green, but as the
only object in the visual reference domain with these properties. When
the scene contains another green block that is not located in the focus
space, this mechanism lets the system avoid a reaction such as “I do not
understand which green block you are referring to”; instead, it can
model the user's attention and resolve the references in a relevant manner.
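
The "green block" case can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method formalized in [LAN 04] or [LAN 06]: the `SceneObject` class, the `proximity_groups` and `resolve` functions, the distance threshold `max_gap`, and the example coordinates are all hypothetical, and spatial proximity is the only Gestalt criterion that is modeled.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    # Hypothetical description of an object displayed in the visual scene.
    name: str
    shape: str
    color: str
    x: float
    y: float


def proximity_groups(objects, max_gap):
    """Group objects by spatial proximity (assumed criterion: an object joins
    every group that contains an object at most `max_gap` away)."""
    groups = []
    for obj in objects:
        near = [g for g in groups
                if any(math.hypot(obj.x - o.x, obj.y - o.y) <= max_gap for o in g)]
        far = [g for g in groups if not any(g is n for n in near)]
        # Merge the new object with all the groups it is close to.
        merged = [obj] + [o for g in near for o in g]
        groups = far + [merged]
    return groups


def resolve(description, domain):
    """Return the only object of `domain` that has all the described
    properties, or None when the description is ambiguous or matches nothing."""
    matches = [o for o in domain
               if all(getattr(o, key) == value for key, value in description.items())]
    return matches[0] if len(matches) == 1 else None


scene = [
    SceneObject("b1", "block", "green", 1, 1),
    SceneObject("p1", "pyramid", "red", 2, 1),
    SceneObject("b2", "block", "green", 10, 1),
    SceneObject("b3", "block", "blue", 11, 2),
]

groups = proximity_groups(scene, max_gap=2.0)

# Assume the user has just referred to "the group of shapes on the left":
# that perceptive group becomes the visual reference domain.
left_group = min(groups, key=lambda g: min(o.x for o in g))

green_block = {"shape": "block", "color": "green"}
in_scene = resolve(green_block, scene)        # ambiguous over the whole scene
in_domain = resolve(green_block, left_group)  # unique within the focus space
```

Resolved against the whole scene, the description matches two objects and `resolve` returns `None`, which corresponds to the "which green block?" reaction. Resolved against the reference domain, it identifies a single object.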
An important phenomenon that can arise within this framework is
visual salience, which allows the system to account for the focus of the user's
attention on a specific object when it stands out from the other visible objects
because of specific properties: being in the foreground, bigger, or of a different color. In
addition to the ability of automatically detecting perceptive groups, an MMD
system relying on a visual scene displayed on screen thus benefits from
automatically detecting visually salient objects, as we saw in section 2.1.1. At
the reference domain level, an object's salience does not contribute to
building a new potential domain, but to bringing into focus one of the elements of a
reference domain built according to the perceptive groups. Thus, the
exophoric pronoun, for example “it” without any possible linguistic
antecedent, can be seen as referring to the most salient object in the common
reference domain. In this case, again, modeling multimodal reference
domains gives the system a deeper understanding of the user's utterance.
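
The role of salience inside a reference domain can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the `SceneObject` fields, the three salience cues (relative size, color contrast, foreground position), and their equal weighting are hypothetical and are not taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    # Hypothetical description of a displayed object; size is assumed positive.
    name: str
    color: str
    size: float
    in_foreground: bool


def salience(obj, domain):
    """Score how much `obj` stands out from the other objects of `domain`.
    The three cues and their equal weights are illustrative assumptions."""
    others = [o for o in domain if o is not obj]
    if not others:
        return 0.0  # a lone object has nothing to stand out from
    mean_size = sum(o.size for o in others) / len(others)
    # How much bigger the object is than the others, relative to their mean size.
    size_contrast = max(0.0, obj.size - mean_size) / mean_size
    # Proportion of the other objects that have a different color.
    color_contrast = sum(o.color != obj.color for o in others) / len(others)
    foreground = 1.0 if obj.in_foreground else 0.0
    return size_contrast + color_contrast + foreground


def resolve_exophoric_it(domain):
    """Interpret "it" (no linguistic antecedent) as the most salient
    object of the current reference domain."""
    return max(domain, key=lambda o: salience(o, domain))


# A reference domain assumed to have been built from a perceptive group.
domain = [
    SceneObject("a", "red", 1.0, False),
    SceneObject("b", "red", 1.0, False),
    SceneObject("c", "blue", 3.0, True),
]

referent = resolve_exophoric_it(domain)
```

In this sketch salience does not create a new domain: it only ranks the elements of an existing one, so the referent of "it" is the bigger, differently colored foreground object `c`.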