generated by the observers [62]. In a typical experiment, different observers examine the same photograph while their eye movements are being tracked, but are asked
to answer different questions about the scene (for example, estimate the age of the
people in the scene, or determine the country in which the photograph was taken).
Although all observers are presented with an identical visual stimulus, the patterns
of eye movements recorded differ dramatically depending on the question being ad-
dressed by each observer. These experiments clearly demonstrate that task demands
play a critical role in determining where attention is to be focused next.
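The task dependence observed in these experiments can be caricatured computationally. The sketch below (a minimal illustration, not the model of [62]; the feature maps, weight values, and inhibition scheme are all assumptions) shows how an identical stimulus produces different fixation sequences once task-specific gains are applied to a set of feature maps.

```python
import numpy as np

def fixation_sequence(feature_maps, task_weights, n_fixations=3, inhibition_radius=1):
    """Pick successive fixation points from a task-weighted saliency map.

    feature_maps: dict mapping feature name -> 2-D array of responses.
    task_weights: dict mapping feature name -> top-down gain for the task.
    The stimulus (the feature maps) is identical across tasks; only the
    gains differ, so different tasks yield different scanpaths.
    """
    names = sorted(feature_maps)
    saliency = sum(task_weights.get(n, 0.0) * feature_maps[n] for n in names)
    saliency = saliency.astype(float).copy()
    fixations = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(saliency), saliency.shape)
        fixations.append((int(y), int(x)))
        # Inhibition of return: suppress the just-attended neighborhood.
        y0, y1 = max(0, y - inhibition_radius), y + inhibition_radius + 1
        x0, x1 = max(0, x - inhibition_radius), x + inhibition_radius + 1
        saliency[y0:y1, x0:x1] = -np.inf
    return fixations

# The same "photograph": a face region at (1, 1) and a text region at (3, 3).
faces = np.zeros((5, 5)); faces[1, 1] = 1.0
text = np.zeros((5, 5)); text[3, 3] = 1.0
maps = {"faces": faces, "text": text}

# Two questions, two weight sets, two different first fixations.
age_task = fixation_sequence(maps, {"faces": 1.0, "text": 0.1}, n_fixations=1)
country_task = fixation_sequence(maps, {"faces": 0.1, "text": 1.0}, n_fixations=1)
```

Estimating the ages of people in the scene up-weights the face map and draws the first fixation there; guessing the country up-weights text-like features instead, mirroring the qualitative finding that question, not stimulus, drives the scanpath.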
Building in part on eye tracking experiments, Stark and colleagues [38] have pro-
posed the scanpath theory of attention, according to which eye movements are gen-
erated almost exclusively under top-down control. The theory proposes that what
we see is only remotely related to the patterns of activation of our retinas; rather, a
cognitive model of what we expect to see is at the basis of our percept. The sequence
of eye movements which we make to analyze a scene, then, is mostly controlled top-
down by our cognitive model and serves the goal of obtaining specific details about
the particular scene instance being observed, to embellish the more generic internal
model. This theory has had a number of successful applications to robotics control,
in which an internal model of a robot's working environment was used to restrict the
analysis of incoming video sequences to a small number of circumscribed regions
important for a given task.
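The robotics application can be sketched in the same spirit: an internal model of the workspace names a few circumscribed regions, and each incoming frame is analyzed only within them. Region names, coordinates, and the stand-in detector below are illustrative assumptions, not details of the cited systems.

```python
import numpy as np

# A toy internal model of the robot's workspace: named regions of interest,
# given as (row, col, height, width), where task-relevant objects are
# expected to appear. Names and coordinates are purely illustrative.
WORKSPACE_MODEL = {
    "conveyor_exit": (0, 0, 4, 4),
    "gripper_zone": (8, 8, 4, 4),
}

def analyze_frame(frame, model, detector):
    """Run the (expensive) detector only inside the model's regions,
    instead of over the full frame."""
    results = {}
    for name, (r, c, h, w) in model.items():
        patch = frame[r:r + h, c:c + w]
        results[name] = detector(patch)
    return results

def brightness_detector(patch):
    # Stand-in for a real object detector: report mean patch intensity.
    return float(patch.mean())

frame = np.zeros((12, 12))
frame[9, 9] = 12.0  # an "object" appears inside the gripper zone
report = analyze_frame(frame, WORKSPACE_MODEL, brightness_detector)
```

Only 2 of the 9 same-sized tiles of the frame are ever examined, which is the computational payoff of letting the internal model, rather than the full video stream, dictate where analysis happens.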
19.6 Attention and scene understanding
We have seen how attention is deployed onto our visual environment through a cooperation between bottom-up and top-down driving influences. One difficulty that then arises is the generation of proper top-down biasing signals when exploring a novel scene; indeed, if the scene has not yet been analyzed and understood using
thorough attentional scanning, how can it be used to direct attention top-down? Below we explore two dimensions of this problem. First, we show that even a very brief presentation of a scene allows us to extract its gist, basic layout, and
a number of other characteristics. This suggests that another part of our visual sys-
tem, which operates much faster than attention, might be responsible for this coarse
analysis; the results of this analysis may then be used to guide attention top-down.
Second, we explore how several computer vision models have used a collaboration
between the where and what subsystems to yield sophisticated scene recognition al-
gorithms. Finally, we cast these results into a more global view of our visual system
and the function of attention in vision.
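A coarse, pre-attentive scene analysis of the kind just described can be sketched as a single-pass "gist" computation whose output then selects a top-down hypothesis. Everything below (the grid statistic, the two scene categories, and the decision rule) is an illustrative assumption, far simpler than any actual gist model.

```python
import numpy as np

def gist_vector(image, grid=2):
    """Very coarse 'gist': the mean intensity of each cell of a grid x grid
    partition of the image, computed in one pass with no attentional scanning."""
    h, w = image.shape
    gh, gw = h // grid, w // grid
    return np.array([image[i * gh:(i + 1) * gh, j * gw:(j + 1) * gw].mean()
                     for i in range(grid) for j in range(grid)])

def guess_layout(g):
    """Map the gist vector to a crude scene hypothesis that could then be
    used to bias attention top-down. The rule and labels are illustrative."""
    top, bottom = g[:2].mean(), g[2:].mean()
    return "outdoor-sky" if top > bottom else "indoor"

# Bright upper half over a dark lower half reads as sky above ground.
img = np.vstack([np.full((4, 8), 0.9), np.full((4, 8), 0.2)])
scene = guess_layout(gist_vector(img))
```

The point of the sketch is the division of labor: a fast, global, non-attentive statistic produces a scene hypothesis, and that hypothesis, not a completed attentional scan, supplies the initial top-down bias.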