Biomedical Engineering Reference
bottom-up attentional deployment. This theory proposed that only fairly simple vi-
sual features are computed in a massively parallel manner over the entire incoming
visual scene, in early visual processing areas including primary visual cortex. Atten-
tion is then necessary to bind those early features into a more sophisticated object
representation, and the selected bound representation is (to a first approximation)
the only part of the visual world which passes through the attentional bottleneck for
further processing.
The first explicit neurally-plausible computational architecture of a system for the
bottom-up guidance of attention was proposed by Koch and Ullman [27], and is
closely related to the feature integration theory. Their model is centered around a
saliency map, that is, an explicit two-dimensional topographic map that encodes for
stimulus conspicuity, or salience, at every location in the visual scene. The saliency
map receives inputs from early visual processing, and provides an efficient control
strategy by which the focus of attention simply scans the saliency map in order of
decreasing saliency.
This general architecture has been further developed and implemented, yielding
the computational model depicted in Figure 19.3 [23]. In this model, the early stages
of visual processing decompose the incoming visual input through an ensemble of
feature-selective filtering processes endowed with contextual modulatory effects. In
order to control a single attentional focus based on this multiplicity in the representa-
tion of the incoming sensory signals, it is assumed that all feature maps provide input
to the saliency map, which topographically represents visual salience, irrespective
of the feature dimension by which a given location was salient. Biasing attention to
focus onto the most salient location is then reduced to drawing attention towards the
locus of highest activity in the saliency map. This is achieved using a winner-take-
all neural network, which implements a neurally distributed maximum detector. In
order to prevent attention from permanently focusing onto the most active (winner)
location in the saliency map, the currently attended location is transiently inhibited in
the saliency map by an inhibition-of-return mechanism. After the most salient loca-
tion is thus suppressed, the winner-take-all network naturally converges towards the
next most salient location, and repeating this process generates attentional scanpaths
[23, 27].
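The scanning loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of the cited models: a plain maximum search stands in for the winner-take-all network, and zeroing a square neighbourhood stands in for inhibition of return. The function name `scanpath`, the parameter `ior_radius`, and the choice to make the suppression permanent rather than transient are all assumptions made for brevity.

```python
def scanpath(saliency, n_fixations, ior_radius=1):
    """Return attended (row, col) locations in order of decreasing salience.

    Assumptions (not from the source): the saliency map is a list of
    lists of non-negative floats, and suppression is permanent.
    """
    # Work on a copy so the caller's map is not modified.
    s = [list(row) for row in saliency]
    rows, cols = len(s), len(s[0])
    path = []
    for _ in range(n_fixations):
        # Stand-in for winner-take-all: location of highest activity.
        r, c = max(
            ((i, j) for i in range(rows) for j in range(cols)),
            key=lambda rc: s[rc[0]][rc[1]],
        )
        if s[r][c] <= 0:
            break  # nothing salient remains
        path.append((r, c))
        # Stand-in for inhibition of return: suppress the neighbourhood
        # around the attended location so the next winner lies elsewhere.
        for i in range(max(0, r - ior_radius), min(rows, r + ior_radius + 1)):
            for j in range(max(0, c - ior_radius), min(cols, c + ior_radius + 1)):
                s[i][j] = 0.0
    return path


# Toy 4x5 saliency map with three conspicuous locations.
toy_map = [
    [0.1, 0.2, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.5],
    [0.7, 0.0, 0.0, 0.0, 0.0],
]
fixations = scanpath(toy_map, n_fixations=5)
```

On this toy map the loop visits the three peaks in order of decreasing salience and then stops, because the weak activity next to the first winner was suppressed along with it. A more faithful sketch would let the inhibition decay over time, so that previously attended locations can win again.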
Many successful models for the bottom-up control of attention are built
around a saliency map. What differentiates the models, then, is the strategy em-
ployed to prune the incoming sensory input and extract salience. In an influential
model mostly aimed at explaining visual search experiments, Wolfe [59] hypoth-
esized that the selection of relevant features for a given search task could be per-
formed top-down, through spatially-defined and feature-dependent weighting of the
various feature maps. Although limited to cases where attributes of the target are
known in advance, this view has recently received experimental support from studies
of top-down attentional modulation (see below).
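As a rough illustration of feature-dependent top-down weighting, the sketch below combines several feature maps into one map using a single scalar weight per feature. It is not Wolfe's model: the function name `weighted_saliency`, the feature names, and the use of one scalar weight per map (rather than weights that also vary across space) are assumptions chosen to keep the example small.

```python
def weighted_saliency(feature_maps, weights):
    """Sum feature maps into one map, scaling each by a top-down weight.

    feature_maps: dict mapping a feature name to a 2D list of floats,
                  all of the same shape (hypothetical layout).
    weights:      dict mapping a feature name to a scalar weight;
                  features without an entry get weight 0.
    """
    names = list(feature_maps)
    rows = len(feature_maps[names[0]])
    cols = len(feature_maps[names[0]][0])
    combined = [[0.0] * cols for _ in range(rows)]
    for name in names:
        w = weights.get(name, 0.0)
        for i in range(rows):
            for j in range(cols):
                combined[i][j] += w * feature_maps[name][i][j]
    return combined


# Two toy 2x2 feature maps.
maps = {
    "color": [[1.0, 0.0], [0.0, 0.5]],
    "orientation": [[0.0, 1.0], [0.0, 0.5]],
}
# Searching for a target defined by color only: ignore orientation.
color_search = weighted_saliency(maps, {"color": 1.0, "orientation": 0.0})
# No prior knowledge of the target: weight both features equally.
neutral = weighted_saliency(maps, {"color": 0.5, "orientation": 0.5})
```

With the color-only weights the peak of the combined map sits at the color-defined location, whereas equal weights leave several locations tied. This mirrors the limitation noted above: the weighting only helps when attributes of the target are known in advance.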
Tsotsos and colleagues [56] implemented attentional selection using a combina-
tion of a feedforward bottom-up feature extraction hierarchy and a feedback selective
tuning of these feature extraction mechanisms. In this model, the target of attention
is selected at the top level of the processing hierarchy (the equivalent of a saliency