the squares of the sine and cosine inner products of the logons of the same scale and rotational
orientation in each jet (which reduces the total dimensionality of V to half that of the total
number of logons). (Note: Other mathematical transformations are then applied to each of these sums to make their values insensitive to lighting gradient slopes and other lighting-dependent effects; these details go beyond the scope of this sketch and are left out. See Hecht-Nielsen and Zhou (1995) for examples of such transformations.)
Each component of V essentially represents an estimate of the localized spatial frequency
content of the camera image (at the position of the associated gridpoint) at the spatial frequency
of the involved logon pair, in the direction of oscillation of that pair. It is on the basis of local spatial
frequency structure (which V accurately defines) that fixation points are chosen by the gaze
controller.
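To make this concrete, the following is a minimal Python sketch (not the authors' code) of how one such V component could be computed, assuming Gabor-type logons applied to a local image patch; the function names, the patch-based formulation, and all parameter choices are illustrative assumptions.

import numpy as np

def gabor_pair(size, wavelength, theta, sigma):
    # Cosine- and sine-phase logons of one scale and oscillation direction.
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)   # coordinate along oscillation
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return (envelope * np.cos(2.0 * np.pi * xr / wavelength),
            envelope * np.sin(2.0 * np.pi * xr / wavelength))

def jet_component(patch, wavelength, theta, sigma):
    # One V component: squared cosine response plus squared sine response,
    # a phase-invariant estimate of local spatial frequency content.
    cos_logon, sin_logon = gabor_pair(patch.shape[0], wavelength, theta, sigma)
    c = float(np.sum(patch * cos_logon))
    s = float(np.sum(patch * sin_logon))
    return c**2 + s**2

Summing the squared sine and cosine responses discards phase, which is why each logon pair contributes a single component and the dimensionality of V is half the total number of logons.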
The job of the gaze controller is to learn to mimic the performance of a skilled human observer
performing the visual task that is to be mechanized. The manner in which the gaze controller works
and the method used to train it are now described.
The gaze controller (a perceptron; Hecht-Nielsen, 2004) has 224 inputs and two outputs. The
inputs represent the components of V corresponding to the jet at a particular image gridpoint (the
current position of regard of the gaze controller). The outputs of the gaze controller are estimates of
the a posteriori probability of this gridpoint being chosen by the skilled human as a fixation point
along with the a posteriori probability of this gridpoint not being chosen by the skilled human as a
fixation point. Training of the gaze controller is discussed below; but, to set the stage, the manner in
which the gaze controller is used operationally is described first.
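The text fixes only the input and output dimensions and the probabilistic reading of the outputs; the following hedged Python sketch fills in the rest with common choices (a single tanh hidden layer of assumed width 64, softmax outputs) purely for illustration, and should not be read as Hecht-Nielsen's implementation.

import numpy as np

rng = np.random.default_rng(0)

class GazePerceptron:
    # 224 inputs (one jet's V components), 2 outputs (fixate / do not fixate).
    def __init__(self, n_in=224, n_hidden=64, n_out=2):
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_out, n_hidden))
        self.b2 = np.zeros(n_out)

    def forward(self, v):
        self.v = np.asarray(v, dtype=float)
        self.h = np.tanh(self.W1 @ self.v + self.b1)   # hidden layer
        z = self.W2 @ self.h + self.b2
        e = np.exp(z - z.max())
        self.p = e / e.sum()                           # a posteriori estimates
        return self.p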
Once trained, the gaze controller is used to select a fixation point in a newly acquired video
frame by evaluating each of the V component sets from each of the 263,169 gridpoints of the frame.
If the first output of the controller is above a fixed threshold (say, 0.8), and the second output is
below a fixed threshold (say, 0.2), then that gridpoint is selected as a candidate fixation point. If
there are no candidate fixation points for the frame, then that frame is skipped. If there are one or
more, the one with the highest first output value is selected as the fixation point. The gaze controller
also has provisions for creating multiple successive "looks" at the same object during visual
training to facilitate learning of pose insensitivity (see below). In operational use, when a visual
object of interest has been fixated on and described, the gaze controller tracks that object's fixation
points and prevents return to it until the other visual objects of interest in the scene have been
described.
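A sketch of this operational selection rule follows; the thresholds (0.8 and 0.2) and the gridpoint count (263,169) come from the text, while the data layout and function name are assumptions.

def select_fixation(jets, controller, t_hi=0.8, t_lo=0.2):
    # jets: array of shape (263169, 224), one V component set per gridpoint.
    # Returns the winning gridpoint index, or None if the frame is skipped.
    best_idx, best_p = None, 0.0
    for i, v in enumerate(jets):
        p_fix, p_not = controller.forward(v)
        if p_fix > t_hi and p_not < t_lo:        # candidate fixation point
            if p_fix > best_p:                   # keep highest first output
                best_idx, best_p = i, p_fix
    return best_idx

Note that with a two-way softmax the second output is one minus the first, so the two-threshold test collapses to a single threshold; both tests are kept here only to mirror the description above.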
To train the gaze controller, each fixation point example has its pixel coordinates (supplied by the frequently recalibrated eye tracker) stored together with a reference frame: the frame, taken a fixed time increment before the beginning of the subject's saccade, that serves as the definitive "image input" the human used. Eventually, many thousands of such
fixation point and reference frame pairs are produced, randomly scrambled to remove possible
content correlations between them, and stored. The V vector for each reference frame is also
calculated and stored with it.
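The following Python sketch outlines this data-assembly step. The recording and saccade data structures and the helper names are assumptions; the fixed lead time, the shuffling, and the stored V vector follow the text.

import random

def build_training_set(recordings, lead_time, grab_frame, compute_V):
    examples = []
    for rec in recordings:
        for saccade in rec.saccades:
            # Reference frame: a fixed increment before the saccade began.
            frame = grab_frame(rec, saccade.start_time - lead_time)
            examples.append({
                "fixation_xy": saccade.target_xy,   # from the eye tracker
                "frame": frame,
                "V": compute_V(frame),              # stored with the frame
            })
    random.shuffle(examples)   # remove possible content correlations
    return examples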
The gaze controller perceptron is trained by marching through the fixation point and reference frame example pairs, in sequence, many times. At each training episode, the next pair in sequence is selected and the gridpoint nearest to the fixation point is
located. The jet components of the reference frame V vector for that gridpoint are then extracted
and provided to the perceptron, along with desired outputs 1 and 0, and one backpropagation
training episode using these specified inputs and outputs is carried out. Another gridpoint, distant
from the fixation point, is then selected and its jet V components are provided to the perceptron,
along with desired outputs 0 and 1, and a second perceptron training episode is carried out using
these inputs and outputs. The training process then moves on to the next fixation point and reference frame pair. Thus, this training procedure beneficially oversamples examples of the class of human-supplied fixation points (Hecht-Nielsen, 2004).
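A hedged sketch of this training loop, reusing the GazePerceptron from the earlier sketch: one backpropagation episode on the jet nearest the human fixation point (desired outputs 1 and 0) and one on a distant gridpoint (desired outputs 0 and 1) per stored example. The grid helper methods, learning rate, and pass count are assumptions.

import numpy as np

def backprop_step(net, v, targets, lr=0.01):
    # One backpropagation episode with cross-entropy loss against the
    # desired outputs, for the GazePerceptron sketched earlier.
    p = net.forward(v)
    dz2 = p - np.asarray(targets, dtype=float)     # softmax/cross-entropy grad
    dh = (net.W2.T @ dz2) * (1.0 - net.h ** 2)     # back through tanh layer
    net.W2 -= lr * np.outer(dz2, net.h)
    net.b2 -= lr * dz2
    net.W1 -= lr * np.outer(dh, net.v)
    net.b1 -= lr * dh

def train(controller, examples, grid, n_passes=10):
    for _ in range(n_passes):                      # march through many times
        for ex in examples:
            g = grid.nearest_gridpoint(ex["fixation_xy"])        # positive
            backprop_step(controller, ex["V"][g], (1.0, 0.0))
            g_far = grid.distant_gridpoint(ex["fixation_xy"])    # negative
            backprop_step(controller, ex["V"][g_far], (0.0, 1.0))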