Hierarchical Kalman Filters.
If one does not use binary stochastic processing units, but instead a generative model that is a weighted sum of basis functions with added Gaussian noise, inference is tractable as well. The Kalman filter [116] makes it possible to infer the hidden causes from data, even if the causes change in time according to a linear dynamical system. Rao [186] proposed using Kalman filters to learn image models. Segmentation and recognition of objects and image sequences were demonstrated in the presence of occlusions and clutter.
To account for extra-classical receptive-field effects in the early visual system,
Rao and Ballard [187] combined several simplified Kalman filters in a hierarchical
fashion. In this model, static images $I$ are represented in terms of potential causes $r$: $I = Ur + n$, where $n$ is zero-mean Gaussian noise with variance $\sigma^2$. The matrix $U$ contains the basis vectors $U_j$ that mediate between the causes and the image. To make the model hierarchical, the causes $r$ are represented in terms of higher-level causes $r^h$: $r = r^{td} + n^{td}$, where $r^{td} = U^h r^h$ is a top-down prediction of $r$ and $n^{td}$ is zero-mean Gaussian noise with variance $\sigma_{td}^2$.
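To make the generative model concrete, the following NumPy sketch samples once from the two-level model; the dimensions, noise levels, and variable names (n_pixels, n_causes, U_h, and so on) are illustrative assumptions, not values from Rao and Ballard's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions -- assumptions for this sketch, not values from the model
n_pixels, n_causes, n_causes_h = 256, 32, 16
sigma, sigma_td = 1.0, 0.5        # noise standard deviations (assumed)

U   = rng.normal(size=(n_pixels, n_causes))    # basis vectors U_j as columns
U_h = rng.normal(size=(n_causes, n_causes_h))  # higher-level basis U^h

# Sample once from the two-level generative model
r_h  = rng.normal(size=n_causes_h)                        # higher-level causes r^h
r_td = U_h @ r_h                                          # top-down prediction r^td
r    = r_td + rng.normal(scale=sigma_td, size=n_causes)   # r = r^td + n^td
I    = U @ r + rng.normal(scale=sigma, size=n_pixels)     # I = U r + n
```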
The goal is now to estimate, for each hierarchical level, the coefficients $r$ for a given image and, on a longer time scale, to learn appropriate basis vectors $U_j$. This is achieved by minimizing:

$$E = \frac{1}{\sigma^2}(I - Ur)^T(I - Ur) + \frac{1}{\sigma_{td}^2}(r - r^{td})^T(r - r^{td}) + g(r) + h(U),$$

where $g(r) = \alpha \sum_i r_i^2$ and $h(U) = \lambda \sum_{i,j} U_{i,j}^2$ are the negative logarithms of the Gaussian prior probabilities of $r$ and $U$, respectively. The first two terms of $E$ give the negative logarithm of the probability of the data, given the parameters. They are the squared prediction errors for Level 1 and Level 2, weighted by the inverse variances.
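As a hedged illustration, the cost $E$ can be written down directly in NumPy, using the same hypothetical variable names as in the sketch above; alpha and lam stand for the prior weights $\alpha$ and $\lambda$.

```python
import numpy as np

def energy(I, r, r_td, U, sigma, sigma_td, alpha, lam):
    """Cost E: squared prediction errors of Level 1 and Level 2, weighted by
    the inverse variances, plus the Gaussian priors g(r) and h(U)."""
    e1 = I - U @ r                      # Level-1 (bottom-up) prediction error
    e2 = r - r_td                       # Level-2 (top-down) prediction error
    return (e1 @ e1 / sigma**2
            + e2 @ e2 / sigma_td**2
            + alpha * np.sum(r**2)      # g(r) = alpha * sum_i r_i^2
            + lam * np.sum(U**2))       # h(U) = lambda * sum_ij U_ij^2
```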
An optimal estimate of $r$ can be obtained by gradient descent on $E$ with respect to $r$:

$$\frac{dr}{dt} = -\frac{k_1}{2}\frac{\partial E}{\partial r} = \frac{k_1}{\sigma^2} U^T (I - Ur) + \frac{k_1}{\sigma_{td}^2}(r^{td} - r) - k_1 \alpha r,$$

where $k_1$ is a positive constant. This computation is done in the predictive estimator (PE) module, sketched in Figure 3.11(a). It combines the bottom-up residual error $(I - Ur)$ that has been passed through $U^T$ with the top-down error $(r^{td} - r)$ to improve $r$. Note that all the information required is available locally at each level.
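A minimal sketch of this estimation step, assuming a simple Euler discretization with an illustrative step size dt and the hypothetical names used above; the top-down prediction r_td is treated as fixed during the update.

```python
import numpy as np

def update_r(I, r, r_td, U, sigma, sigma_td, alpha, k1, dt=0.01):
    """One Euler step of dr/dt = -(k1/2) dE/dr at a single level."""
    bottom_up = (k1 / sigma**2) * (U.T @ (I - U @ r))    # residual error passed through U^T
    top_down  = (k1 / sigma_td**2) * (r_td - r)          # top-down error
    decay     = k1 * alpha * r                           # contribution of the prior g(r)
    return r + dt * (bottom_up + top_down - decay)
```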
A synaptic learning rule for adapting the weights $U$ can be obtained by performing gradient descent on $E$ with respect to $U$ after the estimate $r$ becomes stable:

$$\frac{dU}{dt} = -\frac{k_2}{2}\frac{\partial E}{\partial U} = \frac{k_2}{\sigma^2}(I - Ur)\, r^T - k_2 \lambda U,$$

where $k_2$ is the learning rate. This is a Hebbian [91] type of learning with weight decay.
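The corresponding weight update can be sketched in the same style, again with an assumed Euler discretization; as stated above, it would be applied only after the estimate r has stabilized.

```python
import numpy as np

def update_U(I, r, U, sigma, lam, k2, dt=0.01):
    """One Euler step of dU/dt = -(k2/2) dE/dU, applied after r has stabilized."""
    hebbian = (k2 / sigma**2) * np.outer(I - U @ r, r)   # residual error times presynaptic activity
    decay   = k2 * lam * U                                # weight decay from the prior h(U)
    return U + dt * (hebbian - decay)
```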
Rao and Ballard applied this optimization to the three-layered network sketched
in Figure 3.11(b). In Level 0, three 16 × 16 image patches enter the network which