where $\eta_f$ is a constant.
To achieve a complete representation, the features are forced to react to all significant input stimuli by constraining the smoothed sum $S_l$ of the features in layer $l$ to be equal to the subsampled sum $S_{(l-1)}$ of the input features from layer $(l-1)$:

$$
E^c_{kl} \;=\; \frac{\eta_c}{K_l} \sum_{i,j} a_{ijkl}\,\bigl(S_{ij(l-1)} - S_{ijl}\bigr),
\qquad
I^c_{kl} \;=\; -E^c_{kl},
\qquad (5.4)
$$
where $\eta_c$ is a constant. If the activity of the features is too low, the excitatory gains
of the active features are increased, and they are disinhibited at the same time. The
opposite behavior applies when the features are too active.
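
The completeness rule lends itself to a direct implementation. The following NumPy sketch computes the corrections of Eq. (5.4) for all features of one layer at once; the function name, the array shapes, the default value of $\eta_c$, and the explicit sum over the positions $(i,j)$ are assumptions made for this illustration, not part of the original formulation.

```python
import numpy as np

def completeness_corrections(a, S_prev, S_cur, eta_c=0.001):
    """Completeness corrections of Eq. (5.4), sketched under assumed shapes.

    a      : feature activities of layer l, shape (H, W, K); a[i, j, k] = a_ijkl
    S_prev : subsampled activity sum of layer l-1, shape (H, W)
    S_cur  : smoothed activity sum of layer l, shape (H, W)
    eta_c  : small learning constant (value assumed)

    Returns one excitatory and one inhibitory correction per feature k.
    """
    K = a.shape[-1]
    mismatch = S_prev - S_cur                       # > 0 where layer l is too quiet
    # Features that are active where the layer undershoots receive a positive
    # excitatory correction; the inhibitory correction has the opposite sign,
    # so such features are disinhibited at the same time.
    E_c = (eta_c / K) * np.einsum('ijk,ij->k', a, mismatch)
    I_c = -E_c
    return E_c, I_c
```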
To enforce sparseness, the activity of a winning feature must be made large, e.g. to $V = 0.75$:

$$
E^s_{kl} \;=\;
\begin{cases}
\eta_s\,\bigl(V - a_{ij\,k_{\max}\,l}\bigr) & : \; k = k_{\max} \quad \text{[winning]}\\[2pt]
0 & : \; k \neq k_{\max} \quad \text{[not winning]},
\end{cases}
\qquad (5.5)
$$
where $\eta_s$ is a constant. If the activity of the winner is too small, its excitatory gain
is increased; otherwise, it is decreased.
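
In the same spirit, a minimal sketch of the sparseness rule of Eq. (5.5), evaluated for a single position; the function name and the per-position interface are assumptions, while the target value $V$ follows the example given in the text.

```python
import numpy as np

def sparseness_corrections(a_pos, eta_s=0.001, V=0.75):
    """Sparseness corrections of Eq. (5.5) for one position (i, j).

    a_pos : activities a_ijkl of all K features at this position, shape (K,)
    Only the winning feature k_max receives a non-zero correction, which
    pushes its activity towards the target value V.
    """
    E_s = np.zeros_like(a_pos)
    k_max = int(np.argmax(a_pos))                # winning feature
    E_s[k_max] = eta_s * (V - a_pos[k_max])      # raise gain if winner is too weak
    return E_s
```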
If adding $I^f_{kl}$ and $I^c_{kl}$ to $I_{kl}$ makes the inhibitory gain negative, its weight is added to $E_{kl}$, and $I_{kl}$ is set to zero. Vice versa, if $E_{kl}$ should become negative from adding $E^c_{kl}$ and $E^s_{kl}$, it is set to zero, and its weight is added to $I_{kl}$.
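
One possible reading of this transfer rule, written as a small sketch: a gain that would turn negative is set to zero and the magnitude of its overshoot is carried over to the other gain. Both the function name and this magnitude-transfer interpretation of "its weight is added" are assumptions made for the example.

```python
def clip_gains(E, I):
    """Keep the excitatory gain E and the inhibitory gain I non-negative.

    Assumed interpretation: negative inhibition acts like excitation (and
    vice versa), so the overshoot below zero is moved to the other gain.
    """
    if I < 0.0:
        E += -I        # transfer the overshoot of the inhibitory gain
        I = 0.0
    if E < 0.0:
        I += -E        # transfer the overshoot of the excitatory gain
        E = 0.0
    return E, I
```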
The efficacy of the constraint-enforcing rules described above can be controlled by the learning constants. One possible choice could be $\eta_f = \eta_c = \eta_s = 0.1\,\eta_l$. The
rules are designed such that their net effect goes to zero if the learned representation
has the desired properties. Then the templates describing the computation of the
features become stable, and the training can be stopped.
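
As a concrete illustration, the constants and a stopping check might look as follows; the value of the base learning rate $\eta_l$, the tolerance, and the particular convergence test on the combined corrections are assumptions, shown only as one way of detecting that the net effect of the rules has vanished.

```python
import numpy as np

eta_l = 0.01                           # base learning rate (assumed value)
eta_f = eta_c = eta_s = 0.1 * eta_l    # choice suggested in the text

def has_converged(corrections, tol=1e-5):
    """Stop training once all constraint corrections have (nearly) vanished,
    i.e. the weight templates no longer change appreciably."""
    return all(np.max(np.abs(c)) < tol for c in corrections)
```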
The number of training images needed to determine the weights of the weight
templates for a layer increases with the height of that layer since the number of
examples per image decreases and the number of weights per layer increases.
Because the emerging representations are sparse, most of the weights will be
close to zero after training and can be pruned away without significant loss. This
speeds up the computation and saves memory.
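
A pruning step of this kind could be as simple as the following sketch; the threshold value is an assumption and would in practice be chosen so that the loss in accuracy stays negligible.

```python
import numpy as np

def prune_weights(W, threshold=1e-3):
    """Zero out weights whose magnitude is negligible so that they can be
    skipped (or stored sparsely) during the computation."""
    W_pruned = np.where(np.abs(W) < threshold, 0.0, W)
    print(f"kept {np.count_nonzero(W_pruned)} of {W.size} weights")
    return W_pruned
```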
5.3 Learning Hierarchical Digit Features
The properties of the described unsupervised learning algorithm can be illustrated
by applying it to a dataset of handwritten digits. Here, digits are used which have
been extracted by Siemens AG from German ZIP codes written on large-size letters.
The available examples are partitioned as follows: 44,619 digits constitute the
training set (TRN), 5,379 digits are available for testing the performance of a recog-
nition system and to stop training (TST), and 6,313 digits are used for final valida-
tion (VAL).