where $\eta_f$ is a constant.
To achieve a complete representation, the features are forced to react to all significant input stimuli by constraining the smoothed sum $S_l$ of the features in layer $l$ to be equal to the subsampled sum $S_{(l-1)}$ of the input features from layer $(l-1)$:

$$
E^c_{kl} \;=\; \frac{\eta_c}{K_l} \sum_{i,j} a_{ijkl}\,\bigl(S_{ij(l-1)} - S_{ijl}\bigr),
\qquad
I^c_{kl} \;=\; -E^c_{kl},
\qquad (5.4)
$$
where $\eta_c$ is a constant. If the activity of the features is too low, the excitatory gains
of the active features are increased, and they are disinhibited at the same time. The
opposite behavior applies when the features are too active.
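
The completeness rule lends itself to a direct implementation. The following NumPy sketch computes the corrections of Eq. (5.4) for all features of one layer at once; the function name, the array shapes, the default value of $\eta_c$, and the explicit sum over the positions $(i,j)$ are assumptions made for this illustration, not part of the original formulation.

```python
import numpy as np

def completeness_corrections(a, S_prev, S_cur, eta_c=0.001):
    """Completeness corrections of Eq. (5.4), sketched under assumed shapes.

    a      : feature activities of layer l, shape (H, W, K); a[i, j, k] = a_ijkl
    S_prev : subsampled activity sum of layer l-1, shape (H, W)
    S_cur  : smoothed activity sum of layer l, shape (H, W)
    eta_c  : small learning constant (value assumed)

    Returns one excitatory and one inhibitory correction per feature k.
    """
    K = a.shape[-1]
    mismatch = S_prev - S_cur                       # > 0 where layer l is too quiet
    # Features that are active where the layer undershoots receive a positive
    # excitatory correction; the inhibitory correction has the opposite sign,
    # so such features are disinhibited at the same time.
    E_c = (eta_c / K) * np.einsum('ijk,ij->k', a, mismatch)
    I_c = -E_c
    return E_c, I_c
```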
To enforce sparseness, the activity of a winning feature must be made large, e.g. to $V = 0.75$:

$$
E^s_{kl} \;=\;
\begin{cases}
\eta_s\,\bigl(V - a_{ij\,k_{\max}\,l}\bigr) & : \; k = k_{\max} \quad \text{[winning]}\\[2pt]
0 & : \; k \neq k_{\max} \quad \text{[not winning]},
\end{cases}
\qquad (5.5)
$$
where $\eta_s$ is a constant. If the activity of the winner is too small, its excitatory gain
is increased; otherwise, it is decreased.
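
In the same spirit, a minimal sketch of the sparseness rule of Eq. (5.5), evaluated for a single position; the function name and the per-position interface are assumptions, while the target value $V$ follows the example given in the text.

```python
import numpy as np

def sparseness_corrections(a_pos, eta_s=0.001, V=0.75):
    """Sparseness corrections of Eq. (5.5) for one position (i, j).

    a_pos : activities a_ijkl of all K features at this position, shape (K,)
    Only the winning feature k_max receives a non-zero correction, which
    pushes its activity towards the target value V.
    """
    E_s = np.zeros_like(a_pos)
    k_max = int(np.argmax(a_pos))                # winning feature
    E_s[k_max] = eta_s * (V - a_pos[k_max])      # raise gain if winner is too weak
    return E_s
```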
If adding $I^f_{kl}$ and $I^c_{kl}$ to $I_{kl}$ makes the inhibitory gain negative, its weight is added to $E_{kl}$, and $I_{kl}$ is set to zero. Vice versa, if $E_{kl}$ should become negative from adding $E^c_{kl}$ and $E^s_{kl}$, it is set to zero, and its weight is added to $I_{kl}$.
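
One possible reading of this transfer rule, written as a small sketch: a gain that would turn negative is set to zero and the magnitude of its overshoot is carried over to the other gain. Both the function name and this magnitude-transfer interpretation of "its weight is added" are assumptions made for the example.

```python
def clip_gains(E, I):
    """Keep the excitatory gain E and the inhibitory gain I non-negative.

    Assumed interpretation: negative inhibition acts like excitation (and
    vice versa), so the overshoot below zero is moved to the other gain.
    """
    if I < 0.0:
        E += -I        # transfer the overshoot of the inhibitory gain
        I = 0.0
    if E < 0.0:
        I += -E        # transfer the overshoot of the excitatory gain
        E = 0.0
    return E, I
```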
The efficacy of the constraint-enforcing rules described above can be controlled by the learning constants. One possible choice could be $\eta_f = \eta_c = \eta_s = 0.1\,\eta_l$. The
rules are designed such that their net effect goes to zero if the learned representation
has the desired properties. Then the templates describing the computation of the
features become stable, and the training can be stopped.
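
As a concrete illustration, the constants and a stopping check might look as follows; the value of the base learning rate $\eta_l$, the tolerance, and the particular convergence test on the combined corrections are assumptions, shown only as one way of detecting that the net effect of the rules has vanished.

```python
import numpy as np

eta_l = 0.01                           # base learning rate (assumed value)
eta_f = eta_c = eta_s = 0.1 * eta_l    # choice suggested in the text

def has_converged(corrections, tol=1e-5):
    """Stop training once all constraint corrections have (nearly) vanished,
    i.e. the weight templates no longer change appreciably."""
    return all(np.max(np.abs(c)) < tol for c in corrections)
```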
The number of training images needed to determine the weights of the weight
templates for a layer increases with the height of that layer since the number of
examples per image decreases and the number of weights per layer increases.
Because the emerging representations are sparse, most of the weights will be
close to zero after training and can be pruned away without significant loss. This
speeds up the computation and saves memory.
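
A pruning step of this kind could be as simple as the following sketch; the threshold value is an assumption and would in practice be chosen so that the loss in accuracy stays negligible.

```python
import numpy as np

def prune_weights(W, threshold=1e-3):
    """Zero out weights whose magnitude is negligible so that they can be
    skipped (or stored sparsely) during the computation."""
    W_pruned = np.where(np.abs(W) < threshold, 0.0, W)
    print(f"kept {np.count_nonzero(W_pruned)} of {W.size} weights")
    return W_pruned
```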
5.3 Learning Hierarchical Digit Features
The properties of the described unsupervised learning algorithm can be illustrated
by applying it to a dataset of handwritten digits. Here, digits are used which have
been extracted by Siemens AG from German ZIP codes written on large-size letters.
The available examples are partitioned as follows: 44,619 digits constitute the
training set (TRN), 5,379 digits are available for testing the performance of a recog-
nition system and to stop training (TST), and 6,313 digits are used for final valida-
tion (VAL).