7.4.1 Network Architecture and Training
The architecture of the Neural Abstraction Pyramid network used for the recognition of entire meter values is sketched in Figure 7.10. It is a feed-forward network consisting of five layers.
Layer 0 at the bottom of the hierarchy has a resolution of 32 × 16. It only contains
the input feature array. The resolution of the feature arrays decreases from layer to
layer by a factor of two in both dimensions, until Layer 3 reaches a size of only 4 × 2
hypercolumns. Similarly, the width of the border that is set to zero decreases from
16 to 2. At the same time, the number of excitatory features rises from 4 in Layer 1,
to 16 in Layer 2, and to 32 in Layer 3.
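For reference, the layer dimensions described above can be summarized in a short Python sketch. Only the resolutions, the feature counts, and the border widths 16 and 2 are stated in the text; the intermediate border widths (8 and 4) are an assumption inferred from the halving of the resolution.

    # Layer sizes of the meter-value network (Layers 0 to 3);
    # the topmost layer with the 20 output feature cells is described below.
    PYRAMID_LAYERS = [
        # (layer, width, height, excitatory features, zero border)
        (0, 32, 16,  1, 16),   # input feature array only
        (1, 16,  8,  4,  8),   # border width 8 is an assumption
        (2,  8,  4, 16,  4),   # border width 4 is an assumption
        (3,  4,  2, 32,  2),
    ]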
The network contains 20 output feature cells in the topmost layer, which encode the meter value. The output code is composed of two sections that indicate the identities of the two digits of interest in a 1-out-of-10 code.
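Such a target code can be illustrated with a small sketch; the function name and the digit ordering are hypothetical, and only the 20-dimensional layout of two concatenated 1-out-of-10 sections is taken from the text.

    import numpy as np

    def encode_meter_value(first_digit, second_digit):
        """Target vector: two concatenated 1-out-of-10 sections."""
        target = np.zeros(20)
        target[first_digit] = 1.0         # first digit, positions 0..9
        target[10 + second_digit] = 1.0   # second digit, positions 10..19
        return target

    # e.g. a meter value whose two digits of interest are 4 and 7
    print(encode_meter_value(4, 7))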
The projections of output feature cells receive their inputs directly from all positions of all feature arrays of Layer 3. Their weights are allowed to change sign. The potential of these projections is passed through a sigmoidal transfer function f_sig (β = 1, see Fig. 4.5(a) in Section 4.2.4), which saturates at zero and one.
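Assuming f_sig is the standard logistic function with gain β (its exact definition is given in Section 4.2.4, not here), the output transfer can be sketched as:

    import numpy as np

    def f_sig(x, beta=1.0):
        # sigmoidal transfer, saturates at zero and one (assumed logistic)
        return 1.0 / (1.0 + np.exp(-beta * x))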
In contrast, the cells of excitatory features located in Layer 1 to Layer 3 are
driven by specific excitation and unspecific inhibition. The weights of their specific
excitatory projections originate from overlapping 4 × 4 windows of the feature arrays
in the layer below them. Unspecific inhibitory projections have a single weight to
the smoothed and subsampled sum of these features. Both projections have linear
transfer functions. The transfer function f_psig (β = 2, see Fig. 4.6(b)), used for the output of these feature cells, is a rectifying function that saturates at an activity of one. This ensures that the network learns sparse representations of the digit block, since the activity becomes exactly zero if inhibition exceeds excitation.
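A rough sketch of one such excitatory feature cell is given below. The clipped-ramp form of f_psig and the helper names are assumptions; only the 4 × 4 excitatory window, the single inhibitory weight, and the rectifying, saturating output function are taken from the text.

    import numpy as np

    def f_p_sig(x, beta=2.0):
        # rectifying transfer that saturates at one (assumed clipped ramp)
        return np.clip(beta * x, 0.0, 1.0)

    def feature_cell_activity(window_4x4, w_exc, inhibition_sum, w_inh):
        # specific excitation from a 4x4 window of the layer below,
        # minus unspecific inhibition via a single weight
        potential = np.sum(w_exc * window_4x4) - w_inh * inhibition_sum
        return f_p_sig(potential)  # exactly zero if inhibition exceeds excitation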
The feature sums and their subsampled versions, needed for the unspecific inhi-
bition, are computed as described in Section 5.2.1. The network is initialized using
the unsupervised learning of sparse features, described in Chapter 5. Supervised
training is done with gradient descent on the squared output error until the perfor-
mance on the test set does not improve any more.
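The stopping rule can be illustrated with a self-contained toy loop. The random data, the single sigmoidal output projection (256 inputs correspond to 32 features × 4 × 2 positions of Layer 3), and all sizes are stand-ins, not the actual pyramid.

    import numpy as np

    rng = np.random.default_rng(0)
    X_train, Y_train = rng.normal(size=(100, 256)), rng.random((100, 20))
    X_test, Y_test = rng.normal(size=(30, 256)), rng.random((30, 20))

    W = rng.normal(scale=0.01, size=(256, 20))   # output projection weights
    lr, best = 0.01, np.inf

    while True:
        # gradient descent on the squared output error
        out = 1.0 / (1.0 + np.exp(-(X_train @ W)))                # f_sig, beta = 1
        grad = X_train.T @ ((out - Y_train) * out * (1.0 - out))
        W -= lr * grad
        # stop once the held-out error no longer improves
        test_out = 1.0 / (1.0 + np.exp(-(X_test @ W)))
        err = np.mean((test_out - Y_test) ** 2)
        if err >= best:
            break
        best = err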
The training enforces the desired signs of the weights. If a specific excitatory
weight becomes negative, it is set to zero, and the unspecific inhibitory weight is
changed instead. This leads to sparse excitatory weights since, after training, many
of them have a value of exactly zero and can be pruned away without loss.
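One plausible reading of this rule is sketched below; shifting the clipped negative amount onto the unspecific inhibitory weight is an assumption about what 'changed instead' means.

    import numpy as np

    def enforce_signs(w_exc, w_inh):
        # clamp specific excitatory weights at zero and adjust the single
        # unspecific inhibitory weight instead (assumed reading)
        negative_part = np.minimum(w_exc, 0.0).sum()
        w_exc = np.maximum(w_exc, 0.0)
        w_inh = w_inh + negative_part
        return w_exc, w_inh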
7.4.2 Experimental Results
The trained Neural Abstraction Pyramid network is able to perform the recognition
task quite well. After deleting 21 examples from the training set that could not be
centered successfully or that were not readable for an experienced human observer,
there are only 11 substitutions left. All but one of them can be rejected easily.
The test set has not been modified. In Figure 7.11 some test examples are shown
that are difficult, but were recognized successfully, along with some examples for