7.4.1 Network Architecture and Training
The architecture of the Neural Abstraction Pyramid network used for the recognition of entire meter values is sketched in Figure 7.10. It is a feed-forward network consisting of five layers.
Layer 0 at the bottom of the hierarchy has a resolution of 32 × 16. It only contains
the input feature array. The resolution of the feature arrays decreases from layer to
layer by a factor of two in both dimensions, until Layer 3 reaches a size of only 4 × 2
hypercolumns. Similarly, the width of the border that is set to zero decreases from
16 to 2. At the same time, the number of excitatory features rises from 4 in Layer 1,
to 16 in Layer 2, and to 32 in Layer 3.
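For reference, the layer dimensions described above can be summarized in a short Python sketch. Only the resolutions, the feature counts, and the border widths 16 and 2 are stated in the text; the intermediate border widths (8 and 4) are an assumption inferred from the halving of the resolution.

    # Layer sizes of the meter-value network (Layers 0 to 3);
    # the topmost layer with the 20 output feature cells is described below.
    PYRAMID_LAYERS = [
        # (layer, width, height, excitatory features, zero border)
        (0, 32, 16,  1, 16),   # input feature array only
        (1, 16,  8,  4,  8),   # border width 8 is an assumption
        (2,  8,  4, 16,  4),   # border width 4 is an assumption
        (3,  4,  2, 32,  2),
    ]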
The network contains 20 output feature cells in the topmost layer, which encode the meter value. The output code is composed of two sections that indicate the identities of the two digits of interest in a 1-out-of-10 code.
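Such a target code can be illustrated with a small sketch; the function name and the digit ordering are hypothetical, and only the 20-dimensional layout of two concatenated 1-out-of-10 sections is taken from the text.

    import numpy as np

    def encode_meter_value(first_digit, second_digit):
        """Target vector: two concatenated 1-out-of-10 sections."""
        target = np.zeros(20)
        target[first_digit] = 1.0         # first digit, positions 0..9
        target[10 + second_digit] = 1.0   # second digit, positions 10..19
        return target

    # e.g. a meter value whose two digits of interest are 4 and 7
    print(encode_meter_value(4, 7))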
The projections of output feature cells receive their inputs directly from all positions of all feature arrays of Layer 3. Their weights are allowed to change sign. The potential of these projections is passed through a sigmoidal transfer function f_sig (β = 1, see Fig. 4.5(a) in Section 4.2.4), which saturates at zero and one.
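Assuming f_sig is the standard logistic function with gain β (its exact definition is given in Section 4.2.4, not here), the output transfer can be sketched as:

    import numpy as np

    def f_sig(x, beta=1.0):
        # sigmoidal transfer, saturates at zero and one (assumed logistic)
        return 1.0 / (1.0 + np.exp(-beta * x))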
In contrast, the cells of excitatory features located in Layer 1 to Layer 3 are
driven by specific excitation and unspecific inhibition. The weights of their specific
excitatory projections originate from overlapping 4 × 4 windows of the feature arrays
in the layer below them. Unspecific inhibitory projections have a single weight to
the smoothed and subsampled sum of these features. Both projections have linear
transfer functions. The transfer function f_psig (β = 2, see Fig. 4.6(b)), used for the output of these feature cells, is a rectifying function that saturates at an activity of one. This ensures that the network learns sparse representations of the digit block, since the activity becomes exactly zero if inhibition exceeds excitation.
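A rough sketch of one such excitatory feature cell is given below. The clipped-ramp form of f_psig and the helper names are assumptions; only the 4 × 4 excitatory window, the single inhibitory weight, and the rectifying, saturating output function are taken from the text.

    import numpy as np

    def f_p_sig(x, beta=2.0):
        # rectifying transfer that saturates at one (assumed clipped ramp)
        return np.clip(beta * x, 0.0, 1.0)

    def feature_cell_activity(window_4x4, w_exc, inhibition_sum, w_inh):
        # specific excitation from a 4x4 window of the layer below,
        # minus unspecific inhibition via a single weight
        potential = np.sum(w_exc * window_4x4) - w_inh * inhibition_sum
        return f_p_sig(potential)  # exactly zero if inhibition exceeds excitation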
The feature sums and their subsampled versions, needed for the unspecific inhi-
bition, are computed as described in Section 5.2.1. The network is initialized using
the unsupervised learning of sparse features, described in Chapter 5. Supervised
training is done with gradient descent on the squared output error until the perfor-
mance on the test set does not improve any more.
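The stopping rule can be illustrated with a self-contained toy loop. The random data, the single sigmoidal output projection (256 inputs correspond to 32 features × 4 × 2 positions of Layer 3), and all sizes are stand-ins, not the actual pyramid.

    import numpy as np

    rng = np.random.default_rng(0)
    X_train, Y_train = rng.normal(size=(100, 256)), rng.random((100, 20))
    X_test, Y_test = rng.normal(size=(30, 256)), rng.random((30, 20))

    W = rng.normal(scale=0.01, size=(256, 20))   # output projection weights
    lr, best = 0.01, np.inf

    while True:
        # gradient descent on the squared output error
        out = 1.0 / (1.0 + np.exp(-(X_train @ W)))                # f_sig, beta = 1
        grad = X_train.T @ ((out - Y_train) * out * (1.0 - out))
        W -= lr * grad
        # stop once the held-out error no longer improves
        test_out = 1.0 / (1.0 + np.exp(-(X_test @ W)))
        err = np.mean((test_out - Y_test) ** 2)
        if err >= best:
            break
        best = err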
The training enforces the desired signs of the weights. If a specific excitatory
weight becomes negative, it is set to zero, and the unspecific inhibitory weight is
changed instead. This leads to sparse excitatory weights since, after training, many
of them have a value of exactly zero and can be pruned away without loss.
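One plausible reading of this rule is sketched below; shifting the clipped negative amount onto the unspecific inhibitory weight is an assumption about what 'changed instead' means.

    import numpy as np

    def enforce_signs(w_exc, w_inh):
        # clamp specific excitatory weights at zero and adjust the single
        # unspecific inhibitory weight instead (assumed reading)
        negative_part = np.minimum(w_exc, 0.0).sum()
        w_exc = np.maximum(w_exc, 0.0)
        w_inh = w_inh + negative_part
        return w_exc, w_inh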
7.4.2 Experimental Results
The trained Neural Abstraction Pyramid network is able to perform the recognition
task quite well. After deleting 21 examples from the training set that could not be
centered successfully or that were not readable for an experienced human observer,
there are only 11 substitutions left. All but one of them can be rejected easily.
The test set has not been modified. In Figure 7.11 some test examples are shown
that are difficult, but were recognized successfully, along with some examples for