Fig. 9.11. Some examples from the MNIST dataset: (a) original images; (b) with occlusions caused by a randomly placed 8 × 8 light gray square.
Seung trained a recurrent neural network using gradient descent to reconstruct the original. The network had a local connection structure with many adaptable parameters, since no weight sharing was used. Its hidden units developed receptive fields that formed a topographic feature map. The network was able to complete images of the single digit class it had been trained on. However, it remained open whether reconstruction is possible when the digit class is unknown to the network.
In the following, I extend Seung's approach by adding lateral connections,
weight sharing, and more layers to the network and by training it to reconstruct
digits from all classes without presenting the class label.
9.3.1 MNIST Dataset
For the reconstruction experiments that follow, the MNIST database of handwritten digits [132] is used. The NIST digits [80] were scaled to 20 × 20 pixels and centered in a 28 × 28 image. Normalization to a fixed size facilitates recognition since one source of variability is removed, and centering removes translational variability as well. The lower resolution is still sufficient to recognize the digits, and it allows for the use of smaller networks, which facilitates generalization and reduces computational costs. Figure 9.11(a) shows some examples from the MNIST dataset.
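As a practical aside, the dataset is easy to load without any framework. The following is a minimal Python sketch, assuming the gzipped IDX files in which MNIST is distributed; the file name and the NumPy-based parsing are illustrative and not part of the original experiments.

    import gzip
    import numpy as np

    def load_mnist_images(path):
        # Parse an IDX image file into an (n, 28, 28) float array in [0, 1].
        with gzip.open(path, "rb") as f:
            data = f.read()
        # IDX header: magic number, image count, rows, cols (big-endian int32).
        n, rows, cols = np.frombuffer(data, dtype=">i4", count=4)[1:]
        pixels = np.frombuffer(data, dtype=np.uint8, offset=16)
        return pixels.reshape(n, rows, cols).astype(np.float32) / 255.0

    images = load_mnist_images("train-images-idx3-ubyte.gz")  # assumed path
    print(images.shape)  # (60000, 28, 28)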
Occlusion was simulated with an 8 × 8 square set to an intensity of 0.125 (light gray). The square was placed randomly at one of 12 × 12 central positions, leaving a four-pixel-wide border that was never modified, as shown in Figure 9.11(b). Restricting the square to these inner positions ensures that some part of the digit is always occluded.
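This occlusion procedure is simple enough to state as code. The sketch below is a hedged illustration rather than the original implementation: it draws one of 12 × 12 admissible offsets per axis, where the exact offset convention is an assumption chosen so that a frame of at least four pixels remains untouched.

    import numpy as np

    def occlude(image, rng, size=8, intensity=0.125, border=4):
        # Place a size x size light gray square at a random inner position,
        # leaving a frame of at least `border` pixels unmodified.
        out = image.copy()
        lo = border                          # first admissible offset: 4
        hi = image.shape[0] - border - size  # 28 - 4 - 8 = 16 (exclusive)
        r = rng.integers(lo, hi)             # 12 possible row offsets
        c = rng.integers(lo, hi)             # 12 possible column offsets
        out[r:r + size, c:c + size] = intensity
        return out

    rng = np.random.default_rng(0)
    digit = np.zeros((28, 28), dtype=np.float32)  # stand-in for an MNIST digit
    occluded = occlude(digit, rng)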
9.3.2 Architecture for Filling-In of Occlusions
The recurrent reconstruction network is an instance of the Neural Abstraction Pyra-
mid architecture. It consists of four layers, as illustrated in Figure 9.12. The leftmost
Layer 0 has a resolution of 28 × 28 hypercolumns. It contains the input feature array,
one hidden feature array, and the output feature array of the network.
Layer 1 contains four feature arrays of resolution 14 × 14. In Layer 2, the reso-
lution drops to 7 × 7, while the number of different features increases to eight. The
topmost Layer 3 consists of only a single hypercolumn with 16 feature cells.
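To make the layer dimensions concrete, the feature-array shapes can be written down as a small sketch; the dictionary layout and names are illustrative only and are not taken from the original implementation.

    import numpy as np

    # One (features, height, width) array per layer; a hypercolumn is the
    # stack of feature cells at one position.
    layers = {
        0: np.zeros((3, 28, 28)),   # input, one hidden, and output feature array
        1: np.zeros((4, 14, 14)),   # four features at half resolution
        2: np.zeros((8, 7, 7)),     # eight features at quarter resolution
        3: np.zeros((16, 1, 1)),    # a single hypercolumn with 16 feature cells
    }
    for l, a in layers.items():
        print(f"Layer {l}: {a.shape[0]} features of {a.shape[1]}x{a.shape[2]}")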
Both hidden and output feature cells of Layer 0 receive input from 3 × 3 win-
dows of the input feature array. The three Layer 0 feature arrays are accessed by