Fig. 9.11. Some examples from the MNIST dataset: (a) original images; (b) with occlusions caused by a randomly placed 8 × 8 light gray square.
Seung trained a recurrent neural network using gradient descent to reconstruct the original. The network had a local connection structure with many adaptable parameters, since no weight sharing was used. Its hidden units developed receptive fields that formed a topographic feature map. The network was able to complete images of the single digit class it had been trained on. However, it remained open whether reconstruction is possible when the digit class is unknown to the network.
In the following, I extend Seung's approach by adding lateral connections,
weight sharing, and more layers to the network and by training it to reconstruct
digits from all classes without presenting the class label.
9.3.1 MNIST Dataset
For the reconstruction experiments that follow, the MNIST database of handwritten digits [132] is used. The NIST digits [80] were scaled to 20 × 20 pixels and centered in a 28 × 28 image. Normalization to a fixed size facilitates recognition since one source of variability is removed, and centering removes translational variability as well. The lower resolution is still sufficient to recognize the digits, and it allows for the use of smaller networks, which facilitates generalization and reduces computational costs. Figure 9.11(a) shows some examples from the MNIST dataset.
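As a practical aside, the dataset is easy to load without any framework. The following is a minimal Python sketch, assuming the gzipped IDX files in which MNIST is distributed; the file name and the NumPy-based parsing are illustrative and not part of the original experiments.

    import gzip
    import numpy as np

    def load_mnist_images(path):
        # Parse an IDX image file into an (n, 28, 28) float array in [0, 1].
        with gzip.open(path, "rb") as f:
            data = f.read()
        # IDX header: magic number, image count, rows, cols (big-endian int32).
        n, rows, cols = np.frombuffer(data, dtype=">i4", count=4)[1:]
        pixels = np.frombuffer(data, dtype=np.uint8, offset=16)
        return pixels.reshape(n, rows, cols).astype(np.float32) / 255.0

    images = load_mnist_images("train-images-idx3-ubyte.gz")  # assumed path
    print(images.shape)  # (60000, 28, 28)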
Occlusion was simulated with an 8 × 8 square set to an intensity of 0.125 (light gray). The square was placed randomly at one of 12 × 12 central positions, leaving a four-pixel-wide border that was never modified, as shown in Figure 9.11(b). Restricting the square to these inner positions ensures that some part of the digit is always occluded.
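This occlusion procedure is simple enough to state as code. The sketch below is a hedged illustration rather than the original implementation: it draws one of 12 × 12 admissible offsets per axis, where the exact offset convention is an assumption chosen so that a frame of at least four pixels remains untouched.

    import numpy as np

    def occlude(image, rng, size=8, intensity=0.125, border=4):
        # Place a size x size light gray square at a random inner position,
        # leaving a frame of at least `border` pixels unmodified.
        out = image.copy()
        lo = border                          # first admissible offset: 4
        hi = image.shape[0] - border - size  # 28 - 4 - 8 = 16 (exclusive)
        r = rng.integers(lo, hi)             # 12 possible row offsets
        c = rng.integers(lo, hi)             # 12 possible column offsets
        out[r:r + size, c:c + size] = intensity
        return out

    rng = np.random.default_rng(0)
    digit = np.zeros((28, 28), dtype=np.float32)  # stand-in for an MNIST digit
    occluded = occlude(digit, rng)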
9.3.2 Architecture for Filling-In of Occlusions
The recurrent reconstruction network is an instance of the Neural Abstraction Pyra-
mid architecture. It consists of four layers, as illustrated in Figure 9.12. The leftmost
Layer 0 has a resolution of 28 × 28 hypercolumns. It contains the input feature array,
one hidden feature array, and the output feature array of the network.
Layer 1 contains four feature arrays of resolution 14 × 14. In Layer 2, the reso-
lution drops to 7 × 7, while the number of different features increases to eight. The
topmost Layer 3 consists of only a single hypercolumn with 16 feature cells.
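To make the layer dimensions concrete, the feature-array shapes can be written down as a small sketch; the dictionary layout and names are illustrative only and are not taken from the original implementation.

    import numpy as np

    # One (features, height, width) array per layer; a hypercolumn is the
    # stack of feature cells at one position.
    layers = {
        0: np.zeros((3, 28, 28)),   # input, one hidden, and output feature array
        1: np.zeros((4, 14, 14)),   # four features at half resolution
        2: np.zeros((8, 7, 7)),     # eight features at quarter resolution
        3: np.zeros((16, 1, 1)),    # a single hypercolumn with 16 feature cells
    }
    for l, a in layers.items():
        print(f"Layer {l}: {a.shape[0]} features of {a.shape[1]}x{a.shape[2]}")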
Both hidden and output feature cells of Layer 0 receive input from 3 × 3 win-
dows of the input feature array. The three Layer 0 feature arrays are accessed by