Fig. 9.2. Some digits from the NIST dataset. Shown are: (a) centered images in the original resolution (64 × 64); (b) subsampled to 16 × 16 pixels (pixelized); (c) bicubic interpolation to original resolution (blurred).
9.2.1 NIST Digits Dataset
The first reconstruction experiment uses the original NIST images of segmented, binarized handwritten digits [80]. They were extracted by NIST from hand-printed sample forms. The digits are contained in a 128 × 128 window, but their bounding box is typically much smaller. For this reason, the bounding box was centered in a 64 × 64 window to produce the desired output Y. Figure 9.2(a) shows some centered sample images from the NIST dataset. The input X to the network consists of subsampled versions of the digits with resolution 16 × 16, shown for the same examples in Fig. 9.2(b); these were produced by averaging non-overlapping blocks of 4 × 4 pixels. Part (c) of the figure demonstrates that bicubic interpolation is not an adequate method for increasing the resolution of the NIST digits, since it produces blurred images.
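As an illustration, the following is a minimal sketch of how such an input–output pair could be prepared. The helper name and the NumPy-based implementation are assumptions for illustration; the text only specifies the centering of the bounding box in a 64 × 64 window and the averaging of 4 × 4 pixel blocks.

import numpy as np

def make_training_pair(digit_128):
    """Build a (low-res input X, high-res target Y) pair from a 128x128 binary digit.

    Hypothetical helper, not from the original text: it centers the digit's
    bounding box in a 64x64 window (the target Y) and averages non-overlapping
    4x4 blocks to obtain the 16x16 input X.
    """
    # Find the bounding box of the foreground pixels.
    rows = np.any(digit_128 > 0, axis=1)
    cols = np.any(digit_128 > 0, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    box = digit_128[r0:r1 + 1, c0:c1 + 1]

    # Center the bounding box in a 64x64 window -> desired output Y
    # (assumed to fit, since the bounding box is typically much smaller).
    Y = np.zeros((64, 64), dtype=np.float32)
    h, w = box.shape
    top, left = (64 - h) // 2, (64 - w) // 2
    Y[top:top + h, left:left + w] = box

    # Subsample to 16x16 by averaging non-overlapping 4x4 blocks -> input X.
    X = Y.reshape(16, 4, 16, 4).mean(axis=(1, 3))
    return X, Y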
9.2.2 Architecture for Super-Resolution
The network used for the super-resolution task is a very small instance of the Neural
Abstraction Pyramid architecture. Besides the input and the output feature arrays,
determined by the task, it has additional features only in the hidden layer. Such a
small network was chosen because it proved to be sufficient for the task.
The architecture of the network is illustrated in Figure 9.3. It consists of three layers. The rightmost Layer 2 contains only a single feature array of resolution 16 × 16. The activities of its cells are set to the low-resolution input image. Layer 1 has resolution 32 × 32. It contains four feature arrays that produce a hidden representation of the digit. The leftmost Layer 0 contains only a single feature array that is used as the network output. It has a resolution of 64 × 64.
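Read as array shapes, this instance of the pyramid amounts to the following; the resolutions and feature counts are as stated above, while the variable names are illustrative only:

# Shapes given as (feature arrays, height, width); names are illustrative.
layer2_input  = (1, 16, 16)   # Layer 2: one feature array, set to the low-resolution input
layer1_hidden = (4, 32, 32)   # Layer 1: four feature arrays, hidden representation
layer0_output = (1, 64, 64)   # Layer 0: one feature array, used as the network output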
The feature cells of the output feature have lateral and backward projections. The weight matrix of the lateral projections has a size of 3 × 3. The 2 × 2 different backward projections each access a single feature cell of each feature array in Layer 1. This corresponds to the inverse of non-overlapping 2 × 2 forward projections for the four Layer 1 features.
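To make this connectivity concrete, here is a minimal NumPy sketch of how the activity of the output feature array could be updated from these two projections. The weight names, the additive combination of projections, and the sigmoidal transfer function are assumptions for illustration; only the projection sizes are taken from the text.

import numpy as np
from scipy.ndimage import convolve

def update_output(output_64, hidden_32, w_lateral, w_backward, bias=0.0):
    """One update of the 64x64 output feature array (illustrative sketch).

    output_64:  (64, 64) current output activities
    hidden_32:  (4, 32, 32) activities of the four Layer 1 feature arrays
    w_lateral:  (3, 3) lateral weight matrix on the output feature
    w_backward: (2, 2, 4) backward weights; position (i, j) within each 2x2
                output block has its own weight per Layer 1 feature
    """
    # Lateral projection: 3x3 neighborhood of the output feature itself.
    net = convolve(output_64, w_lateral, mode="constant")

    # Backward projection: each output cell reads the single Layer 1 cell it
    # corresponds to; the four cells of one 2x2 output block share that cell
    # but use different weights (the inverse of non-overlapping 2x2 forward
    # projections).
    for i in range(2):
        for j in range(2):
            net[i::2, j::2] += np.tensordot(w_backward[i, j], hidden_32, axes=(0, 0))

    # Assumed sigmoidal transfer function.
    return 1.0 / (1.0 + np.exp(-(net + bias)))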
Feature cells in Layer 1 have all three types of projections. Forward projections
access 2 × 2 windows of the output feature array in Layer 0. Lateral projections