Table 5.1. Learning a hierarchy of sparse features - emerging representations.
layer   name        feature arrays   hypercolumns   feature cells   input size
  5     digits           128             1 × 1            128         32 × 32
  4     curves            64             2 × 2            256         16 × 16
  3     strokes           32             4 × 4            512          8 × 8
  2     lines             16             8 × 8           1024          4 × 4
  1     edges              8            16 × 16          2048          2 × 2
  0     contrasts          4            32 × 32          4096          1 × 1
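The regular geometry of Table 5.1 can be written down directly: from one layer to the next, the number of feature arrays doubles, the hypercolumn grid halves in each dimension, so the total number of feature cells halves while the input region covered by a single hypercolumn doubles. The following is a minimal sketch that merely reproduces these numbers; it is an illustration of the table, not part of the learning method.

```python
# Reproduce the layer geometry of Table 5.1.
layers = []
for l in range(6):                      # layers 0 (contrasts) .. 5 (digits)
    arrays = 4 * 2 ** l                 # 4, 8, 16, ..., 128 feature arrays
    grid = 32 // 2 ** l                 # 32x32, 16x16, ..., 1x1 hypercolumns
    cells = arrays * grid * grid        # 4096, 2048, ..., 128 feature cells
    region = 2 ** l                     # input area covered by one hypercolumn
    layers.append((l, arrays, (grid, grid), cells, (region, region)))

for row in reversed(layers):            # print top layer first, as in the table
    print(row)
```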
Since the digits show a high degree of variance, some preprocessing steps are
necessary before they are presented to the pyramid. Preprocessing consists of
binarization as well as size and slant normalization. The images are scaled to
24 × 24 pixels and centered in the 32 × 32 input array at the bottom layer of
the pyramid.
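The centering step is straightforward. The sketch below assumes the digit has already been binarized and size- and slant-normalized to 24 × 24 pixels; the normalization itself is not shown.

```python
import numpy as np

def embed_digit(digit_24x24: np.ndarray) -> np.ndarray:
    """Center an already normalized 24x24 digit in the 32x32 input array
    of the pyramid's bottom layer."""
    assert digit_24x24.shape == (24, 24)
    canvas = np.zeros((32, 32), dtype=digit_24x24.dtype)
    canvas[4:28, 4:28] = digit_24x24        # 4-pixel margin on every side
    return canvas
```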
The Neural Abstraction Pyramid is initialized at the lowest level (l = 0) with
contrast detectors. These have a center-surround type receptive field that analyzes
the intensities of the input image. Four different features are used: center-on/off-
surround and center-off/on-surround in two scales, representing the fine and coarse
details of the foreground and the background, respectively. The feature arrays are
surrounded by a border of the same width that is set to zero.
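A minimal sketch of such a layer-0 contrast stage is given below. It uses a difference-of-Gaussians filter as one possible center-surround model and rectifies the positive and negative parts into separate on/off maps; the kernel shape and the scale parameters are assumptions for illustration, not the book's exact filters.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def contrast_features(img: np.ndarray) -> np.ndarray:
    """Four contrast maps for layer 0: on-center and off-center responses
    at a fine and a coarse scale (difference-of-Gaussians model, assumed)."""
    maps = []
    for sigma_center, sigma_surround in [(0.5, 1.0), (1.0, 2.0)]:  # assumed scales
        dog = gaussian_filter(img, sigma_center) - gaussian_filter(img, sigma_surround)
        maps.append(np.maximum(dog, 0.0))    # center-on / off-surround
        maps.append(np.maximum(-dog, 0.0))   # center-off / on-surround
    return np.stack(maps)                    # shape (4, 32, 32) for a 32x32 input
```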
Repeated application of the unsupervised learning method described above
yields the following representations (compare Table 5.1):
- Edges: Vertical, horizontal, and diagonal step edges are detected at Layer 1.
- Lines: At Layer 2, short line segments with 16 different orientations are detected.
- Strokes: Larger line segments that have a specific orientation and a specific
curvature are detected at Layer 3. Detectors for line endings and specific parallel
lines emerge as well.
- Curves: The feature detectors at Layer 4 react to typical large substructures of
digits, such as curves, crossings, junctions, etc.
- Digits: The feature cells at the topmost Layer 5 see the entire digit. Consequently,
detectors for typical digit shapes emerge.
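The schedule behind this hierarchy is greedy and bottom-up: each layer is trained on the activities of the layer below, and only then is the next layer added. The sketch below shows only this schedule; the actual learning rule is the one described earlier in the chapter, and the random-weight stand-in and the two helper functions here are hypothetical placeholders. It reuses the contrast_features sketch from above.

```python
import numpy as np

def learn_layer_features(activities, n_features):
    """Hypothetical stand-in for the unsupervised learning rule (here simply
    random weights over a 2x2 input window, for illustration only)."""
    n_in = activities[0].shape[0]
    return np.random.randn(n_features, n_in, 2, 2)

def compute_layer_activity(act, weights):
    """Apply the learned projections: each output hypercolumn pools a 2x2
    window of the layer below (stride 2), halving the resolution."""
    n_feat = weights.shape[0]
    h, w = act.shape[1] // 2, act.shape[2] // 2
    out = np.zeros((n_feat, h, w))
    for i in range(h):
        for j in range(w):
            window = act[:, 2 * i:2 * i + 2, 2 * j:2 * j + 2]
            out[:, i, j] = np.maximum(np.tensordot(weights, window, axes=3), 0.0)
    return out

def train_pyramid(images, n_layers=5):
    """Greedy, layer-by-layer schedule: train layer l on the activities of
    layer l-1, then recompute the activities and move one layer up."""
    activities = [contrast_features(img) for img in images]   # layer 0
    all_weights = []
    for l in range(1, n_layers + 1):
        n_features = 4 * 2 ** l                                # 8, 16, ..., 128 (Table 5.1)
        weights = learn_layer_features(activities, n_features)
        all_weights.append(weights)
        activities = [compute_layer_activity(a, weights) for a in activities]
    return all_weights
```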
Figure 5.2 shows a preprocessed input digit in its upper right part. On the upper
left, the activities of the contrast detectors are shown. They provide input to the
edge features via the specific weights of the excitatory projections. On the left side
of the figure, the activity of the edge feature arrays is shown. It can be seen that
the feature cells detect oriented step edges. For instance, the feature in the first row
detects edges on the lower side of horizontal lines. It receives input from foreground
features in the upper part of its projection and from background features in the lower
part. The right side of the figure shows the four stimuli from the training set that
excited each feature most strongly. In the center of these stimuli, the 2 × 2 area of
responsibility of the Layer 1 features is shown at the original contrast; its
neighborhood is shown at a lower contrast.
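Selecting such best stimuli amounts to ranking the training images by a feature's response. A minimal sketch follows, assuming the feature's activity maps over the training set are available as a NumPy array and taking the maximum over positions as the per-image response (both the array layout and the pooling are assumptions):

```python
import numpy as np

def best_stimuli(feature_maps: np.ndarray, k: int = 4) -> np.ndarray:
    """Indices of the k training images that excite one feature most strongly.
    `feature_maps` has shape (n_images, h, w): one activity map per image."""
    per_image = feature_maps.reshape(len(feature_maps), -1).max(axis=1)
    return np.argsort(per_image)[-k:][::-1]   # top-k indices, strongest first
```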