Table 5.1. Learning a hierarchy of sparse features - emerging representations.
layer   name        feature arrays   hypercolumns   feature cells   input size
  5     digits           128             1 × 1            128         32 × 32
  4     curves            64             2 × 2            256         16 × 16
  3     strokes           32             4 × 4            512          8 × 8
  2     lines             16             8 × 8           1024          4 × 4
  1     edges              8            16 × 16          2048          2 × 2
  0     contrasts          4            32 × 32          4096          1 × 1
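The regular geometry of Table 5.1 can be written down directly: from one layer to the next, the number of feature arrays doubles, the hypercolumn grid halves in each dimension, so the total number of feature cells halves while the input region covered by a single hypercolumn doubles. The following is a minimal sketch that merely reproduces these numbers; it is an illustration of the table, not part of the learning method.

```python
# Reproduce the layer geometry of Table 5.1.
layers = []
for l in range(6):                      # layers 0 (contrasts) .. 5 (digits)
    arrays = 4 * 2 ** l                 # 4, 8, 16, ..., 128 feature arrays
    grid = 32 // 2 ** l                 # 32x32, 16x16, ..., 1x1 hypercolumns
    cells = arrays * grid * grid        # 4096, 2048, ..., 128 feature cells
    region = 2 ** l                     # input area covered by one hypercolumn
    layers.append((l, arrays, (grid, grid), cells, (region, region)))

for row in reversed(layers):            # print top layer first, as in the table
    print(row)
```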
Since the digits show a high degree of variance, some preprocessing steps are
necessary before they are presented to the pyramid. Preprocessing consists of
binarization as well as size and slant normalization. The images are scaled to
24 × 24 pixels and centered in the 32 × 32 input array at the bottom layer of
the pyramid.
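The centering step is straightforward. The sketch below assumes the digit has already been binarized and size- and slant-normalized to 24 × 24 pixels; the normalization itself is not shown.

```python
import numpy as np

def embed_digit(digit_24x24: np.ndarray) -> np.ndarray:
    """Center an already normalized 24x24 digit in the 32x32 input array
    of the pyramid's bottom layer."""
    assert digit_24x24.shape == (24, 24)
    canvas = np.zeros((32, 32), dtype=digit_24x24.dtype)
    canvas[4:28, 4:28] = digit_24x24        # 4-pixel margin on every side
    return canvas
```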
The Neural Abstraction Pyramid is initialized at the lowest level (l = 0) with
contrast detectors. These have a center-surround type receptive field that analyzes
the intensities of the input image. Four different features are used: center-on/off-
surround and center-off/on-surround in two scales, representing the fine and coarse
details of the foreground and the background, respectively. The feature arrays are
surrounded by a border of the same width that is set to zero.
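A minimal sketch of such a layer-0 contrast stage is given below. It uses a difference-of-Gaussians filter as one possible center-surround model and rectifies the positive and negative parts into separate on/off maps; the kernel shape and the scale parameters are assumptions for illustration, not the book's exact filters.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def contrast_features(img: np.ndarray) -> np.ndarray:
    """Four contrast maps for layer 0: on-center and off-center responses
    at a fine and a coarse scale (difference-of-Gaussians model, assumed)."""
    maps = []
    for sigma_center, sigma_surround in [(0.5, 1.0), (1.0, 2.0)]:  # assumed scales
        dog = gaussian_filter(img, sigma_center) - gaussian_filter(img, sigma_surround)
        maps.append(np.maximum(dog, 0.0))    # center-on / off-surround
        maps.append(np.maximum(-dog, 0.0))   # center-off / on-surround
    return np.stack(maps)                    # shape (4, 32, 32) for a 32x32 input
```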
Repeated application of the unsupervised learning method described above
yields the following representations (compare Table 5.1):
- Edges: Vertical, horizontal, and diagonal step edges are detected at Layer 1.
- Lines: At Layer 2, short line segments with 16 different orientations are detected.
- Strokes: Larger line segments that have a specific orientation and a specific
curvature are detected at Layer 3. Detectors for line endings and specific parallel
lines emerge as well.
- Curves: The feature detectors at Layer 4 react to typical large substructures of
digits, such as curves, crossings, junctions, etc.
- Digits: The feature cells at the topmost Layer 5 see the entire digit. Consequently,
detectors for typical digit shapes emerge.
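The schedule behind this hierarchy is greedy and bottom-up: each layer is trained on the activities of the layer below, and only then is the next layer added. The sketch below shows only this schedule; the actual learning rule is the one described earlier in the chapter, and the random-weight stand-in and the two helper functions here are hypothetical placeholders. It reuses the contrast_features sketch from above.

```python
import numpy as np

def learn_layer_features(activities, n_features):
    """Hypothetical stand-in for the unsupervised learning rule (here simply
    random weights over a 2x2 input window, for illustration only)."""
    n_in = activities[0].shape[0]
    return np.random.randn(n_features, n_in, 2, 2)

def compute_layer_activity(act, weights):
    """Apply the learned projections: each output hypercolumn pools a 2x2
    window of the layer below (stride 2), halving the resolution."""
    n_feat = weights.shape[0]
    h, w = act.shape[1] // 2, act.shape[2] // 2
    out = np.zeros((n_feat, h, w))
    for i in range(h):
        for j in range(w):
            window = act[:, 2 * i:2 * i + 2, 2 * j:2 * j + 2]
            out[:, i, j] = np.maximum(np.tensordot(weights, window, axes=3), 0.0)
    return out

def train_pyramid(images, n_layers=5):
    """Greedy, layer-by-layer schedule: train layer l on the activities of
    layer l-1, then recompute the activities and move one layer up."""
    activities = [contrast_features(img) for img in images]   # layer 0
    all_weights = []
    for l in range(1, n_layers + 1):
        n_features = 4 * 2 ** l                                # 8, 16, ..., 128 (Table 5.1)
        weights = learn_layer_features(activities, n_features)
        all_weights.append(weights)
        activities = [compute_layer_activity(a, weights) for a in activities]
    return all_weights
```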
Figure 5.2 shows a preprocessed input digit in its upper right part. On the upper
left, the activities of the contrast detectors are shown. They provide input to the
edge features via the specific weights of the excitatory projections. On the left side
of the figure, the activity of the edge feature arrays is shown. It can be seen that
the feature cells detect oriented step edges. For instance, the feature in the first row
detects edges on the lower side of horizontal lines. It receives input from foreground
features in the upper part of its projection and from background features in the lower
part. The right side of the figure shows the four stimuli from the training set that
excited each feature most strongly. In the center of these stimuli, the 2 × 2 area of
responsibility of the Layer 1 features is shown at the original contrast; its
neighborhood is shown at a lower contrast.
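Selecting such best stimuli amounts to ranking the training images by a feature's response. A minimal sketch follows, assuming the feature's activity maps over the training set are available as a NumPy array and taking the maximum over positions as the per-image response (both the array layout and the pooling are assumptions):

```python
import numpy as np

def best_stimuli(feature_maps: np.ndarray, k: int = 4) -> np.ndarray:
    """Indices of the k training images that excite one feature most strongly.
    `feature_maps` has shape (n_images, h, w): one activity map per image."""
    per_image = feature_maps.reshape(len(feature_maps), -1).max(axis=1)
    return np.argsort(per_image)[-k:][::-1]   # top-k indices, strongest first
```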