Neural Abstraction Pyramid Architecture - Hierarchical Neural Networks for Image Interpretation


[Figure 4.12 diagram. Labels: Input, Output, Edges, Lines; excitatory and inhibitory features; Layer 0 (240 x 96 x 2), Layer 1 (120 x 48 x 4), Layer 2 (60 x 24 x 8).]

Fig. 4.12. ZIP code binarization - network architecture. The Neural Abstraction Pyramid consists of three layers. The bottom layer represents the image in terms of foreground/background features. The middle layer contains detectors for horizontal and vertical step edges. In the top layer, the lines are represented by the activities of eight orientation-selective line features.

This behavior is not ideal for recognition. The structure of digits is altered considerably by broken lines, and additional foreground pixels may also mislead recognition, especially if they are close to the lines.

The reason for these binarization problems is the limited use of context information in the thresholding method. Only global context via the intensity histogram is used to determine the binarization threshold, but the local context of a pixel is not considered for the binarization decision. In the following, a Neural Abstraction Pyramid is described that makes this decision based on the local context. The idea motivating the network's construction is to detect the lines and use them to bias binarization. A pixel belonging to a line should be assigned to the foreground class, even if it is not much darker than its neighborhood. On the other hand, dark pixels should be assigned to the background if they are not supported by a line.
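
The contrast between a purely global threshold and a line-biased decision can be illustrated with a small sketch. This is not the network described in this section, only a toy rule under stated assumptions: intensities lie in [0, 1] with 0 as dark ink, a hypothetical `line_support` map in [0, 1] stands in for the output of the line detectors, and the function names and the `bias` parameter are invented for illustration.

```python
def global_threshold(image, threshold):
    """Foreground = pixels darker than one global threshold (no local context)."""
    return [[pixel < threshold for pixel in row] for row in image]


def line_biased_binarization(image, line_support, threshold, bias=0.3):
    """Toy rule (an assumption, not the book's network): shift the
    threshold per pixel according to the evidence for a line.

    image        -- rows of floats in [0, 1], 0 = dark ink, 1 = bright paper
    line_support -- rows of floats in [0, 1], 1 = strong evidence for a line
    bias         -- assumed strength of the line bias
    """
    result = []
    for image_row, support_row in zip(image, line_support):
        row = []
        for pixel, support in zip(image_row, support_row):
            # Pixels on a line get a more permissive threshold,
            # pixels without line support get a stricter one.
            local_threshold = threshold + bias * (2.0 * support - 1.0)
            row.append(pixel < local_threshold)
        result.append(row)
    return result


# A faint line pixel (0.55) above a dark one (0.20), plus an isolated
# dark speck (0.40) that no line supports.
image = [[0.90, 0.55, 0.90],
         [0.90, 0.20, 0.90],
         [0.40, 0.90, 0.90]]
line_support = [[0.0, 1.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 0.0]]

print(global_threshold(image, 0.5))
print(line_biased_binarization(image, line_support, 0.5))
```

With a global threshold of 0.5, the faint line pixel is lost and the speck is kept. The line-biased rule keeps both line pixels and rejects the speck, which is the behavior the paragraph above asks for.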

The network's architecture is sketched in Figure 4.12. It consists of three layers that represent the image at three levels of abstraction:

• Layer 0 contains the input image, two excitatory feature arrays that represent the foreground/background assignment, and one inhibitory feature array that contains the sums of the foreground and the background features.
• Layer 1 contains four feature arrays that represent horizontal and vertical step edges. One inhibitory feature array contains the sum of the edges.
• Layer 2 contains eight excitatory feature arrays that represent lines in different orientations. Two inhibitory feature arrays compute the sums of the more horizontal and the more vertical lines, respectively. One inhibitory feature array sums lines of all orientations.
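
The layer layout listed above can be written down as a small data structure. This is a minimal sketch, assuming that the sizes in Figure 4.12 read as width x height x number of excitatory feature arrays (which matches the two, four, and eight arrays named in the list); the `Layer` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    width: int       # cells per row
    height: int      # cells per column
    excitatory: int  # number of excitatory feature arrays
    inhibitory: int  # number of inhibitory feature arrays

    def excitatory_cells(self) -> int:
        """Total number of cells in the excitatory feature arrays."""
        return self.width * self.height * self.excitatory


# Sizes taken from the labels in Figure 4.12; inhibitory counts from the list.
LAYERS = [
    Layer("Layer 0", 240, 96, excitatory=2, inhibitory=1),  # foreground/background
    Layer("Layer 1", 120, 48, excitatory=4, inhibitory=1),  # step edges
    Layer("Layer 2", 60, 24, excitatory=8, inhibitory=3),   # oriented lines
]

for layer in LAYERS:
    print(layer.name, layer.excitatory_cells())
```

Under this reading, each step up the pyramid halves the resolution in both dimensions and doubles the number of excitatory feature arrays, so the number of excitatory cells halves from one layer to the next.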
