recently for categorization tasks, such as distinguishing images of dogs from images of cats. Riesenhuber and Poggio argue that in such an architecture the binding problem might not be as severe as originally perceived [192]. Since the lower levels
of the hierarchy contain retinotopic representations, features of spatially separated
objects do not interact and hence are bound by spatial proximity. Features in the
higher levels are complex combinations of simple features. Since there are many
such combinations, it is unlikely that the features of two objects can be combined to
a valid third object. However, the experiments showed that recognition performance decreased only slightly when two non-overlapping objects were present, but was severely impaired when the two objects overlapped.
The HMAX architecture was designed to recognize a single object in a feed-
forward manner. The use of the maximum operation for pooling makes the cell re-
sponses invariant to input transformations and also suppresses noise. The response
of a C-cell that reacts to a feature is not changed by nearby clutter, as long as the
strongest S-cell response to the feature exceeds the S-responses to the distractor. However, a C-cell cannot distinguish a single instance of a feature from multiple instances of the same feature within its receptive field.
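Both properties of maximum pooling can be seen in a minimal sketch. The function below (a hypothetical helper, not taken from the HMAX implementation) pools a grid of S-cell responses by taking the maximum in each window: shifting a feature within a pooling window leaves the C-response unchanged, and a second instance of the feature in the same window produces exactly the same C-response as a single instance.

```python
import numpy as np

def max_pool(responses, size=2):
    """Pool S-cell responses by taking the maximum over each size x size window."""
    h, w = responses.shape
    h2, w2 = h // size, w // size
    r = responses[:h2 * size, :w2 * size].reshape(h2, size, w2, size)
    return r.max(axis=(1, 3))

# An S-cell response at position (1, 1) ...
a = np.zeros((4, 4)); a[1, 1] = 1.0
# ... and the same feature shifted within the pooling window:
b = np.zeros((4, 4)); b[0, 0] = 1.0
assert np.array_equal(max_pool(a), max_pool(b))   # invariant to the shift

# Two instances of the feature inside one window yield the same C-response as one:
c = np.zeros((4, 4)); c[0, 0] = 1.0; c[1, 1] = 1.0
assert np.array_equal(max_pool(a), max_pool(c))   # count information is lost
```

The same mechanism explains the robustness to clutter: a distractor only alters the pooled response if it drives some S-cell above the strongest response to the feature itself.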
Convolutional Networks. The creation of features by enumeration of all possi-
ble subfeature-combinations is easy, but computationally inefficient. For practical
applications, such as optical character recognition (OCR) and the interpretation of
handwritten text, the network size plays an important role since real-time conditions
must be met for the network recall.
If more of the network parameters can be adapted to a specific task, smaller net-
works suffice to extract the relevant features. One example of a fully adaptable hier-
archical neural network is the convolutional network proposed by LeCun et al. [133]
for the recognition of isolated normalized digits. A recent version of such a network,
which is called LeNet-5 [134], is illustrated in Figure 3.8.
The network consists of seven layers and an input plane that contains a digit. The digit has been normalized to 20 × 20 pixels and centered in the 32 × 32 frame. The input intensities are scaled such that the white background becomes −0.1 and the black foreground becomes 1.175.

Fig. 3.8. Convolutional neural network LeNet-5, developed by LeCun et al. [134] for digit recognition. The first layers compute an increasing number of feature maps with decreasing resolution by convolution with 5 × 5 kernels and subsampling. At the higher layers, the resolution drops to 1 × 1 and the weights are fully connected (image adapted from [134]).
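The first stage of such a network can be sketched as follows. This is a simplified illustration, not the LeNet-5 implementation: the kernels are random rather than learned, and the subsampling layer is plain 2 × 2 averaging, whereas LeNet-5 additionally applies a trainable coefficient, a bias, and a squashing function. It shows how a 32 × 32 input frame yields six 28 × 28 feature maps under valid 5 × 5 convolution, which subsampling reduces to 14 × 14.

```python
import numpy as np

def convolve_valid(image, kernel):
    """'Valid' 2D convolution: slide the kernel over all fully contained positions."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def subsample(fmap, size=2):
    """Average each size x size neighborhood, halving the resolution for size=2."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    r = fmap[:h * size, :w * size].reshape(h, size, w, size)
    return r.mean(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.normal(size=(32, 32))        # stand-in for the normalized input frame
kernels = rng.normal(size=(6, 5, 5))     # six 5x5 kernels (random here, learned in LeNet-5)

c1 = [convolve_valid(image, k) for k in kernels]  # six 28x28 feature maps
s2 = [subsample(m) for m in c1]                   # six 14x14 subsampled maps
assert c1[0].shape == (28, 28) and s2[0].shape == (14, 14)
```

Because each map is produced by a single shared kernel, the number of free parameters stays small even though the maps themselves are large; this weight sharing is what lets the whole hierarchy be trained for a specific task.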