11.3 Future Work
Several interesting aspects are not covered in the thesis. They include implementation options, the use of more complex processing elements, and the integration of the perception network into a complete system.
11.3.1 Implementation Options
The proposed Neural Abstraction Pyramid has been implemented on general-purpose computers (PCs). While such computers are widely available, relatively inexpensive, and quite flexible, this choice also has drawbacks: PCs are too large for truly mobile applications, and their high operating frequencies cause significant electric power consumption.
Due to the mismatch between the proposed architecture and the structure of today's PCs, implementing Neural Abstraction Pyramids on general-purpose computers is inefficient. Even though the architecture is fully parallel and the connectivity is local, PCs cannot exploit this because their memory and processing elements are separated. The key operation that determines the recall speed of the network is the memory access to a weight and to the activity of its source unit, followed by a multiply-accumulate. While the speed of the current implementation suffices for interpreting low-resolution images in real time, a significant speedup would be needed to process high-resolution video. Even more processing power is required for online learning and adaptation.
Several implementation options are available to improve speed or to lower size and power requirements. All of these options trade flexibility for efficiency. One possibility is to utilize the SIMD instructions of modern processors. Pentium 4 processors, for instance, offer MMX, SSE, and SSE2 instructions for parallel processing of 8-bit, 16-bit, and 32-bit integers, as well as floats. Current XScale processors, used in mobile devices, contain dedicated multiply-accumulate units, and Intel plans to add extended MMX instructions to future XScale processors. Programming with such SIMD instructions is less flexible, since compiler support is limited and the algorithms must be adapted to match the capabilities of the SIMD engines. In particular, the memory access pattern must be tuned to produce streaming. If the SIMD processing elements can be fully utilized, a speedup of an order of magnitude over the current implementation seems possible.
Another option for achieving greater speedups is to use parallel computers with multiple CPUs. However, parallel computers are less widely available, larger, require more power, and are more expensive than PCs. Furthermore, significant development effort is necessary to distribute the processing efficiently among the CPUs.
If one restricts the power of the individual processing elements, many of them can be implemented on a single chip. The vision processor VIP128 [184] is an example of such an approach. It contains a 2D array of processing elements that have access to small local memories and to the data of their neighbors. Other examples of special-purpose parallel processors are the XPACT data-flow architecture [19] and the Imagine stream processor [119]. Such parallel processors can achieve speedup