11.3 Future Work
Several interesting aspects are not covered in the thesis. They include implementation options, the use of more complex processing elements, and the integration of the perception network into a complete system.
11.3.1 Implementation Options
The proposed Neural Abstraction Pyramid has been implemented on general-purpose computers (PCs). While such computers are widely available, relatively inexpensive, and quite flexible, this choice also has drawbacks: PCs are too large for truly mobile applications, and their high operating frequencies cause significant electric power consumption.
Due to the mismatch between the proposed architecture and the structure of today's PCs, implementing Neural Abstraction Pyramids on general-purpose computers is inefficient. Even though the architecture is fully parallel and the connectivity is local, PCs cannot exploit this because their memory and processing elements are separated. The key operation that determines the recall speed of the network is the memory access to a weight and to the activity of its source unit, followed by a multiply-accumulate. While the speed of the current implementation suffices for interpreting low-resolution images in real time, a significant speedup would be needed to process high-resolution video. Even more processing power is required for online learning and adaptation.
Several implementation options are available to improve speed or to lower size and power requirements. All of these options trade flexibility for efficiency. One possibility is to utilize the SIMD instructions of modern processors. Pentium 4 processors, for instance, offer MMX, SSE, and SSE2 instructions for parallel processing of 8-bit, 16-bit, and 32-bit integers, as well as floats. Current XScale processors, used in mobile devices, contain dedicated multiply-accumulate units, and Intel plans to add extended MMX instructions to future XScale processors. Programming with such SIMD instructions is less flexible, since compiler support is limited and the algorithms must be adapted to match the capabilities of the SIMD engines. In particular, the memory access pattern must be tuned to produce streaming. If the SIMD processing elements can be fully utilized, a speedup of an order of magnitude over the current implementation seems possible.
Another option for achieving greater speedups is to use parallel computers with multiple CPUs. However, parallel computers are less widely available, larger, require more power, and are more expensive than PCs. Furthermore, significant development effort is necessary to distribute the processing efficiently among the CPUs.
If one restricts the power of the individual processing elements, many of them can be implemented on a single chip. The vision processor VIP128 [184] is an example of such an approach. It contains a 2D array of processing elements that have access to small local memories and to the data of their neighbors. Other examples of special-purpose parallel processors are the XPACT data-flow architecture [19] and the Imagine stream processor [119]. Such parallel processors can achieve speedup