Systolic Arrays - Signal Processing Systems

Digital Signal Processing Reference


for processor-to-processor communication with no external glue logic. The

communication ports remove input/output bottlenecks, and the independent smart DMA

coprocessor is able to relieve the CPU input/output burden.

Each of the six serial communication ports is equipped with a 20 Mbytes/s

bidirectional interface, and separate input and output eight-word-deep FIFO buffers.

Direct processor-to-processor connection is supported by automatic arbitration and

handshaking. The DMA coprocessor allows concurrent I/O and CPU processing for

sustained CPU performance.

The processor features single-cycle 40-bit floating-point and 32-bit integer

multipliers, a 512-byte instruction cache, and 8 Kbytes of single-cycle dual-access

program or data RAM. It also contains separate internal program, data, and DMA

coprocessor buses for support of massive concurrent input/output (I/O) program and

data throughput.

The TMS320C40 is designed to support general-purpose parallel computation

with different configurations. With six bidirectional serial link ports, it would

directly support a hypercube configuration containing up to 2^6 = 64 processing

elements. It can, of course, also be easily configured to form a linear or

two-dimensional mesh-connected processor array to support systolic computing.
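
The hypercube bound above follows from giving each processor one link per dimension. The short Python sketch below illustrates that wiring rule under the usual binary-labeling assumption: each node's neighbors are the IDs that differ from its own in exactly one bit. The function name and constants are hypothetical and are not part of any TI toolchain.

```python
LINKS_PER_PROCESSOR = 6                      # one link port per hypercube dimension
NUM_PROCESSORS = 1 << LINKS_PER_PROCESSOR    # 2**6 = 64 processing elements


def hypercube_neighbors(node: int, dim: int = LINKS_PER_PROCESSOR) -> list[int]:
    """Return the IDs of the nodes directly linked to `node`.

    Nodes are labeled 0 .. 2**dim - 1; flipping bit k of the label
    gives the neighbor reached through link port k.
    """
    if not 0 <= node < (1 << dim):
        raise ValueError("node ID out of range for this hypercube")
    return [node ^ (1 << bit) for bit in range(dim)]
```

For example, `hypercube_neighbors(0)` lists the six nodes whose labels contain a single 1 bit, so every processor uses exactly its six ports.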

6 Recent Developments and Real World Applications

6.1 Block Motion Estimation

Block motion estimation is a critical computation step in every international

video coding standard, including MPEG-1, MPEG-2, MPEG-4, H.261, H.263,

and H.264. This algorithm consists of a very simple loop body (sum of absolute

differences) embedded in a six-level nested loop. For real-time, high-definition video

encoding applications, the motion estimation operation must rely on special purpose

on-chip processor array structures that are heavily influenced by the systolic array

concept.
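
To make the six-level loop nest concrete, here is a minimal Python sketch of exhaustive (full-search) block matching with a sum-of-absolute-differences (SAD) loop body. It is an illustration under stated assumptions, not the formulation of any particular standard: frames are plain lists of rows of integer pixel values, and the function name `full_search` and the parameters `block` and `search` are hypothetical.

```python
def full_search(current, reference, block=4, search=2):
    """Return {(block_row, block_col): (best_sad, dy, dx)} for every block.

    `current` and `reference` are equally sized 2-D lists of pixel values.
    Candidate blocks that fall outside the reference frame are skipped.
    """
    height, width = len(current), len(current[0])
    vectors = {}
    for by in range(0, height - block + 1, block):        # loop 1: block row
        for bx in range(0, width - block + 1, block):     # loop 2: block column
            best = None
            for dy in range(-search, search + 1):         # loop 3: vertical shift
                for dx in range(-search, search + 1):     # loop 4: horizontal shift
                    ry, rx = by + dy, bx + dx
                    if ry < 0 or rx < 0 or ry + block > height or rx + block > width:
                        continue
                    sad = 0
                    for i in range(block):                # loop 5: pixel row
                        for j in range(block):            # loop 6: pixel column
                            sad += abs(current[by + i][bx + j]
                                       - reference[ry + i][rx + j])
                    if best is None or sad < best[0]:
                        best = (sad, dy, dx)
            vectors[(by, bx)] = best
    return vectors
```

The loop body is a single subtract, absolute value, and accumulate, while the surrounding loops contribute all of the work. That regularity is what makes the computation a natural candidate for a systolic-style processor array.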

The notion of block motion estimation is demonstrated in Fig. 15. To the left

of this figure is the current frame, which is to be encoded and transmitted from

the encoding end. To the right is the reference frame, which has already been

transmitted and reconstructed at the receiver end. The encoder will compute a copy

of this reconstructed reference frame for the purpose of motion estimation. Both

the current frame and the reference frame are divided into macro-blocks as shown

with dotted lines. Now focus on the current block, which is the shaded macro-block

at the second row and the fourth column of the current frame. The goal of motion

estimation is to find a matching macro-block in the reference frame, in the vicinity

of the location of the current block such that it resembles the current block in the

current frame. Usually, the current frame and the reference frame are separated by a

couple of frames temporally, and are likely to contain very similar scenes. Hence,
