Systolic Arrays - Signal Processing Systems

Digital Signal Processing Reference

Do h=0 to N_h-1
  Do v=0 to N_v-1
    MV(h,v)=(0,0)
    Dmin(h,v)=∞
    Do m=-p to p
      Do n=-p to p
        MAD(m,n)=0
        Do i=h*N to (h+1)*N-1
          Do j=v*N to (v+1)*N-1
            MAD(m,n)=MAD(m,n)+|x(i,j)-y(i+m,j+n)|
          End do j
        End do i
        If Dmin(h,v) > MAD(m,n)
          Dmin(h,v)=MAD(m,n)
          MV(h,v)=(m,n)
        End if
      End do n
    End do m
  End do v
End do h

Fig. 16 Full search block matching motion estimation
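The loop nest of Fig. 16 can be transcribed almost line for line into runnable code. The following is a minimal Python sketch, not the authors' implementation: the function name `full_search`, the helper `y_at`, and the edge-clamping rule for search positions that fall outside the frame are assumptions added here, since the figure does not say how frame borders are handled. As in the figure, MAD is accumulated as a plain sum of absolute differences without normalization.

```python
def full_search(x, y, N, p):
    """Full-search block matching, following the six-level loop nest of Fig. 16.

    x: current frame, y: reference frame (lists of rows of integers, same size).
    N: block size, p: search range. Returns (MV, Dmin) indexed as [h][v].
    """
    rows, cols = len(x), len(x[0])
    N_h, N_v = rows // N, cols // N

    def y_at(i, j):
        # Assumption (not in the figure): clamp out-of-frame coordinates
        # to the nearest border pixel.
        i = min(max(i, 0), rows - 1)
        j = min(max(j, 0), cols - 1)
        return y[i][j]

    MV = [[(0, 0)] * N_v for _ in range(N_h)]
    Dmin = [[float("inf")] * N_v for _ in range(N_h)]
    for h in range(N_h):
        for v in range(N_v):
            for m in range(-p, p + 1):
                for n in range(-p, p + 1):
                    mad = 0
                    for i in range(h * N, (h + 1) * N):
                        for j in range(v * N, (v + 1) * N):
                            mad += abs(x[i][j] - y_at(i + m, j + n))
                    # Keep the first displacement with the smallest MAD.
                    if Dmin[h][v] > mad:
                        Dmin[h][v] = mad
                        MV[h][v] = (m, n)
    return MV, Dmin
```

The four inner loops (m, n, i, j) are the part that the four-level array realizations map onto hardware; the outer h and v loops simply repeat the search for every block of the frame.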

need to be performed would be around $30 \times N_h \times N_v \times (2p+1)^2 \times N^2 \approx 1.8 \times 10^{10}$ operations/s.
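The operation count is simply the product of the loop trip counts in Fig. 16 and the frame rate. The sketch below evaluates that product; the function name and the parameter values are hypothetical, because the values behind the quoted figure are not stated in this section. The ones chosen here (a frame of 240 × 135 blocks of 8 × 8 pixels, p = 8, 30 frames/s) are one combination that yields a total of the same order.

```python
def full_search_ops_per_second(frame_rate, N_h, N_v, p, N):
    """Loop-body executions per second for the full search of Fig. 16."""
    blocks = N_h * N_v             # outer h and v loops
    candidates = (2 * p + 1) ** 2  # m and n loops
    pixels = N ** 2                # i and j loops
    return frame_rate * blocks * candidates * pixels

# Hypothetical parameters, chosen only for illustration.
ops = full_search_ops_per_second(frame_rate=30, N_h=240, N_v=135, p=8, N=8)
print(f"{ops:.2e}")  # prints 1.80e+10
```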

Since motion estimation is only one part of the video encoding operations, an application-specific hardware module would be a desirable implementation option. In view of the regularity of the loop-nest formulation and the simplicity of the loop-body operations (addition/subtraction), a systolic array solution is a natural choice. Toward this direction, numerous motion estimation processor array structures have been proposed, including 2D mesh arrays, 1D linear arrays, tree-structured arrays, and hybrid structures. Some of these realizations focused on the inner four-level nested loop formulation in Fig. 16 [12, 20], and some took the entire six-level loop nest into account [5, 11, 27]. An example is shown in Fig. 17. In this configuration, the search area pixel y is broadcast to each processing element (PE) in the same column, and the current frame pixel x is propagated along the spiral interconnection links. The constraint N = 2p is imposed to achieve a low input/output pin count. A simple PE is composed of only two eight-bit adders and a comparator, as shown in Fig. 18.

A number of video encoder microchips that include motion estimation have been reported over the years. Earlier motion estimation architectures often used some variant of a pixel-based systolic array to evaluate the MAD operations. Often a fast search algorithm is used in lieu of the full search algorithm because of speed and power consumption concerns. One example is an MPEG-IV Standard profile encoder chip reported in [17]. Some chip characteristics are given in Table 1.

As shown in Fig. 19, the motion estimation is carried out with 16 adder trees (processing units, PUs) for sum of absolute difference calculation, and the motion vectors are selected based on these results. A chip micrograph is depicted in Fig. 20.
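To illustrate what an adder tree computes in this setting, the following Python sketch sums a set of absolute differences by pairwise reduction, level by level, the way a balanced tree of adders would. It is a generic illustration only: the function names are hypothetical, and the internal organization of the processing units in [17] (inputs per tree, word lengths, pipelining) is not described in this section and is not modeled here.

```python
def adder_tree_sum(values):
    """Sum values by pairwise reduction, one tree level per loop iteration."""
    level = list(values) or [0]
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad odd-sized levels with a zero input
        # Each adjacent pair feeds one adder of the next tree level.
        level = [level[k] + level[k + 1] for k in range(0, len(level), 2)]
    return level[0]

def sad(cur, ref):
    """Sum of absolute differences between two equal-length pixel sequences."""
    return adder_tree_sum([abs(a - b) for a, b in zip(cur, ref)])
```

For 16 inputs the reduction takes four levels (8, 4, 2, and 1 adders), which is why a tree gives a shorter critical path than a chain of 15 sequential additions.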
