Digital Signal Processing Reference
Do h=0 to N_h-1
  Do v=0 to N_v-1
    MV(h,v)=(0,0)
    Dmin(h,v)=∞
    Do m=-p to p
      Do n=-p to p
        MAD(m,n)=0
        Do i=h*N to (h+1)*N-1
          Do j=v*N to (v+1)*N-1
            MAD(m,n)=MAD(m,n)+|x(i,j)-y(i+m,j+n)|
          End do j
        End do i
        If Dmin(h,v) > MAD(m,n)
          Dmin(h,v)=MAD(m,n)
          MV(h,v)=(m,n)
        End if
      End do n
    End do m
  End do v
End do h
Fig. 16 Full search block matching motion estimation
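The loop nest of Fig. 16 can be sketched in plain Python as follows. This is a minimal illustration, not the handbook's implementation: the function name `full_search`, the list-of-lists frame layout, and the rule of skipping candidate displacements that fall outside the reference frame are all assumptions, since the figure does not say how frame borders are handled.

```python
def full_search(x, y, N, p):
    """Full-search block matching following the six-level loop nest of Fig. 16.

    x: current frame, y: reference (search) frame, both 2-D lists of
       pixel intensities with identical dimensions.
    N: block size; p: search range (displacements run from -p to p).
    Returns (MV, Dmin), each indexed as [h][v].
    """
    size_i, size_j = len(x), len(x[0])
    Nh, Nv = size_i // N, size_j // N            # number of blocks per axis
    MV = [[(0, 0)] * Nv for _ in range(Nh)]
    Dmin = [[float("inf")] * Nv for _ in range(Nh)]

    for h in range(Nh):
        for v in range(Nv):
            for m in range(-p, p + 1):
                for n in range(-p, p + 1):
                    # Assumption (not in Fig. 16): skip displacements that
                    # would read pixels outside the reference frame.
                    lo_i, hi_i = h * N + m, (h + 1) * N - 1 + m
                    lo_j, hi_j = v * N + n, (v + 1) * N - 1 + n
                    if lo_i < 0 or hi_i >= size_i or lo_j < 0 or hi_j >= size_j:
                        continue
                    # Accumulate absolute differences over the N x N block.
                    mad = 0
                    for i in range(h * N, (h + 1) * N):
                        for j in range(v * N, (v + 1) * N):
                            mad += abs(x[i][j] - y[i + m][j + n])
                    # Keep the displacement with the smallest distortion.
                    if Dmin[h][v] > mad:
                        Dmin[h][v] = mad
                        MV[h][v] = (m, n)
    return MV, Dmin
```

Calling `full_search(x, y, N=4, p=1)` on two 8 × 8 frames returns a 2 × 2 grid of motion vectors and the matching distortions. Because the comparison is strict, the first displacement that attains the minimum is the one kept, as in the figure.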
need to be performed would be around $30 \times N_h \times N_v \times (2p+1)^2 \times N^2 = 1.8 \times 10^{10}$ operations/s.
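The operation count above is simply the product of the frame rate and the trip counts of the six loops in Fig. 16. The short sketch below evaluates that product. The parameter values (30 frames/s, a 720 × 480 frame split into 16 × 16 blocks, and p = 20) are illustrative assumptions and are not taken from this section; they are chosen only to show that realistic settings land at the same order of magnitude.

```python
def full_search_ops_per_second(frame_rate, Nh, Nv, p, N):
    """Loop-body executions per second for full-search block matching.

    Nh * Nv blocks per frame, (2p + 1)^2 candidate displacements per block,
    and N^2 pixel differences per candidate.
    """
    return frame_rate * Nh * Nv * (2 * p + 1) ** 2 * N ** 2


# Hypothetical parameters: 720 x 480 frame, N = 16, so Nh = 45 and Nv = 30.
ops = full_search_ops_per_second(frame_rate=30, Nh=45, Nv=30, p=20, N=16)
print(f"{ops:.2e} operations/s")
```

With these assumed values the product is 17,428,608,000, or roughly 1.7 × 10^10 operations/s.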
Since motion estimation is only one part of the video encoding operations, an
application-specific hardware module is a desirable implementation option. In view
of the regularity of the loop-nest formulation and the simplicity of the
loop-body operations (addition/subtraction), a systolic array solution is a natural choice.
To this end, numerous motion estimation processor array structures have
been proposed, including 2D mesh arrays, 1D linear arrays, tree-structured arrays, and
hybrid structures. Some of these realizations focus on the inner four-level nested
loop formulation in Fig. 16 [12, 20], and some take the entire six-level loop nest
into account [5, 11, 27]. An example is shown in Fig. 17. In this configuration, the
search area pixel y is broadcast to every processing element (PE) in the same column;
and the current frame pixel x is propagated along the spiral interconnection links. A
constraint between N and 2p is imposed to achieve a low input/output pin count. A simple PE
is composed of only two eight-bit adders and a comparator, as shown in Fig. 18.
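To make the role of such a PE concrete, the following is a purely behavioral sketch. It does not reproduce the datapath of Fig. 18: the class name `ProcessingElement`, its methods, and the assignment of roles (one adder forming the pixel difference, the other accumulating it, and the comparator tracking the running minimum) are assumptions, and bit widths and timing are not modeled.

```python
class ProcessingElement:
    """Behavioral sketch of one PE (hypothetical; not the circuit of Fig. 18)."""

    def __init__(self):
        self.acc = 0          # running sum of absolute differences
        self.best = None      # smallest completed sum seen so far
        self.best_mv = None   # displacement that produced self.best

    def accumulate(self, x_pixel, y_pixel):
        # Assumed role of the first adder: form the pixel difference.
        diff = x_pixel - y_pixel
        # Assumed role of the second adder: accumulate its magnitude.
        self.acc += abs(diff)

    def finish_candidate(self, mv):
        # Assumed role of the comparator: keep the smaller distortion.
        if self.best is None or self.acc < self.best:
            self.best, self.best_mv = self.acc, mv
        self.acc = 0          # reset for the next candidate displacement


pe = ProcessingElement()
pe.accumulate(10, 7)
pe.accumulate(3, 8)
pe.finish_candidate((0, 0))   # distortion 3 + 5 = 8
pe.accumulate(4, 5)
pe.finish_candidate((0, 1))   # distortion 1, which becomes the new minimum
```

After these calls, `pe.best_mv` is `(0, 1)`. In an array, many such elements would work on different candidate displacements at once while pixels are broadcast or shifted between them.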
A number of video encoder microchips that include motion estimation have been
reported over the years. Earlier motion estimation architectures often use some
variant of a pixel-based systolic array to evaluate the MAD operations. Often a fast
search algorithm is used in lieu of the full search algorithm because of speed and power
consumption concerns. One example is an MPEG-4 Standard profile encoder chip
reported in [17]. Some chip characteristics are given in Table 1.
As shown in Fig. 19, the motion estimation is carried out with 16 adder trees
(processing units, PUs) for the sum of absolute differences calculation, and the motion
vectors are selected based on these results. A chip micrograph is depicted in Fig. 20.
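An adder tree sums its inputs pairwise, level by level, so that the number of addition stages grows with the logarithm of the number of inputs. The sketch below illustrates this reduction for a sum of absolute differences. It is only an illustration of the general technique: the function names `adder_tree_sum` and `sad` are hypothetical, and the internal organization of the PUs in the chip of [17] is not described in this section.

```python
def adder_tree_sum(values):
    """Sum the inputs pairwise, one tree level per loop iteration."""
    level = list(values)
    while len(level) > 1:
        if len(level) % 2:            # pad odd-sized levels with a zero input
            level.append(0)
        # Each adder at this level combines two neighbouring partial sums.
        level = [level[k] + level[k + 1] for k in range(0, len(level), 2)]
    return level[0] if level else 0


def sad(current, candidate):
    """Sum of absolute differences between two equal-length pixel lists."""
    return adder_tree_sum(abs(a - b) for a, b in zip(current, candidate))
```

For example, `sad([1, 2, 3, 4], [4, 3, 2, 1])` adds the differences 3, 1, 1, 3 in two levels and returns 8. Evaluating `sad` for several candidate blocks and keeping the smallest result mirrors the selection of motion vectors described above.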