candidate MB data differently. As a result, the sub-arrays within the same sub-bank may not be able to fully share all the row and column addresses, and each sub-array must have a data I/O width of N · D bits.
Another advantage of this bit-plane storage strategy is that it can easily support a graceful run-time performance vs. energy trade-off, because the bit-plane structure makes it very easy to adjust the precision of the luminance intensity data participating in motion estimation. It is well known that appropriate pixel truncation [27] can lead to a substantial reduction in computational complexity and power consumption without significantly affecting image quality. Such a bit-plane memory structure naturally supports dynamic pixel truncation, which in turn also reduces the power consumption of memory data access.
Given the D-bit full precision of the luminance intensity data, if we only use D_r < D bits in motion estimation, we can directly switch the D − D_r sub-arrays, which store the lower D − D_r bits of each pixel, into an idle mode to reduce the DRAM power consumption.
Such lower-precision operation can be dynamically adjusted at run time to allow a more flexible performance vs. energy trade-off: for example, we could first use low-precision data to calculate coarse SADs, and then run block matching at full precision in a small region around the candidate MB with the smallest coarse SAD.
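As a rough illustration of this coarse-then-fine strategy, the sketch below truncates each D-bit pixel to its D_r most significant bits before computing a coarse SAD. The function names, the 8-bit pixel type, and the 16 × 16 MB size are assumptions for illustration only, not part of the design described above; in the bit-plane DRAM itself, the truncation comes for free because the sub-arrays holding the lower D − D_r bit-planes are simply left idle and never read.

```c
#include <stdint.h>
#include <stdlib.h>

#define MB_SIZE 16   /* assumed macroblock dimension: 16x16 luma samples */

/* Drop the lower (d - d_r) bit-planes of a d-bit pixel, keeping only its
 * d_r most significant bits -- the software analogue of idling the
 * low-order bit-plane sub-arrays. */
static inline uint8_t truncate_pixel(uint8_t pixel, int d, int d_r)
{
    return (uint8_t)(pixel >> (d - d_r));
}

/* Coarse SAD between the current MB and one candidate MB, computed on the
 * truncated d_r-bit pixels only.  A second, full-precision pass (d_r == d)
 * can then be limited to a small region around the candidate with the
 * smallest coarse SAD. */
static uint32_t coarse_sad(const uint8_t cur[MB_SIZE][MB_SIZE],
                           const uint8_t cand[MB_SIZE][MB_SIZE],
                           int d, int d_r)
{
    uint32_t sad = 0;
    for (int i = 0; i < MB_SIZE; i++) {
        for (int j = 0; j < MB_SIZE; j++) {
            int a = truncate_pixel(cur[i][j], d, d_r);
            int b = truncate_pixel(cand[i][j], d, d_r);
            sad += (uint32_t)abs(a - b);
        }
    }
    return sad;
}
```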
It should be pointed out that, unlike conventional design solutions, the design strategy presented above under the 3D logic-DRAM integrated system framework can realize arbitrary and discontinuous motion vector searches and hence seamlessly support most existing motion estimation algorithms. Finally, we note that, although the above discussion focuses only on data storage for motion estimation, the same DRAM storage approaches can be used to facilitate motion compensation as well, in both video encoders and decoders.
6.2.2 Motion Estimation Memory Access
With the above DRAM architecture design strategy, the motion estimation engine
on the logic die can access the 3D DRAM to directly fetch the current MB and
candidate MB through a simple interface. Assume that the video encoder should
support multi-frame motion estimation with up to m reference frames. In order to seamlessly support multi-frame motion estimation while maintaining the same video encoding throughput, we store all m reference frames separately, with each reference frame stored in two banks. The motion estimation engine can access all m reference frames simultaneously, i.e., the motion estimation engine contains m parallel SAD computation units, each of which carries out motion estimation based upon
one reference frame. We denote the MB at the top-left corner of each frame with a
2D position index of (0, 0). Assuming that each frame contains F_W × F_H MBs, the MB at the bottom-right corner of each frame has a 2D position index of (F_W − 1, F_H − 1). Assuming that each word-line in one bank stores s MBs, we store all the MBs row-by-row. Hence, given the MB index (x, y), we can first identify its bank index as x % 2, where % is the modulo operator that finds the remainder of a division. Then we can derive the corresponding DRAM row address as
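As a minimal sketch of this address mapping, the code below reproduces the bank-index rule x % 2 stated above and pairs it with one purely illustrative row-major packing of s MBs per word-line; the structure name, the function name, and the row-address formula itself are assumptions for illustration, since only the bank-index rule is specified here.

```c
/* Hypothetical descriptor for the DRAM location of one MB. */
typedef struct {
    int bank;   /* which of the two banks holding this reference frame */
    int row;    /* DRAM row (word-line) address inside that bank       */
} mb_addr_t;

/* Map the MB index (x, y) to a DRAM location.
 * - bank = x % 2 follows directly from the text above.
 * - The row-address calculation assumes the MBs of each bank are packed
 *   row-by-row, s MBs per word-line; this packing is an illustrative
 *   guess, not the mapping the chapter actually derives. */
static mb_addr_t mb_to_dram_addr(int x, int y, int frame_width_mbs, int s)
{
    mb_addr_t addr;
    addr.bank = x % 2;                                 /* bank index       */
    int mbs_per_bank_row = (frame_width_mbs + 1) / 2;  /* every other MB   */
    int linear = y * mbs_per_bank_row + x / 2;         /* row-major order  */
    addr.row = linear / s;                             /* s MBs/word-line  */
    return addr;
}
```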