candidate MB data differently. As a result, the sub-arrays within the same sub-bank may not be able to fully share all the row and column addresses, and each sub-array must have a data I/O width of N · D bits.
Another advantage of this bit-plane storage strategy is that it can easily support a graceful run-time performance vs. energy trade-off, because the bit-plane structure makes it very easy to adjust the precision of the luminance intensity data participating in motion estimation. It is well known that appropriate pixel truncation [27] can lead to a substantial reduction in computational complexity and power consumption without significantly affecting image quality. Such a bit-plane memory structure naturally supports dynamic pixel truncation, which in turn also reduces the power consumption of memory data access.
Given the D-bit full precision of the luminance intensity data, if we only use D_r < D bits in motion estimation, we can directly switch the D − D_r sub-arrays, which store the lower D − D_r bits of each pixel, into an idle mode to reduce the DRAM power consumption.
Such lower-precision operation can be dynamically adjusted at run time to allow a more flexible performance vs. energy trade-off: for example, we could first use low-precision data to calculate coarse SADs, and then run block matching at full precision in a small region around the candidate MB with the smallest coarse SAD.
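As a rough illustration of this coarse-then-fine strategy, the sketch below truncates each D-bit pixel to its D_r most significant bits before computing a coarse SAD. The function names, the 8-bit pixel type, and the 16 × 16 MB size are assumptions for illustration only, not part of the design described above; in the bit-plane DRAM itself, the truncation comes for free because the sub-arrays holding the lower D − D_r bit-planes are simply left idle and never read.

```c
#include <stdint.h>
#include <stdlib.h>

#define MB_SIZE 16   /* assumed macroblock dimension: 16x16 luma samples */

/* Drop the lower (d - d_r) bit-planes of a d-bit pixel, keeping only its
 * d_r most significant bits -- the software analogue of idling the
 * low-order bit-plane sub-arrays. */
static inline uint8_t truncate_pixel(uint8_t pixel, int d, int d_r)
{
    return (uint8_t)(pixel >> (d - d_r));
}

/* Coarse SAD between the current MB and one candidate MB, computed on the
 * truncated d_r-bit pixels only.  A second, full-precision pass (d_r == d)
 * can then be limited to a small region around the candidate with the
 * smallest coarse SAD. */
static uint32_t coarse_sad(const uint8_t cur[MB_SIZE][MB_SIZE],
                           const uint8_t cand[MB_SIZE][MB_SIZE],
                           int d, int d_r)
{
    uint32_t sad = 0;
    for (int i = 0; i < MB_SIZE; i++) {
        for (int j = 0; j < MB_SIZE; j++) {
            int a = truncate_pixel(cur[i][j], d, d_r);
            int b = truncate_pixel(cand[i][j], d, d_r);
            sad += (uint32_t)abs(a - b);
        }
    }
    return sad;
}
```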
It should be pointed out that, unlike conventional design solutions, the design strategy presented above under the 3D logic-DRAM integrated system framework can realize arbitrary and discontinuous motion vector searches and hence seamlessly support most existing motion estimation algorithms. Finally, we note that, although the above discussion focuses only on data storage for motion estimation, the same DRAM storage approaches can be used to facilitate motion compensation as well, in both video encoders and decoders.
6.2.2 Motion Estimation Memory Access
With the above DRAM architecture design strategy, the motion estimation engine
on the logic die can access the 3D DRAM to directly fetch the current MB and
candidate MB through a simple interface. Assume that the video encoder should
support multi-frame motion estimation with up to m reference frames. In order to seamlessly support multi-frame motion estimation while maintaining the same video encoding throughput, we store all m reference frames separately, with each reference frame stored in two banks. The motion estimation engine can access all m reference frames simultaneously, i.e., the motion estimation engine contains m parallel SAD computation units, each of which carries out motion estimation based upon
one reference frame. We denote the MB at the top-left corner of each frame with a
2D position index of (0, 0). Assuming that each frame contains F_W × F_H MBs, the MB at the bottom-right corner of each frame has a 2D position index of (F_W − 1, F_H − 1). Assuming that each word-line in one bank stores s MBs, we store all the MBs row-by-row. Hence, given the MB index (x, y), we can first identify its bank index as x % 2, where % is the modulo operator that finds the remainder of a division. Then we can derive the corresponding DRAM row address as
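As a minimal sketch of this address mapping, the code below reproduces the bank-index rule x % 2 stated above and pairs it with one purely illustrative row-major packing of s MBs per word-line; the structure name, the function name, and the row-address formula itself are assumptions for illustration, since only the bank-index rule is specified here.

```c
/* Hypothetical descriptor for the DRAM location of one MB. */
typedef struct {
    int bank;   /* which of the two banks holding this reference frame */
    int row;    /* DRAM row (word-line) address inside that bank       */
} mb_addr_t;

/* Map the MB index (x, y) to a DRAM location.
 * - bank = x % 2 follows directly from the text above.
 * - The row-address calculation assumes the MBs of each bank are packed
 *   row-by-row, s MBs per word-line; this packing is an illustrative
 *   guess, not the mapping the chapter actually derives. */
static mb_addr_t mb_to_dram_addr(int x, int y, int frame_width_mbs, int s)
{
    mb_addr_t addr;
    addr.bank = x % 2;                                 /* bank index       */
    int mbs_per_bank_row = (frame_width_mbs + 1) / 2;  /* every other MB   */
    int linear = y * mbs_per_bank_row + x / 2;         /* row-major order  */
    addr.row = linear / s;                             /* s MBs/word-line  */
    return addr;
}
```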