10.6.3 Hit Rate Analysis, DRAM Bandwidth and Power
The rate at which data can be accessed from the DRAM depends on two factors:
the number of bits that the DRAM interface can (theoretically) transfer per unit
time, and the precharge latency caused by the interaction between requests. The
precharge latency can be normalized to bandwidth by multiplying it by the bit width.
This normalized figure (called ACT BW) is the bandwidth lost in the precharge
and activate cycles, that is, the amount of data that could have been transferred in
the cycles when the DRAM was executing row-change operations. The other figure,
Data BW, refers to the amount of data that needs to be transferred from the DRAM
to the decoder per unit time for real-time operation. Thus, a better hit rate reduces
the Data BW and a better memory map reduces the ACT BW. The advantage of
defining Data BW and ACT BW in this way is that (Data BW + ACT BW)
is the minimum bandwidth required at the memory interface to support real-time
operation.
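The two figures defined above can be sketched in a few lines of Python. This is only an illustrative model, not the analysis used in the chapter: the function names, the assumption that Data BW scales with the cache miss rate, and all numeric values are hypothetical and chosen only to show how the terms combine.

```python
def data_bw(requested_bits_per_s, hit_rate):
    """Data BW: bits/s that must still come from DRAM.

    Assumption: only cache misses reach the DRAM, so the traffic
    scales with (1 - hit_rate). Line-fill granularity is ignored.
    """
    return requested_bits_per_s * (1.0 - hit_rate)


def act_bw(row_changes_per_s, cycles_per_row_change, bits_per_cycle):
    """ACT BW: bits/s that could have been transferred during the
    cycles spent on precharge/activate (row-change) operations."""
    return row_changes_per_s * cycles_per_row_change * bits_per_cycle


def min_interface_bw(data, act):
    """Minimum interface bandwidth for real-time operation."""
    return data + act


# Hypothetical numbers, for illustration only.
d = data_bw(requested_bits_per_s=4_000_000_000, hit_rate=0.5)
a = act_bw(row_changes_per_s=1_000_000, cycles_per_row_change=10,
           bits_per_cycle=64)
total = min_interface_bw(d, a)
print(d, a, total)
```

In this model, improving the hit rate lowers only the first term, while a memory map that causes fewer row changes lowers only the second.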
The performance of the cache and the twisted address mapping is compared
with two reference scenarios: raster-scan address mapping with no cache and
raster-scan address mapping with the cache. As seen in Fig. 10.18a, using a 16 kB cache
reduces the Data BW by 55 %. The Twisted 2D mapping reduces ACT BW by 71 %.
Thus, the cache results in a 67 % reduction of the total DRAM bandwidth. Using a
simplified power consumption model [14] based on the number of accesses, this
cache is found to save up to 112 mW, a 41 % reduction in DRAM access power, as
shown in Fig. 10.18b.
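As a quick sanity check on the power figures, the quoted saving and percentage together imply an approximate baseline. The sketch below assumes that the 112 mW saving and the 41 % reduction refer to the same operating point; the function name is hypothetical and the derived values are arithmetic consequences of the quoted numbers, not results reported in the text.

```python
def implied_power(saving_mw, reduction_fraction):
    """Return (baseline, remaining) DRAM access power in mW implied
    by an absolute saving and the fractional reduction it represents."""
    baseline = saving_mw / reduction_fraction
    remaining = baseline - saving_mw
    return baseline, remaining


# Figures quoted in the text: up to 112 mW saved, a 41 % reduction.
baseline_mw, remaining_mw = implied_power(112.0, 0.41)
print(round(baseline_mw), round(remaining_mw))  # roughly 273 and 161
```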
Figure 10.18c compares the DRAM bandwidth across various encoder settings.
Smaller CTU sizes result in a larger bandwidth because of lower hit rates. Thus,
larger CTU sizes such as 64 can provide smaller external bandwidth at the cost of
higher on-chip complexity. Also, Random Access mode typically has a lower hit rate
than Low Delay. This behavior is expected because the reference
pictures are switched more frequently in the former.
10.6.4 Implementation Results
This design is synthesized at 200 MHz in 40 nm CMOS. The total area is 90.4 kgate
of logic and 16 kB (or 131.1 kbit) of SRAM. The bulk of the logic area is taken by
the 8,960-bit tag register file, which can be replaced by a 2-port SRAM (which is denser
than a register file) at the cost of an extra access cycle. A breakdown of the logic area is
presented in Table 10.11.