Optimizing Capacitance and Switching Activity to Reduce Dynamic Power - Computer Architecture Techniques for Power-Efficiency


[Figure 4.29 omitted. Both panels show one 4-way cache set (cache ways 0 to 3) holding tags *11, *01, *00, and *00 in ways 0, 1, 2, and 3.

Left panel (MRU): a single MRU presence vector per set (here 0001). The way prediction is the MRU of the set (e.g., way 3).

Right panel (Multi-MRU): N MRU tables per set. MMRU uses the log2(N) least-significant tag bits to select an MRU table (here 2 bits for 4 tables). The presence vectors for tag bits 00, 01, 10, and 11 are 0001, 0100, 0010, and 1000. Way prediction for tag:
  *00 is 3 (note the other "non-MRU" *00 tag in way 2)
  *01 is 1 (tag *01 is in its DM position)
  *10 is 2 (although no tag *10 is in the set)
  *11 is 0]

FIGURE 4.29: Multi-MRU way-predictor employs N MRU predictors (typically N = assoc) to disambiguate on a few least-significant tag bits.

Powell et al. report that SDM combined with way prediction yields significant savings because most accesses probe only the direct-mapped or the predicted way. Despite a small performance penalty (less than 3%) due to mispredictions, the reduction in EDP is on the order of 64% and 69% for the 4-way 16KB instruction L1 and data L1, respectively. For their processor models, the overall reduction in EDP for this technique is 8%, while perfect prediction would be only 2 percentage points better (10%) [183].
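For reference, EDP is the product of energy and delay, so a small slowdown offsets only a little of a large energy saving. The relation below is a sketch; the symbols s (fractional energy saving) and p (fractional delay penalty) are introduced here for illustration and do not appear in the source.

```latex
\[
\frac{\mathrm{EDP}_{\mathrm{new}}}{\mathrm{EDP}_{\mathrm{base}}}
  = \frac{E_{\mathrm{new}}\, D_{\mathrm{new}}}{E_{\mathrm{base}}\, D_{\mathrm{base}}}
  = (1 - s)(1 + p)
\]
```

With purely illustrative values s = 0.50 and p = 0.03 (not figures from the cited study), the ratio is 0.50 × 1.03 = 0.515, that is, an EDP reduction of 48.5%.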

Multi-MRU: The multi-MRU (MMRU) proposal of Zhang et al. [242] (later also appearing in Zhu et al. [249]) is also an extension of most recently used (MRU) way prediction [43, 48]. MRU simply returns the most recently accessed way of a set as its prediction (Figure 4.29, left diagram), whereas MMRU uses multiple MRU predictors to disambiguate among tags (Figure 4.29, right diagram). All tags in a set that share the same low-order bits are tracked by the same MRU table. For example, in Figure 4.29, the two tags ending in 00 are tracked by the leftmost MRU table. The prediction is the cache way of the MRU tag among them (e.g., way 3 in Figure 4.29). In theory, MMRU can disambiguate on any number of tag bits, but in practice the technique is limited by the cost of the MRU tables.
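The table selection described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from the cited papers: the class name `WayPredictor`, the parameter `n_tables`, and the access history are hypothetical, each table entry stores a way index rather than the presence vector drawn in Figure 4.29, and the initial prediction of way 0 is an arbitrary choice. The access history is picked so that the final state matches the predictions listed with Figure 4.29.

```python
class WayPredictor:
    """Per-set way predictor with n_tables MRU entries.

    The low-order log2(n_tables) tag bits select the entry.
    n_tables=1 behaves as plain MRU; n_tables equal to the
    associativity is the MMRU configuration discussed in the text.
    """

    def __init__(self, n_tables: int = 1) -> None:
        if n_tables < 1 or n_tables & (n_tables - 1):
            raise ValueError("n_tables must be a power of two")
        self.mask = n_tables - 1        # keeps log2(n_tables) low tag bits
        self.mru_way = [0] * n_tables   # assumption: predict way 0 initially

    def predict(self, tag: int) -> int:
        """Return the way to probe first for this tag."""
        return self.mru_way[tag & self.mask]

    def update(self, tag: int, way: int) -> None:
        """Record the way in which this tag was actually found or placed."""
        self.mru_way[tag & self.mask] = way


# Hypothetical (tag, way) access history for one 4-way set.
# The first line (tag ending in 10) is later evicted from way 2,
# which leaves a stale entry in the "10" table.
history = [
    (0b0110, 2),  # tag *10 in way 2 (evicted afterwards)
    (0b0100, 2),  # tag *00 placed in way 2
    (0b0111, 0),  # tag *11 in way 0
    (0b0101, 1),  # tag *01 in way 1
    (0b1000, 3),  # another tag *00, in way 3 (most recent access)
]

mru = WayPredictor(n_tables=1)
mmru = WayPredictor(n_tables=4)
for tag, way in history:
    mru.update(tag, way)
    mmru.update(tag, way)

for low_bits in (0b00, 0b01, 0b10, 0b11):
    print(f"tag *{low_bits:02b}: MRU -> way {mru.predict(low_bits)}, "
          f"MMRU -> way {mmru.predict(low_bits)}")
```

Plain MRU predicts way 3 for every tag because way 3 was touched last, while the four-table predictor returns ways 3, 1, 2, and 0 for tags ending in 00, 01, 10, and 11. The entry for 10 still points at way 2 even though no such tag remains in the set, which is the stale case the figure notes.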

It is interesting to note that, according to the published results, MMRU is about equal in predictive power to selective direct-mapping (SDM) when log2(associativity) tag bits are used (i.e., as many MRU tables as the associativity of the cache). SDM aims to place as many lines as it can in their direct-mapped positions and handles the rest with a way predictor. MMRU tracks all such lines, both those in their direct-mapped positions and those in set-associative positions, yielding approximately the same prediction accuracy: an average of 92% first-probe hits for 4-way caches [183, 242, 249].

A weakness of all the way-prediction techniques mentioned so far is that they perform poorly on misses. MRU, MMRU, and SDM incur the maximum latency and energy just to
