misses that go on the bus and remembering when the local cache does not have a copy of the data.
The exclude-Jetty in this case says “I've seen this before and I am sure it's not locally cached.”
The include-Jetty, on the other hand, is a Bloom filter and captures a superset of what
is cached in the local L2. Bloom filters, proposed in 1970 by Bloom [30], are hash tables
that implement a “non-membership” function. Because they can be efficiently implemented in
hardware, they are a convenient tool in many situations that require filtering [171, 202, 180, 64].
Each Bloom filter entry is a single bit: a 1 denotes the presence of one or more objects
hashed to this entry; a 0 denotes the absence of any object that hashes to this entry. A Bloom
filter can tell us with certainty when something is not present, but it cannot tell what exactly is
present because of possible conflicts on its entries. One can lower the probability of
conflicts by using multiple hash functions for each inserted object, up to the point where the
extra bits set per object start to fill the filter. In this case, an object hashes
to multiple Bloom entries and all of them have to be 1 for the object to be possibly present; if any of
the entries corresponding to an object is 0, the object is definitely not present.
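The behavior described above can be sketched in a few lines of Python. This is only an illustrative software model: the class name BloomFilter, the filter size, and the use of salted SHA-256 as the hash functions are assumptions made for the example (hardware would use much cheaper hashes, such as bit selection or XOR folding).

```python
import hashlib


class BloomFilter:
    """Bit-vector Bloom filter with k hash functions (illustrative sketch)."""

    def __init__(self, num_bits=64, num_hashes=2):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits

    def _indexes(self, item):
        # Assumption: derive k indexes by salting SHA-256 with the function
        # number. Any set of reasonably independent hashes would do.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def insert(self, item):
        # Set every entry the item hashes to.
        for idx in self._indexes(item):
            self.bits[idx] = 1

    def may_contain(self, item):
        # Any 0 entry proves absence; all 1s only means "possibly present".
        return all(self.bits[idx] for idx in self._indexes(item))


bf = BloomFilter()
for addr in (0x1000, 0x2040, 0x3F80):
    bf.insert(addr)

print(bf.may_contain(0x2040))  # an inserted item always reports True
```

A lookup of an address that was never inserted usually returns False, but it can return True when other insertions happen to have set all of its entries; that is the conflict case the text refers to.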
The include-Jetty can say with certainty that some addresses are not locally cached (if
they fail to hit in the Bloom filter Jetty), while other addresses (that hit) may be cached locally.
For the latter, the snoop proceeds to access the L2 tags to make sure.
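As a rough sketch of this filtering step, the model below places a small superset filter in front of a stand-in for the L2 tag array. The class IncludeJettySketch, its sizes, the single modulo index function, and the use of per-entry counters (so that evictions can clear entries) are all assumptions for illustration, not details taken from the text.

```python
class IncludeJettySketch:
    """Superset filter in front of the L2 tags (hypothetical model)."""

    def __init__(self, num_entries=256, line_bytes=64):
        self.num_entries = num_entries
        self.line_bytes = line_bytes
        # Assumption: counters instead of single bits, so evictions can
        # decrement an entry without losing other lines that share it.
        self.counts = [0] * num_entries
        self.l2_tags = set()   # stand-in for the real L2 tag array
        self.tag_lookups = 0   # number of (expensive) tag accesses

    def _line(self, addr):
        return addr // self.line_bytes

    def _index(self, addr):
        return self._line(addr) % self.num_entries

    def fill(self, addr):
        if self._line(addr) not in self.l2_tags:
            self.l2_tags.add(self._line(addr))
            self.counts[self._index(addr)] += 1

    def evict(self, addr):
        if self._line(addr) in self.l2_tags:
            self.l2_tags.remove(self._line(addr))
            self.counts[self._index(addr)] -= 1

    def snoop(self, addr):
        if self.counts[self._index(addr)] == 0:
            return False       # filtered: definitely not cached locally
        self.tag_lookups += 1  # possible hit: check the L2 tags to make sure
        return self._line(addr) in self.l2_tags


jetty = IncludeJettySketch()
jetty.fill(0x1000)
hit = jetty.snoop(0x1000)                # filter hit, confirmed by the tags
filtered = jetty.snoop(0x2000)           # filter miss, no tag access
alias = jetty.snoop(0x1000 + 256 * 64)   # filter hit, but the tags say no
print(hit, filtered, alias, jetty.tag_lookups)
```

Only the second snoop avoids the tag access; the third shows why a filter hit still has to be verified against the tags.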
Finally, the third approach, the hybrid-Jetty, consults both the include-Jetty and exclude-
Jetty for higher efficiency. Moshovos et al. found that 54% of all the snoops miss in the L2
tags in a 4-processor SMP server for the SPLASH-2 benchmark suite. The best Jetty (hybrid-
Jetty) eliminates about three quarters (76%) of these snoops, yielding comparable power savings.
Because the Jettys themselves are tiny compared to the tag arrays of an L2, their operation adds
little overhead.
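To put the two percentages together, the short calculation below assumes that the 76% applies to the snoops that miss in the L2 tags (the 54%), which is how the sentence reads. The variable names are illustrative only.

```python
miss_fraction = 0.54        # snoops that miss in the L2 tags (from the text)
filtered_of_misses = 0.76   # share of those misses removed by the hybrid-Jetty

# Fraction of all snoops that never reach the L2 tags.
filtered_of_all = miss_fraction * filtered_of_misses
# Fraction of all snoops that still access the tags (hits plus unfiltered misses).
remaining_lookups = 1.0 - filtered_of_all
# Fraction of all snoops that still access the tags only to miss.
wasted_lookups = miss_fraction * (1.0 - filtered_of_misses)

print(f"filtered: {filtered_of_all:.1%}")
print(f"tag lookups remaining: {remaining_lookups:.1%}")
print(f"wasted tag lookups remaining: {wasted_lookups:.1%}")
```

Under that reading, roughly 41% of all snoops skip the tag lookup, and about 13% of all snoops still probe the tags without finding the line.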
4.10 CACHEABLE SWITCHING ACTIVITY
An important type of switching activity that can be “avoided” to reduce power is repetitive
computing activity. In reality, it is not eliminated but converted to caching activity. This is
achieved by storing the results of the computation and recognizing when it repeats verbatim
producing the same results as before. Instead of re-executing it, a lookup in a cache supplies
the results. This can save considerable power if the difference in energy between accessing
the cache and re-computing the results is large. It can be instructive to view
the cache hierarchy itself as a recursive application of this concept: instead of computation,
what is cached is cache activity (reads and writes) from a lower, and hence more
expensive, level of the hierarchy.
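The trade-off can be illustrated with a small software model of a result cache placed in front of a functional unit. The class MemoizedUnit and the energy numbers are made up for the example, and the table is an unbounded dictionary, whereas a hardware table would be small and fixed in size.

```python
class MemoizedUnit:
    """Functional unit with a result cache (illustrative sketch)."""

    def __init__(self, compute, e_compute, e_lookup):
        self.compute = compute
        self.e_compute = e_compute  # hypothetical energy per computation
        self.e_lookup = e_lookup    # hypothetical energy per table access
        self.table = {}             # operands -> previously computed result
        self.energy = 0.0

    def execute(self, *operands):
        self.energy += self.e_lookup      # every request probes the table
        if operands in self.table:
            return self.table[operands]   # hit: reuse the stored result
        self.energy += self.e_compute     # miss: pay for the computation
        result = self.compute(*operands)
        self.table[operands] = result
        return result


mul = MemoizedUnit(lambda a, b: a * b, e_compute=10.0, e_lookup=1.0)
results = [mul.execute(a, b) for a, b in [(3, 7), (3, 7), (5, 2), (3, 7)]]
print(results, mul.energy)  # [21, 21, 10, 21] 24.0
```

With these assumed costs, four multiplications without the table would cost 40 units, so the two repeated inputs save 16 units. If the lookup cost approached the computation cost, the table would add energy rather than save it.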
Computation: Repetitive computation when executing a program appears at many levels:
at the functional unit (e.g., a multiplier fed by the same inputs), at the instruction level (e.g.,
the same repeating instruction [208]), at the basic block level (repeating basic blocks such as
loop iterations [56]) and at the trace level (groups of instructions in execution order). Such
computation, when used with the exact same inputs, produces the same result and therefore can