Check for parallel/speculative activity. Parallel speculative activities such as set-associative
cache access, parallel searches, indiscriminate snooping, etc., can be significantly
reduced by first performing only what is most likely to succeed.
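As an illustration of first performing only what is most likely to succeed, the sketch below models way prediction in a set-associative cache. Instead of reading every way in parallel, it probes the most-recently-used way of the set first and checks the other ways only on a misprediction. The class name, the MRU predictor, the simplified replacement policy, and the use of tag probes as an energy proxy are all assumptions made for illustration.

```python
class WayPredictedCache:
    """Set-associative cache that probes the predicted way first.

    Energy is approximated (hypothetically) by counting tag probes; a
    conventional parallel lookup would probe all num_ways ways on every access.
    """

    def __init__(self, num_sets: int, num_ways: int, line_bytes: int = 64):
        self.num_sets = num_sets
        self.num_ways = num_ways
        self.line_bytes = line_bytes
        # tags[s][w] holds the tag stored in way w of set s (None = empty)
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        # Way predictor: the most recently used way of each set
        self.mru = [0] * num_sets
        self.probes = 0
        self.hits = 0
        self.misses = 0

    def _split(self, addr: int):
        line = addr // self.line_bytes
        return line % self.num_sets, line // self.num_sets  # (set index, tag)

    def access(self, addr: int) -> bool:
        s, tag = self._split(addr)
        ways = self.tags[s]
        predicted = self.mru[s]

        # Phase 1: probe only the predicted way (the likely-to-succeed case)
        self.probes += 1
        if ways[predicted] == tag:
            self.hits += 1
            return True

        # Phase 2: misprediction, so probe the remaining ways
        for w in range(self.num_ways):
            if w != predicted:
                self.probes += 1
                if ways[w] == tag:
                    self.mru[s] = w
                    self.hits += 1
                    return True

        # Miss: fill the first empty way, else the way after the predicted one
        # (a deliberately simple replacement policy, not LRU)
        self.misses += 1
        victim = ways.index(None) if None in ways else (predicted + 1) % self.num_ways
        ways[victim] = tag
        self.mru[s] = victim
        return False
```

With a 4-set, 4-way instance, ten back-to-back accesses to address 0 cost 13 tag probes (4 on the initial miss, then 1 per hit), compared with 40 for a lookup that reads all four ways every time.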
Check for repetitive/cacheable activity. Complex computation that repeats verbatim can
possibly be memoized, that is, stored as an association of the inputs to a specific
output. If retrieving the result consumes less energy than computing it, this method
can yield excellent improvements in power. Caches can also be cached. After all, the
cache hierarchy itself is a power optimization.
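A memoization table can be sketched in software as a small direct-mapped buffer, roughly the way a hardware reuse buffer might be organized. The class name, the table size, and indexing by Python's hash of the operand tuple are assumptions for illustration. Each call checks the table first and invokes the underlying function only on a miss.

```python
from typing import Callable


class MemoTable:
    """Direct-mapped memoization buffer keyed by the operand tuple.

    Worthwhile only if a lookup is cheaper than recomputing fn(*args).
    """

    def __init__(self, fn: Callable[..., int], entries: int = 64):
        self.fn = fn
        self.entries = entries
        self.keys = [None] * entries    # stored operand tuples (None = empty)
        self.values = [0] * entries     # stored results
        self.lookups = 0
        self.computations = 0

    def __call__(self, *args: int) -> int:
        self.lookups += 1
        slot = hash(args) % self.entries
        if self.keys[slot] == args:     # hit: reuse the stored output
            return self.values[slot]
        self.computations += 1          # miss: compute, then overwrite the slot
        result = self.fn(*args)
        self.keys[slot] = args
        self.values[slot] = result
        return result
```

For instance, wrapping a multiplier as `mul = MemoTable(lambda a, b: a * b, entries=16)` and calling `mul(3, 7)` five times performs the multiplication once and serves the other four calls from the table. For general-purpose software memoization, Python's `functools.lru_cache` applies the same idea without a fixed direct-mapped layout.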
Check for large-scale speculative activity that is wasted on misspeculations. Find a
way to reduce the work performed during probable misspeculations.
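One way to reduce work during probable misspeculations is to estimate branch confidence and throttle fetch when too many uncertain branches are in flight, in the spirit of confidence-based pipeline gating. The sketch below is a simplified model under stated assumptions: the class name, the counter size, the confidence threshold, the gating threshold, and indexing the counter table by PC alone are all hypothetical choices.

```python
class PipelineGate:
    """Confidence-based fetch gating (all parameters are hypothetical).

    Each static branch has a saturating confidence counter that resets to 0 on a
    misprediction. Fetch is gated while too many low-confidence branches are
    unresolved, since work fetched past them is likely to be squashed.
    """

    def __init__(self, table_size: int = 1024, counter_max: int = 15,
                 high_conf: int = 12, gate_threshold: int = 2):
        self.table_size = table_size
        self.counter_max = counter_max
        self.high_conf = high_conf
        self.gate_threshold = gate_threshold
        self.conf = [0] * table_size
        self.low_conf_in_flight = 0

    def _index(self, pc: int) -> int:
        return pc % self.table_size  # simplification: index by PC only

    def fetch_branch(self, pc: int) -> bool:
        """Record a fetched branch; return True if it is low confidence."""
        low = self.conf[self._index(pc)] < self.high_conf
        if low:
            self.low_conf_in_flight += 1
        return low

    def resolve_branch(self, pc: int, was_low: bool, correct: bool) -> None:
        """Update the branch's counter and the in-flight count on resolution."""
        i = self._index(pc)
        if correct:
            self.conf[i] = min(self.conf[i] + 1, self.counter_max)
        else:
            self.conf[i] = 0  # resetting counter: any misprediction drops confidence
        if was_low:
            self.low_conf_in_flight -= 1

    def fetch_allowed(self) -> bool:
        """False means the front end should stall fetch this cycle."""
        return self.low_conf_in_flight < self.gate_threshold
```

With the default gating threshold of 2, fetching two never-seen branches stalls fetch until one of them resolves, and a branch is treated as high confidence only after 12 consecutive correct predictions.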
Check for value-dependent activity. A different encoding can sometimes lead to a totally
different switching profile. Although we present this type of activity last, an encoding
change is a fundamental and fairly low-level optimization that can be performed before
any of the above optimizations.
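To make the encoding point concrete, the sketch below implements a minimal variant of bus-invert coding, a well-known low-power bus encoding chosen here as an example rather than taken from the text. Each word is sent either as-is or bitwise-inverted, whichever toggles fewer bus lines, and one extra flag line tells the receiver which form was sent. The function names and the 8-bit default width are illustrative assumptions.

```python
from typing import List


def popcount(x: int) -> int:
    return bin(x).count("1")


def transitions(values: List[int], width: int) -> int:
    """Count line toggles on a `width`-line bus that starts at all zeros."""
    mask = (1 << width) - 1
    prev, total = 0, 0
    for v in values:
        total += popcount((prev ^ v) & mask)
        prev = v
    return total


def bus_invert(words: List[int], width: int = 8) -> List[int]:
    """Bus-invert encode `width`-bit words onto a (width + 1)-line bus.

    Bit `width` is the invert flag. Each word is sent as-is or inverted,
    whichever toggles fewer lines (flag included) relative to the previous
    bus value.
    """
    mask = (1 << width) - 1
    flag = 1 << width
    prev = 0
    encoded = []
    for w in words:
        plain = w & mask
        inverted = (~w & mask) | flag
        if popcount(prev ^ inverted) < popcount(prev ^ plain):
            prev = inverted
        else:
            prev = plain
        encoded.append(prev)
    return encoded


def bus_invert_decode(bus_values: List[int], width: int = 8) -> List[int]:
    """Undo the inversion wherever the flag bit is set."""
    mask = (1 << width) - 1
    flag = 1 << width
    return [(~v & mask) if v & flag else (v & mask) for v in bus_values]
```

Alternating 0x00 and 0xFF on an 8-bit bus toggles 24 lines in raw form but only 3 once encoded, because each 0xFF is sent as an all-zero data word with the flag line raised.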
Most low-power research and practice centers on these types of excess or avoidable
activity. As long as there is a consistent and persistent effort to systematically address each and
every one of these cases, the power inefficiency that we have seen, especially in out-of-order
superscalar processors, can be largely rectified.
In some cases, optimizing a component to reduce excess activity may have undesired
consequences. In these situations, a work-steering strategy can be used, in which both the optimized
and the unoptimized versions of the component are provided; work is then dynamically steered
to the appropriate component according to run-time conditions.
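One way to picture work steering is an ALU that contains both a cheap narrow adder (the optimized version) and a full-width adder (the unoptimized version), with each addition steered at run time by the width of its operands. The unit names, the 16-bit narrow width, and the per-unit operation counters are hypothetical; the sketch shows only the steering decision, not the circuits.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Unit:
    name: str
    ops: int = 0  # operations steered to this unit


def fits_signed(value: int, bits: int) -> bool:
    """True if `value` is representable as a two's-complement `bits`-bit integer."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def steer_add(a: int, b: int, narrow: Unit, wide: Unit,
              narrow_bits: int = 16) -> Tuple[int, str]:
    # If both operands fit in narrow_bits - 1 bits, their sum is guaranteed to
    # fit in narrow_bits, so the cheaper narrow adder produces the exact result.
    if fits_signed(a, narrow_bits - 1) and fits_signed(b, narrow_bits - 1):
        unit = narrow
    else:
        unit = wide
    unit.ops += 1
    return a + b, unit.name
```

Here `steer_add(100, -20, narrow, wide)` returns `(80, "narrow16")` when the units are named `"narrow16"` and `"wide64"`, while an operand such as `1 << 20` sends the addition to the wide unit. Requiring operands to fit in one bit fewer than the narrow width rules out overflow, so this sketch needs no recovery path for a wrong steering decision.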
Looking forward, one can view future CMP trends through the prism of their impact
on power. In particular, the effective capacitance clearly demonstrates the inefficiency of the
ILP “uni-core” approach. With each new and more complex ILP uni-core, the core size has
increased, leading to longer wires on average. The activity factor has increased as well.
Performance has improved somewhat in return, but the marginal performance benefit has been
shrinking with each generation, while power and power density have kept rising.
In other words, each successive ILP architecture was less power-efficient for its performance
gains. The move to multi-core architectures, using a power-efficient core as a building
block, has many benefits. Among them, it allows the activity factor to be controlled on a
per-core basis, and allows average wire lengths to be primarily limited by core size, rather
than die size.
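The effective-capacitance argument can be made concrete with the standard CMOS dynamic-power relation. The notation below (activity factor α, switched capacitance C, supply voltage V_dd, clock frequency f, and a per-core sum over N cores) is ours, and the chip-level sum is a simplified model that ignores the shared interconnect. Larger ILP cores raise both α and C (through longer wires), whereas a multi-core lets each α_i be controlled per core while each C_i stays bounded by core size.

```latex
\begin{aligned}
P_{\mathrm{dyn}} &= \alpha\, C\, V_{dd}^{2}\, f = C_{\mathrm{eff}}\, V_{dd}^{2}\, f,
  \qquad C_{\mathrm{eff}} = \alpha\, C \\
C_{\mathrm{eff}}^{\mathrm{chip}} &\approx \sum_{i=1}^{N} \alpha_i\, C_i
\end{aligned}
```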
While multi-cores reduce global wiring, they do not eliminate it. In particular, on-chip
processors are interconnected either by a shared bus or by some sort of on-chip (perhaps