FIGURE 4.34: Micro-Operation Cache (µC) in the P6 architecture. Traces are built as uops are issued after the decode stage. Uop traces are delivered to the issue stage at the same time as the normal front-end path would deliver them. From [210]. Copyright 2001 IEEE.
the uops are not delivered to the issue stage until after 4 more cycles (stages). This ensures that there is no bubble in the pipeline when switching back and forth between streaming uops out of the µC and fetching IA-32 instructions from the instruction cache and decoding them.
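A small timing sketch makes the latency-matching argument concrete. The Python fragment below is a toy model under assumed parameters (a 4-cycle decode latency and a front end that starts one block per cycle), not a description of the actual P6/Pentium-4 implementation; it only shows why equalizing the delivery latency of the two paths removes switch-over bubbles.

# Toy model of the latency matching described above. Each front-end block is
# served either by the uop cache ('hit') or by fetch+decode ('miss'), and uops
# must reach the issue stage in program order. The 4-cycle decode latency and
# one-block-per-cycle front end are illustrative assumptions.

def delivery_cycles(blocks, uc_latency, decode_latency=4):
    """Cycle at which each block's uops reach the issue stage."""
    deliveries = []
    for start, path in enumerate(blocks):        # block enters the front end at cycle 'start'
        ready = start + (uc_latency if path == 'hit' else decode_latency)
        if deliveries:                           # deliveries stay in program order
            ready = max(ready, deliveries[-1] + 1)
        deliveries.append(ready)
    return deliveries

def bubbles(deliveries):
    """Idle issue cycles between consecutive deliveries."""
    return sum(b - a - 1 for a, b in zip(deliveries, deliveries[1:]))

blocks = ['hit', 'hit', 'miss', 'miss', 'hit', 'hit']

# uC deliveries delayed to match the decode pipeline: switching paths opens no bubble.
print(bubbles(delivery_cycles(blocks, uc_latency=4)))   # -> 0

# A hypothetical eager uC that delivers immediately: switching from streaming uops
# back to fetch+decode leaves the issue stage idle while the decode pipeline refills.
print(bubbles(delivery_cycles(blocks, uc_latency=0)))   # -> 4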
The benefits for often-repeating traces, of course, are significant. Solomon et al. report that 75% of all instruction decoding (hence, uop translation) is eliminated using a moderately sized micro-operation cache (e.g., 64 sets × 6-way associativity × 6 uops/line). This translates to a 10% reduction in the processor's total power for the P6 architecture [210].

The Pentium-4 trace cache is a prime example of a power-saving technique eliminating repetitive and cacheable computation (decoding). But at the same time it is also a cache hierarchy optimization, similar to the loop cache.
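The arithmetic connecting these two figures is simple enough to write down. The sketch below is a back-of-the-envelope model only: the fetch/decode share of total power and the µC overhead are assumed values chosen so that the example lands near the reported numbers; the 75% decode elimination is the figure from Solomon et al. [210].

# Back-of-the-envelope model relating decode elimination to total power savings.
# Only the 75% figure comes from the text; the power shares below are assumptions.

def total_power_savings(decode_fraction_eliminated,
                        decode_power_share,
                        uop_cache_overhead_share):
    """Estimated fraction of total processor power saved by the uop cache.

    decode_fraction_eliminated: share of decodes served from the uC (e.g., 0.75)
    decode_power_share:         fetch/decode power as a share of total power (assumed)
    uop_cache_overhead_share:   power added by the uC itself, as a share of total (assumed)
    """
    return decode_fraction_eliminated * decode_power_share - uop_cache_overhead_share

# With ~15% of total power assumed in fetch/decode and ~1% uC overhead, eliminating
# 75% of decoding lands near the ~10% total-power reduction quoted above.
print(total_power_savings(0.75, 0.15, 0.01))   # -> ~0.10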
4.11 SPECULATIVE ACTIVITY
Speculative switching activity is a high-level form of switching activity that arises from speculative execution. Wide superscalar processors need a constant supply of instructions, not only to keep multiple functional units busy when this is feasible, but also to make forward progress in the face of costly cache misses. Although there is significant instruction-level parallelism in many programs, we have reached a point where it is a struggle to maintain an IPC of 1 at the highest frequencies.
Branch prediction is a necessity in this situation. It supplies more independent instructions to keep the functional units busy until the next cache miss. However, even sophisticated branch prediction may not be enough to avoid complete stalls [126]. Prediction, of course, leads to speculation: instructions are executed speculatively until the correct execution path is verified. Besides the actual power consumption overhead of supporting branch prediction and speculative execution (e.g., prediction structures, support for checkpointing, increased run-time state, etc.), there is also the issue of incorrect execution. Incorrect speculative execution that is discarded when the branch is resolved is, for the most part, wasted switching activity. This
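To see how wrong-path execution adds up at the level of total switching activity, a minimal back-of-the-envelope sketch follows. The branch frequency, misprediction rate, and wrong-path instruction count used here are illustrative assumptions, not measurements from the text.

# Rough model of wasted switching activity due to wrong-path (mis-speculated) work.
# All parameter values are illustrative assumptions.

def wasted_work_fraction(branch_freq, mispredict_rate, wrong_path_insts):
    """Fraction of all executed instructions (committed + squashed) that are
    wrong-path and therefore wasted activity.

    branch_freq:      branches per committed instruction (e.g., 0.2)
    mispredict_rate:  mispredictions per branch (e.g., 0.05)
    wrong_path_insts: average instructions fetched/executed down the wrong path
                      before the branch resolves (roughly pipeline depth x width)
    """
    wrong_path_per_inst = branch_freq * mispredict_rate * wrong_path_insts
    return wrong_path_per_inst / (1.0 + wrong_path_per_inst)

# Example: one branch every 5 instructions, 5% mispredictions, ~20 wrong-path
# instructions per misprediction -> roughly 17% of all activity is wasted.
print(wasted_work_fraction(0.2, 0.05, 20))   # -> ~0.17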