FIGURE 3.42 The amount of “wasted work” is plotted by taking the ratio of dispatched
micro-ops that do not graduate to all dispatched micro-ops. For example, the ratio is 25%
for sjeng, meaning that 25% of the dispatched and executed micro-ops are thrown away. The
data in this section were collected by Professor Lu Peng and Ph.D. student Ying Zhang, both
of Louisiana State University.
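The ratio defined in the caption can be sketched in a few lines of Python. The function name and the raw counts below are hypothetical, chosen only so that the result matches the 25% quoted for sjeng; they are not measured values.

```python
def wasted_work_ratio(dispatched: int, graduated: int) -> float:
    """Fraction of dispatched micro-ops that never graduate (retire).

    Hypothetical helper for illustration; the counter names are assumptions,
    not the names of any real performance-monitoring events.
    """
    if dispatched <= 0:
        raise ValueError("dispatched must be positive")
    if not 0 <= graduated <= dispatched:
        raise ValueError("graduated must lie between 0 and dispatched")
    return (dispatched - graduated) / dispatched


# Illustrative counts only: 250,000 of 1,000,000 dispatched micro-ops are discarded.
ratio = wasted_work_ratio(dispatched=1_000_000, graduated=750_000)
print(f"{ratio:.0%}")  # prints 25%
```

A ratio of 0 would mean every dispatched micro-op graduated, so no speculative work was thrown away.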
Notice that the wasted work in some cases closely matches the branch misprediction rates
shown in Figure 3.5 on page 167, but in several instances, such as mcf, the wasted work seems
relatively larger than the misprediction rate. In such cases, a likely explanation arises from the
memory behavior. With the very high data cache miss rates, mcf will dispatch many instructions
during an incorrect speculation as long as sufficient reservation stations are available
for the stalled memory references. When the branch misprediction is detected, the micro-ops
corresponding to these instructions will be flushed, but there will be congestion around the
caches, as speculated memory references try to complete. There is no simple way for the
processor to halt such cache requests once they are initiated.
Figure 3.43 shows the overall CPI for the 19 SPECCPU2006 benchmarks. The integer benchmarks
have an average CPI of 1.06 with very large variance (0.67 standard deviation). The major
outliers are mcf and omnetpp, both having a CPI over 2.0 while most other benchmarks are
close to, or less than, 1.0 (gcc, the next highest, is 1.23). This variance derives from differences
in the accuracy of branch prediction and in cache miss rates. For the integer benchmarks, the
L2 miss rate is the best predictor of CPI, and the L3 miss rate (which is very small) has almost
no effect.
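As a rough sketch of how such a summary might be computed, the Python below takes the mean and standard deviation of a set of CPI values and measures how strongly each miss rate tracks CPI with a Pearson correlation. All numbers are placeholders, not the data behind Figure 3.43. The use of a population standard deviation and of Pearson correlation as the measure of "best predictor" are assumptions, since the text does not say how either was obtained.

```python
import math
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Placeholder per-benchmark values (NOT measured data), one entry per benchmark.
cpi = [0.7, 0.9, 1.2, 2.3, 2.1]
l2_misses_per_kilo_instr = [1.0, 2.0, 4.0, 20.0, 17.0]
l3_misses_per_kilo_instr = [0.2, 0.1, 0.3, 0.2, 0.1]

print("mean CPI:", round(mean(cpi), 2))
print("std dev :", round(pstdev(cpi), 2))
print("corr(CPI, L2 misses):", round(pearson(cpi, l2_misses_per_kilo_instr), 2))
print("corr(CPI, L3 misses):", round(pearson(cpi, l3_misses_per_kilo_instr), 2))
```

With real measurements, a correlation near 1 for the L2 miss rate and near 0 for the L3 miss rate would be consistent with the observation above.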
 