Instruction-Level Parallelism and Its Exploitation - Computer Architecture: A Quantitative Approach

FIGURE 3.42 The amount of “wasted work” is plotted by taking the ratio of dispatched micro-ops that do not graduate to all dispatched micro-ops. For example, the ratio is 25% for sjeng, meaning that 25% of the dispatched and executed micro-ops are thrown away. The data in this section were collected by Professor Lu Peng and Ph.D. student Ying Zhang, both of Louisiana State University.
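The ratio in the caption can be sketched as a short calculation. This is a minimal illustration, not the measurement code used for the figure: the function name `wasted_work_ratio` and the counter values are hypothetical, chosen only so that the result matches the 25% quoted for sjeng.

```python
def wasted_work_ratio(dispatched: int, graduated: int) -> float:
    """Fraction of dispatched micro-ops that never graduate (retire)."""
    if dispatched <= 0:
        raise ValueError("dispatched must be positive")
    if not 0 <= graduated <= dispatched:
        raise ValueError("graduated must be between 0 and dispatched")
    # Micro-ops that were dispatched but flushed before graduating.
    wasted = dispatched - graduated
    return wasted / dispatched


# Hypothetical counter values (assumption, not measured data):
# 1,000,000 micro-ops dispatched, 750,000 of them graduated.
ratio = wasted_work_ratio(dispatched=1_000_000, graduated=750_000)
print(f"{ratio:.0%}")  # prints 25%
```

A ratio of 0 would mean that every dispatched micro-op graduated, so no speculative work was discarded.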

Notice that the wasted work in some cases closely matches the branch misprediction rates shown in Figure 3.5 on page 167, but in several instances, such as mcf, the wasted work seems relatively larger than the misprediction rate. In such cases, a likely explanation arises from the memory behavior. With the very high data cache miss rates, mcf will dispatch many instructions during an incorrect speculation as long as sufficient reservation stations are available for the stalled memory references. When the branch misprediction is detected, the micro-ops corresponding to these instructions will be flushed, but there will be congestion around the caches, as speculated memory references try to complete. There is no simple way for the processor to halt such cache requests once they are initiated.

Figure 3.43 shows the overall CPI for the 19 SPEC CPU2006 benchmarks. The integer benchmarks have an average CPI of 1.06 with very large variance (0.67 standard deviation). The major outliers are mcf and omnetpp, both with a CPI over 2.0, while most other benchmarks are close to, or less than, 1.0 (gcc, the next highest, is 1.23). This variance derives from differences in the accuracy of branch prediction and in cache miss rates. For the integer benchmarks, the L2 miss rate is the best predictor of CPI, and the L3 miss rate (which is very small) has almost no effect.
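As a rough illustration of how such summary figures are formed, the sketch below computes CPI from cycle and instruction counts, then the mean and standard deviation across a set of benchmarks. Everything in it is an assumption made for illustration: the helper names, the placeholder benchmark labels and CPI values (which are not the measured data), the choice of population standard deviation, and the rule that flags an outlier when its CPI exceeds the mean by more than one standard deviation.

```python
from statistics import mean, pstdev


def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction for one benchmark run."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


def summarize(cpis: dict[str, float]) -> tuple[float, float, list[str]]:
    """Return (mean CPI, population std dev, names of high outliers)."""
    values = list(cpis.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population, not sample, std dev
    # Assumed outlier rule: more than one standard deviation above the mean.
    outliers = sorted(name for name, v in cpis.items() if v > mu + sigma)
    return mu, sigma, outliers


# Placeholder values only; these are NOT the measured benchmark CPIs.
sample = {"bench_a": 0.8, "bench_b": 0.9, "bench_c": 1.0, "bench_d": 2.5}
mu, sigma, outliers = summarize(sample)
print(f"mean={mu:.2f} stddev={sigma:.2f} outliers={outliers}")
```

With only a few benchmarks, a single program with a high CPI moves both the mean and the standard deviation noticeably, which is why the text calls out the outliers by name.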
