The Intel Core i7 supports hardware prefetching into both L1 and L2, with the most common
case of prefetching being access to the next line. Some earlier Intel processors used more
aggressive hardware prefetching, but that resulted in reduced performance for some
applications, causing some sophisticated users to turn off the capability.
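To make "next-line" prefetching concrete, here is a toy simulation. It assumes a hypothetical 256-line direct-mapped cache with 64-byte lines and a prefetch-on-miss policy (whenever line L misses, line L+1 is fetched as well). These parameters and the names ToyCache, cache_access, cache_fill, and count_demand_misses are illustrative only and are not a description of the Core i7's actual prefetcher.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy cache: direct-mapped, NUM_SETS lines of LINE_BYTES each.
   Real L1/L2 caches are set-associative; this is only a sketch. */
#define LINE_BYTES 64
#define NUM_SETS   256

typedef struct {
    bool   valid[NUM_SETS];
    size_t tag[NUM_SETS];
} ToyCache;

/* Demand access: returns true on a hit; on a miss, installs the line. */
static bool cache_access(ToyCache *c, size_t line_addr) {
    size_t set = line_addr % NUM_SETS;
    size_t tag = line_addr / NUM_SETS;
    if (c->valid[set] && c->tag[set] == tag)
        return true;
    c->valid[set] = true;
    c->tag[set] = tag;
    return false;
}

/* Prefetch fill: installs a line without counting as a demand access.
   Note that it overwrites whatever occupied the set, possibly useful data. */
static void cache_fill(ToyCache *c, size_t line_addr) {
    size_t set = line_addr % NUM_SETS;
    c->valid[set] = true;
    c->tag[set] = line_addr / NUM_SETS;
}

/* Counts demand misses for a sequential scan of `bytes` bytes using
   8-byte loads, with or without a next-line (prefetch-on-miss) prefetcher. */
size_t count_demand_misses(size_t bytes, bool next_line_prefetch) {
    ToyCache c;
    memset(&c, 0, sizeof c);
    size_t misses = 0;
    for (size_t addr = 0; addr < bytes; addr += 8) {
        size_t line = addr / LINE_BYTES;
        if (!cache_access(&c, line)) {
            misses++;
            if (next_line_prefetch)
                cache_fill(&c, line + 1);   /* fetch the next line too */
        }
    }
    return misses;
}
```

For a 4096-byte sequential scan, count_demand_misses(4096, false) returns 64 (one miss per line), while count_demand_misses(4096, true) returns 32, because each miss on an even-numbered line brings in the following odd-numbered line before it is needed.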
Figure 2.10 shows the overall performance improvement for a subset of SPEC2000 programs
when hardware prefetching is turned on. Note that this figure includes only 2 of 12 integer
programs, while it includes the majority of the SPEC floating-point programs.
FIGURE 2.10 Speedup due to hardware prefetching on Intel Pentium 4 with hardware
prefetching turned on for 2 of 12 SPECint2000 benchmarks and 9 of 14 SPECfp2000
benchmarks. Only the programs that benefit the most from prefetching are shown;
prefetching speeds up the missing 15 SPEC benchmarks by less than 15% [Singhal 2004].
Prefetching relies on utilizing memory bandwidth that otherwise would be unused, but if
it interferes with demand misses it can actually lower performance. Help from compilers can
reduce useless prefetching. When prefetching works well, its impact on power is negligible.
When prefetched data are not used or useful data are displaced, prefetching will have a very
negative impact on power.
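The power argument hinges on prefetch accuracy: the fraction of issued prefetches that a later demand access actually uses. The sketch below assumes the same kind of hypothetical direct-mapped cache (256 lines of 64 bytes) and a prefetch-on-miss next-line policy, and counts issued versus useful prefetches. The names PrefetchStats and next_line_prefetch_stats are illustrative; the model covers only unused prefetches, not the displacement of useful data.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy parameters, not those of any real processor. */
#define LINE_BYTES 64
#define NUM_SETS   256

typedef struct {
    size_t issued;  /* prefetches sent to memory */
    size_t useful;  /* prefetched lines later hit by a demand access */
} PrefetchStats;

/* Scans `bytes` bytes with the given stride (in bytes), prefetching
   line L+1 on every demand miss to line L, and tallies accuracy. */
PrefetchStats next_line_prefetch_stats(size_t bytes, size_t stride) {
    bool   valid[NUM_SETS] = {false};
    bool   prefetched[NUM_SETS] = {false};  /* filled by prefetch, not yet used */
    size_t tag[NUM_SETS] = {0};
    PrefetchStats s = {0, 0};

    for (size_t addr = 0; addr < bytes; addr += stride) {
        size_t line = addr / LINE_BYTES;
        size_t set = line % NUM_SETS, t = line / NUM_SETS;

        if (valid[set] && tag[set] == t) {          /* demand hit */
            if (prefetched[set]) {                  /* first use of a prefetched line */
                s.useful++;
                prefetched[set] = false;
            }
            continue;
        }

        /* Demand miss: install the line, then prefetch the next one. */
        valid[set] = true;
        tag[set] = t;
        prefetched[set] = false;

        size_t next = line + 1;
        size_t nset = next % NUM_SETS, nt = next / NUM_SETS;
        if (!(valid[nset] && tag[nset] == nt)) {
            valid[nset] = true;
            tag[nset] = nt;
            prefetched[nset] = true;
            s.issued++;
        }
    }
    return s;
}
```

With an 8-byte stride over 4096 bytes, all 32 issued prefetches are used. With a 128-byte stride over the same range, the scan touches only even-numbered lines, so all 32 prefetches of odd-numbered lines go unused; each one costs memory bandwidth and energy while delivering nothing.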
Tenth Optimization: Compiler-Controlled Prefetching to Reduce Miss Penalty or Miss Rate
An alternative to hardware prefetching is for the compiler to insert prefetch instructions to
request data before the processor needs it. There are two flavors of prefetch:
Register prefetch will load the value into a register.
Cache prefetch loads data only into the cache and not the register.
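As an illustration of a cache prefetch, GCC and Clang provide the builtin __builtin_prefetch(addr, rw, locality), which emits the target's data-prefetch instruction when one exists and does not fault on an invalid address. The loop below inserts such a prefetch by hand, roughly as a compiler might; the prefetch distance of 16 elements and the function name sum_with_prefetch are hypothetical choices. A register prefetch, by contrast, would be an ordinary load hoisted earlier in the code, binding the value to a register.

```c
#include <stddef.h>

/* Hypothetical prefetch distance in elements; a compiler would derive it
   from the miss latency and the number of cycles per loop iteration. */
#define PREFETCH_DIST 16

double sum_with_prefetch(const double *a, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* Guard keeps the address expression inside the array. */
        if (i + PREFETCH_DIST < n)
            /* Cache prefetch: rw = 0 (read), locality = 3 (keep in all
               cache levels). Nothing is written to a register. */
            __builtin_prefetch(&a[i + PREFETCH_DIST], 0, 3);
        sum += a[i];
    }
    return sum;
}
```

Assuming 8-byte doubles and 64-byte lines, eight consecutive elements share a line, so this simple loop issues seven redundant prefetches per line; a compiler would typically unroll the loop so that only one prefetch is issued per cache line.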