stead of OOO issue in a model of the 21264. Make sure the other aspects of the machine are
as similar as possible to make the comparison fair. Ignore any increase in access or cycle
time from larger caches and effects of the larger data cache on the floorplan of the chip.
(Note that this comparison will not be totally fair, as the code will not have been scheduled
for the in-order processor by the compiler.)
FIGURE 2.33 Floorplan of the Alpha 21264 [Kessler 1999].
2.25 [20/20/20] <2.6> The Intel performance analyzer VTune can be used to make many
measurements of cache behavior. A free evaluation version of VTune for both Windows and
Linux can be downloaded from http://software.intel.com/en-us/articles/intel-vtune-amplifier-xe/.
The program (aca_ch2_cs2.c) used in Case Study 2 has been modified so that it works
with VTune out of the box on Microsoft Visual C++. The program can be downloaded from
www.hpl.hp.com/research/cacti/aca_ch2_cs2_vtune.c. Special VTune functions have been
inserted to exclude initialization and loop overhead during the performance analysis process.
Detailed VTune setup directions are given in the README section in the program. The
program keeps looping for 20 seconds for every configuration. In the following experiment
you can find the effects of data size on cache and overall processor performance. Run the
program in VTune on an Intel processor with the input dataset sizes of 8 KB, 128 KB, 4 MB,
and 32 MB, and keep a stride of 64 bytes (a stride of one cache line on Intel i7 processors). Col-
 