instructions in the set. Since the starting point is often individual instruction count and CPI
measurements, the processor performance equation is incredibly useful.
To use the processor performance equation as a design tool, we need to be able to measure
the various factors. For an existing processor, it is easy to obtain the execution time by
measurement, and we know the default clock speed. The challenge lies in discovering the
instruction count or the CPI. Most new processors include counters for both instructions executed
and clock cycles. By periodically monitoring these counters, it is also possible to attach
execution time and instruction count to segments of the code, which can help programmers
trying to understand and tune the performance of an application. Often, a designer or
programmer will want to understand performance at a finer grain than the hardware counters
provide. For example, they may want to know why the CPI is what it is. In such cases, they use
the same simulation techniques employed for processors that are still being designed.
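As a rough illustration of how counter readings feed the processor performance equation, the sketch below derives CPI for two code segments from instruction and cycle counts, then recomputes CPU time as IC × CPI / clock rate. The segment names, counter values, and fixed 2.0 GHz clock are hypothetical; reading real counters requires a platform-specific tool or interface that is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    """Hypothetical readings of two hardware counters over one code segment."""
    instructions: int  # instructions executed during the segment
    cycles: int        # clock cycles elapsed during the segment


def cpi(sample: CounterSample) -> float:
    """Average clock cycles per instruction for the segment."""
    return sample.cycles / sample.instructions


def cpu_time_s(instruction_count: int, cycles_per_instr: float, clock_hz: float) -> float:
    """Processor performance equation: CPU time = IC x CPI x clock cycle time (1 / clock rate)."""
    return instruction_count * cycles_per_instr / clock_hz


# Illustrative numbers only (not measurements), assuming a fixed 2.0 GHz clock.
CLOCK_HZ = 2.0e9
segments = {
    "parse_input": CounterSample(instructions=4_000_000_000, cycles=6_000_000_000),
    "inner_loop": CounterSample(instructions=10_000_000_000, cycles=8_000_000_000),
}

for name, s in segments.items():
    c = cpi(s)
    t = cpu_time_s(s.instructions, c, CLOCK_HZ)
    print(f"{name}: IC={s.instructions:.3e}, CPI={c:.2f}, time={t:.2f} s")
```

With these made-up numbers the first segment has a CPI of 1.5 and the second 0.8, so the first is the natural candidate for the finer-grained analysis, such as simulation, described above.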
Techniques that help with energy efficiency, such as dynamic voltage-frequency scaling (DVFS) and
overclocking (see Section 1.5), make this equation harder to use, since the clock speed may
vary while we measure the program. A simple approach is to turn off those features to make
the results reproducible. Fortunately, as performance and energy efficiency are often highly
correlated (taking less time to run a program generally saves energy), it's probably safe to
consider performance without worrying about the impact of DVFS or overclocking on the results.
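To see why a varying clock complicates the equation, this sketch compares the actual elapsed time of a hypothetical two-interval trace, computed as the sum of each interval's cycles divided by the frequency in effect during that interval, with the estimate obtained by dividing the total cycle count by a single nominal clock rate. The trace and the 2.0 GHz nominal rate are assumptions chosen to keep the arithmetic simple.

```python
from typing import List, Tuple

# Each interval: (clock cycles counted, clock frequency in Hz during that interval).
# Hypothetical trace; a real DVFS governor picks frequencies at run time.
Interval = Tuple[int, float]


def actual_time_s(intervals: List[Interval]) -> float:
    """Elapsed time when the frequency changes: sum of cycles_i / f_i."""
    return sum(cycles / hz for cycles, hz in intervals)


def naive_time_s(intervals: List[Interval], nominal_hz: float) -> float:
    """Estimate that assumes one nominal clock rate for the whole run."""
    total_cycles = sum(cycles for cycles, _ in intervals)
    return total_cycles / nominal_hz


NOMINAL_HZ = 2.0e9
with_dvfs = [(2_000_000_000, 2.0e9), (1_000_000_000, 1.0e9)]  # second interval throttled to 1 GHz
dvfs_off = [(2_000_000_000, 2.0e9), (1_000_000_000, 2.0e9)]   # frequency pinned at nominal

print(actual_time_s(with_dvfs), naive_time_s(with_dvfs, NOMINAL_HZ))
print(actual_time_s(dvfs_off), naive_time_s(dvfs_off, NOMINAL_HZ))
```

When the second interval is throttled, the single-clock estimate (1.5 s) undershoots the actual time (2.0 s); with the frequency pinned, the two agree, which is why disabling these features makes the measurements reproducible.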
1.10 Putting It All Together: Performance, Price, and Power
In the “Putting It All Together” sections that appear near the end of every chapter, we provide
real examples that use the principles in that chapter. In this section, we look at measures of
performance and power-performance in small servers using the SPECpower benchmark.
Figure 1.18 shows the three multiprocessor servers we are evaluating along with their price.
To keep the price comparison fair, all are Dell PowerEdge servers. The first is the PowerEdge
R710, which is based on the Intel Xeon X5670 microprocessor with a clock rate of 2.93 GHz.
Unlike the Intel Core i7 in Chapters 2 through 5, which has 4 cores and an 8 MB L3 cache, this
Intel chip has 6 cores and a 12 MB L3 cache, although the cores themselves are identical. We
selected a two-socket system with 12 GB of ECC-protected 1333 MHz DDR3 DRAM. The next
server is the PowerEdge R815, which is based on the AMD Opteron 6174 microprocessor. Each
chip has 6 cores and a 6 MB L3 cache, and it runs at 2.20 GHz, but AMD puts two of these chips
into a single socket. Thus, a socket has 12 cores and two 6 MB L3 caches. Our second server
has two sockets with 24 cores and 16 GB of ECC-protected 1333 MHz DDR3 DRAM, and our
third server (also a PowerEdge R815) has four sockets with 48 cores and 32 GB of DRAM. All
are running the IBM J9 JVM and the Microsoft Windows 2008 Server Enterprise x64 Edition
operating system.
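The core and cache totals above follow from multiplying out sockets, chips per socket, and per-chip resources. The sketch below does that arithmetic for the three configurations, using only the numbers given in the text; the class and field names are illustrative, and the prices from Figure 1.18 are not reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerConfig:
    """One server configuration as described in the text (illustrative field names)."""
    name: str
    sockets: int
    chips_per_socket: int
    cores_per_chip: int
    l3_mb_per_chip: int
    dram_gb: int

    @property
    def cores(self) -> int:
        return self.sockets * self.chips_per_socket * self.cores_per_chip

    @property
    def total_l3_mb(self) -> int:
        return self.sockets * self.chips_per_socket * self.l3_mb_per_chip

    @property
    def dram_gb_per_core(self) -> float:
        return self.dram_gb / self.cores


SERVERS = [
    ServerConfig("PowerEdge R710 (Xeon X5670)", 2, 1, 6, 12, 12),
    ServerConfig("PowerEdge R815 (Opteron 6174, 2 sockets)", 2, 2, 6, 6, 16),
    ServerConfig("PowerEdge R815 (Opteron 6174, 4 sockets)", 4, 2, 6, 6, 32),
]

for s in SERVERS:
    print(f"{s.name}: {s.cores} cores, {s.total_l3_mb} MB L3, "
          f"{s.dram_gb_per_core:.2f} GB DRAM per core")
```

The two-socket R815 has twice the cores of the R710 but the same 24 MB of total L3, and both R815 configurations provide about 0.67 GB of DRAM per core, versus 1 GB per core for the R710.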