Fundamentals of Quantitative Design and Analysis - Computer Architecture: A Quantitative Approach


instructions in the set. Since the starting point is often individual instruction count and CPI measurements, the processor performance equation is incredibly useful.
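As a minimal sketch of how individual instruction counts and CPI measurements combine, the overall CPI is the cycle-weighted average over instruction classes, and CPU time follows as instruction count × CPI × clock cycle time. The instruction classes, counts, per-class CPIs, and the 2 GHz clock rate below are hypothetical values chosen for illustration, not measurements from the text.

```python
# Hypothetical instruction mix: (class name, instruction count, CPI for that class).
# All numbers are illustrative assumptions, not measured data.
def overall_cpi(mix):
    """Return (total instruction count, total cycles, overall CPI)."""
    total_instructions = sum(count for _, count, _ in mix)
    total_cycles = sum(count * cpi for _, count, cpi in mix)
    return total_instructions, total_cycles, total_cycles / total_instructions


def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU time = instruction count x CPI x clock cycle time (1 / clock rate)."""
    return instruction_count * cpi / clock_rate_hz


mix = [
    ("alu", 50_000_000, 1.0),
    ("load_store", 30_000_000, 2.0),
    ("branch", 20_000_000, 3.0),
]
ic, cycles, cpi = overall_cpi(mix)
print(f"instructions={ic} cycles={cycles:.0f} CPI={cpi:.2f}")
print(f"CPU time at 2 GHz: {cpu_time_seconds(ic, cpi, 2e9) * 1e3:.1f} ms")
```

With this mix, the overall CPI works out to 1.7, because the less frequent but slower classes contribute a disproportionate share of the cycles.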

To use the processor performance equation as a design tool, we need to be able to measure the various factors. For an existing processor, it is easy to obtain the execution time by measurement, and we know the default clock speed. The challenge lies in discovering the instruction count or the CPI. Most new processors include counters for both instructions executed and clock cycles. By periodically monitoring these counters, it is also possible to attach execution time and instruction count to segments of the code, which can be helpful to programmers trying to understand and tune the performance of an application. Often, a designer or programmer will want to understand performance at a finer granularity than the hardware counters provide. For example, they may want to know why the CPI is what it is. In such cases, the simulation techniques used are like those used for processors that are still being designed.
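The idea of attaching instruction count and execution time to code segments can be sketched as differencing cumulative counter readings taken at segment boundaries. This is only an illustration: the `CounterSample` type, the segment names, the counter values, and the fixed 2 GHz clock rate are all assumptions, and reading real hardware counters requires a platform-specific interface that is not shown here.

```python
from typing import NamedTuple


class CounterSample(NamedTuple):
    label: str         # code segment that ended at this sample (hypothetical names)
    instructions: int  # cumulative instructions-executed counter
    cycles: int        # cumulative clock-cycle counter


def per_segment_stats(samples, clock_rate_hz):
    """Difference consecutive cumulative samples to get per-segment figures."""
    stats = []
    for prev, cur in zip(samples, samples[1:]):
        d_instr = cur.instructions - prev.instructions
        d_cycles = cur.cycles - prev.cycles
        stats.append({
            "segment": cur.label,
            "instructions": d_instr,
            "cycles": d_cycles,
            "cpi": d_cycles / d_instr,
            "seconds": d_cycles / clock_rate_hz,  # valid only at a fixed clock rate
        })
    return stats


# Hypothetical counter readings; not taken from any real processor.
samples = [
    CounterSample("start", 0, 0),
    CounterSample("parse", 4_000_000, 6_000_000),
    CounterSample("compute", 24_000_000, 56_000_000),
]
segments = per_segment_stats(samples, clock_rate_hz=2.0e9)
for s in segments:
    print(f"{s['segment']}: IC={s['instructions']} "
          f"CPI={s['cpi']:.2f} time={s['seconds'] * 1e3:.1f} ms")
```

Per-segment figures like these show where the cycles go, but not why a segment's CPI is high, which is where simulation becomes necessary.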

Techniques that help with energy efficiency, such as dynamic voltage-frequency scaling (DVFS) and overclocking (see Section 1.5), make this equation harder to use, since the clock speed may vary while we measure the program. A simple approach is to turn off those features to make the results reproducible. Fortunately, because performance and energy efficiency are often highly correlated (taking less time to run a program generally saves energy), it is probably safe to consider performance without worrying about the impact of DVFS or overclocking on the results.
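To see why a varying clock complicates the equation, consider this small sketch. It assumes a hypothetical run split into intervals of constant clock rate; the cycle counts and frequencies are invented for illustration. Elapsed time is then the sum of cycles divided by frequency over each interval, which generally differs from dividing the total cycle count by the nominal clock rate.

```python
def elapsed_seconds(intervals):
    """Sum cycles / frequency over intervals during which the clock was constant."""
    return sum(cycles / freq_hz for cycles, freq_hz in intervals)


# Hypothetical run: (cycles executed, clock rate in Hz during that interval).
intervals = [
    (3_000_000_000, 3.0e9),  # boosted clock
    (2_000_000_000, 2.0e9),  # nominal clock
    (2_000_000_000, 1.0e9),  # scaled-down clock
]
nominal_hz = 2.0e9  # assumed default clock rate

total_cycles = sum(cycles for cycles, _ in intervals)
actual = elapsed_seconds(intervals)
naive = total_cycles / nominal_hz  # pretends the clock never changed

print(f"actual={actual:.2f} s, estimate at nominal clock={naive:.2f} s")
```

In this made-up run the fixed-clock estimate is off by half a second, which illustrates why disabling such features gives more reproducible measurements.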

1.10 Putting It All Together: Performance, Price, and Power

In the “Putting It All Together” sections that appear near the end of every chapter, we provide real examples that use the principles in that chapter. In this section, we look at measures of performance and power-performance in small servers using the SPECpower benchmark.

Figure 1.18 shows the three multiprocessor servers we are evaluating along with their price. To keep the price comparison fair, all are Dell PowerEdge servers. The first is the PowerEdge R710, which is based on the Intel Xeon X5670 microprocessor with a clock rate of 2.93 GHz. Unlike the Intel Core i7 in Chapters 2 through 5, which has 4 cores and an 8 MB L3 cache, this Intel chip has 6 cores and a 12 MB L3 cache, although the cores themselves are identical. We selected a two-socket system with 12 GB of ECC-protected 1333 MHz DDR3 DRAM. The next server is the PowerEdge R815, which is based on the AMD Opteron 6174 microprocessor. Each chip has 6 cores and a 6 MB L3 cache and runs at 2.20 GHz, but AMD puts two of these chips into a single socket. Thus, a socket has 12 cores and two 6 MB L3 caches. Our second server has two sockets with 24 cores and 16 GB of ECC-protected 1333 MHz DDR3 DRAM, and our third server (also a PowerEdge R815) has four sockets with 48 cores and 32 GB of DRAM. All are running the IBM J9 JVM and the Microsoft Windows 2008 Server Enterprise x64 Edition operating system.
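The core and cache totals for the three configurations can be tabulated with a short sketch. The `Server` class and its field names are hypothetical. The figures come from the description above, with one assumption: the Xeon system is taken to have one chip per socket, which the text implies but does not state outright.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    sockets: int
    chips_per_socket: int  # assumed 1 for the Xeon system; 2 for the Opteron systems
    cores_per_chip: int
    l3_mb_per_chip: int
    clock_ghz: float
    dram_gb: int

    @property
    def cores(self):
        return self.sockets * self.chips_per_socket * self.cores_per_chip

    @property
    def l3_mb(self):
        return self.sockets * self.chips_per_socket * self.l3_mb_per_chip


servers = [
    Server("PowerEdge R710 (Xeon X5670)", 2, 1, 6, 12, 2.93, 12),
    Server("PowerEdge R815 (Opteron 6174), 2 sockets", 2, 2, 6, 6, 2.20, 16),
    Server("PowerEdge R815 (Opteron 6174), 4 sockets", 4, 2, 6, 6, 2.20, 32),
]
for s in servers:
    print(f"{s.name}: {s.cores} cores, {s.l3_mb} MB L3, "
          f"{s.dram_gb / s.cores:.2f} GB DRAM per core")
```

The totals of 24 and 48 cores for the two Opteron systems match the figures given in the text; the 12-core total for the Xeon system follows from the one-chip-per-socket assumption.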
