FIGURE 3.36 The basic structure of the A8 pipeline is 13 stages. Three cycles are used
for instruction fetch and four for instruction decode, in addition to a five-cycle integer pipeline.
This yields a 13-cycle branch misprediction penalty. The instruction fetch unit tries to keep the
12-entry instruction queue filled.
Energy consumption is determined by the combination of speedup and increase in power
consumption. For the Java benchmarks, on average, SMT delivers the same energy efficiency
as non-SMT (average of 1.0), but it is brought down by the two poor performing benchmarks;
without tradebeans and pjbb2005, the average energy efficiency for the Java benchmarks is
1.06, which is almost as good as the PARSEC benchmarks. In the PARSEC benchmarks, SMT
reduces energy by 1 − (1/1.08) = 7%. Such energy-reducing performance enhancements are very
difficult to find. Of course, the static power associated with SMT is paid in both cases, thus the
results probably slightly overstate the energy gains.
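
The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration, assuming that energy is power multiplied by execution time, so that energy efficiency is the speedup divided by the ratio of SMT power to non-SMT power. The function names are hypothetical and do not come from the study; the efficiency values 1.08, 1.06, and 1.0 are the averages quoted above.

```python
def energy_efficiency(speedup: float, power_ratio: float) -> float:
    """Baseline energy divided by SMT energy (hypothetical helper).

    Assumes energy = power * time, so E_smt / E_base = power_ratio / speedup.
    Values above 1.0 mean SMT completes the work with less energy.
    """
    return speedup / power_ratio


def energy_reduction(efficiency: float) -> float:
    """Fraction of energy saved for a given energy efficiency: 1 - 1/efficiency."""
    return 1.0 - 1.0 / efficiency


# Averages quoted in the text.
for label, efficiency in [
    ("PARSEC", 1.08),
    ("Java without tradebeans and pjbb2005", 1.06),
    ("Java, all benchmarks", 1.0),
]:
    print(f"{label}: {energy_reduction(efficiency):.1%} less energy")
```

An efficiency of 1.08 corresponds to a saving of about 7%, matching the figure in the text, while an efficiency of 1.0 means no saving at all. Because the static power of the SMT hardware is present in both configurations, the power ratio in such a calculation reflects only the added dynamic cost of running with SMT enabled.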
These results clearly show that SMT in an aggressive speculative processor with extensive
support for SMT can improve performance in an energy-efficient fashion, which the more
aggressive ILP approaches have failed to do. In 2011, the balance between offering multiple
simpler cores and fewer more sophisticated cores has shifted in favor of more cores, with
each core typically being a three- to four-issue superscalar with SMT supporting two to four
threads. Indeed, Esmaeilzadeh et al. [2011] show that the energy improvements from SMT are
even larger on the Intel i5 (a processor similar to the i7, but with smaller caches and a lower
clock rate) and the Intel Atom (an 80x86 processor designed for the netbook market and
described in Section 3.14).
3.13 Putting It All Together: The Intel Core i7 and ARM Cortex-A8
In this section we explore the design of two multiple issue processors: the ARM Cortex-A8
core, which is used as the basis for the Apple A4 processor in the iPad, as well as the processor
in the Motorola Droid and the iPhones 3GS and 4, and the Intel Core i7, a high-end, dynamically
scheduled, speculative processor intended for high-end desktops and server applications.
We begin with the simpler processor.
 