FIGURE 5.30 This chart shows the speedup for two- and four-core executions of the
parallel Java and PARSEC workloads without SMT. These data were collected by
Esmaeilzadeh et al. [2011] using the same setup as described in Chapter 3. Turbo Boost is
turned off. The speedup and energy efficiency are summarized using harmonic mean,
implying a workload where the total time spent running each 2p benchmark is equivalent.
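The caption's remark about the harmonic mean rests on a general property: the harmonic mean of per-benchmark speedups equals the aggregate speedup of a workload in which every benchmark runs for the same time on the reference configuration. The following is a minimal sketch of that property only. The benchmark times and speedups are hypothetical values chosen for illustration, not data from the figure, and the helper name `aggregate_speedup` is an assumption.

```python
from statistics import harmonic_mean

def aggregate_speedup(ref_times, new_times):
    """Total reference time divided by total time on the new configuration."""
    return sum(ref_times) / sum(new_times)

# Hypothetical workload: each benchmark takes the same time on the reference run.
ref_times = [10.0, 10.0, 10.0]

# Hypothetical per-benchmark speedups (NOT values from the figure).
speedups = [2.0, 2.5, 4.0]

# Time of each benchmark on the faster configuration.
new_times = [t / s for t, s in zip(ref_times, speedups)]

hm = harmonic_mean(speedups)
agg = aggregate_speedup(ref_times, new_times)

# With equal reference times, the two summaries coincide.
print(f"harmonic mean of speedups: {hm:.4f}")
print(f"aggregate speedup:         {agg:.4f}")
```

An arithmetic mean of the same speedups would be higher, because it weights the fastest benchmark more heavily than its share of the running time justifies.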
As the figure shows, the PARSEC benchmarks get better speedup than the Java benchmarks,
achieving 76% speedup efficiency (i.e., actual speedup divided by processor count) on four
cores, while the Java benchmarks achieve 67% speedup efficiency on four cores. Although
this observation is clear from the data, analyzing why this difference exists is difficult. For
example, it is quite possible that Amdahl's law effects have reduced the speedup for the Java
workload. In addition, interaction between the processor architecture and the application,
which affects issues such as the cost of synchronization or communication, may also play a
role. In particular, well-parallelized applications, such as those in PARSEC, sometimes benefit
from an advantageous ratio between computation and communication, which reduces the
dependence on communication costs. (See Appendix I.)
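To make the efficiency figures concrete, the sketch below converts the quoted four-core speedup efficiencies (76% and 67%) back into speedups, using the definition given above (actual speedup divided by processor count). It then asks what parallel fraction Amdahl's law would imply if Amdahl's law were the only limiting effect. That last step is an illustrative assumption: the text itself notes that synchronization and communication costs may also contribute, so the derived fractions are not measurements. The function names are hypothetical.

```python
def speedup_from_efficiency(efficiency, cores):
    """Speedup efficiency = actual speedup / processor count, so invert it."""
    return efficiency * cores

def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup when only `parallel_fraction` of the work scales."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

def amdahl_parallel_fraction(speedup, cores):
    """Invert Amdahl's law to get the parallel fraction implied by a speedup.

    ASSUMPTION: Amdahl's law is the only limiting effect.
    """
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)

CORES = 4
# Four-core speedup efficiencies quoted in the text.
efficiencies = {"PARSEC": 0.76, "Java": 0.67}

results = {}
for name, eff in efficiencies.items():
    s = speedup_from_efficiency(eff, CORES)
    f = amdahl_parallel_fraction(s, CORES)
    results[name] = (s, f)
    print(f"{name}: speedup = {s:.2f}, implied parallel fraction = {f:.3f}")
# PARSEC: speedup = 3.04, implied parallel fraction = 0.895
# Java: speedup = 2.68, implied parallel fraction = 0.836
```

Under this simplified reading, a difference of roughly six percentage points in the parallel fraction would be enough to account for the gap between the two suites on four cores.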
These differences in speedup translate to differences in energy efficiency. For example, the
PARSEC benchmarks actually slightly improve energy efficiency over the single-core version;
this result may be significantly affected by the fact that the L3 cache is used more effectively
in the multicore runs than in the single-core case, while its energy cost is identical in both cases.
Thus, for the PARSEC benchmarks, the multicore approach achieves what designers hoped
for when they switched from an ILP-focused design to a multicore design; namely, it scales
performance as fast as or faster than scaling power, resulting in constant or even improved
energy efficiency. In the Java case, we see that neither the two- nor the four-core runs break even in
energy efficiency due to the lower speedup levels of the Java workload (although Java energy
efficiency for the 2p run is the same as for PARSEC!). The energy efficiency in the four-core
Java case is reasonably high (0.94). It is likely that an ILP-centric processor would need even
more power to achieve a comparable speedup on either the PARSEC or Java workload. Thus,
 