3.15 Concluding Remarks: What's Ahead?
3.16 Historical Perspective and References
Case Studies and Exercises by Jason D. Bakos and Robert P. Colwell
3.1 Instruction-Level Parallelism: Concepts and Challenges
All processors since about 1985 use pipelining to overlap the execution of instructions and
improve performance. This potential overlap among instructions is called instruction-level
parallelism (ILP), since the instructions can be evaluated in parallel. In this chapter and
Appendix H, we look at a wide range of techniques for extending the basic pipelining concepts
by increasing the amount of parallelism exploited among instructions.
This chapter is at a considerably more advanced level than the material on basic pipelining
in Appendix C. If you are not thoroughly familiar with the ideas in Appendix C, you should
review that appendix before venturing into this chapter.
We start this chapter by looking at the limitation imposed by data and control hazards and
then turn to the topic of increasing the ability of the compiler and the processor to exploit
parallelism. These sections introduce a large number of concepts, which we build on throughout
this chapter and the next. While some of the more basic material in this chapter could be
understood without all of the ideas in the first two sections, this basic material is important
to later sections of this chapter.
There are two largely separable approaches to exploiting ILP: (1) an approach that relies
on hardware to help discover and exploit the parallelism dynamically, and (2) an approach
that relies on software technology to find parallelism statically at compile time. Processors
using the dynamic, hardware-based approach, including the Intel Core series, dominate in the
desktop and server markets. In the personal mobile device (PMD) market, where energy efficiency
is often the key objective, designers exploit lower levels of instruction-level parallelism.
Thus, in 2011, most processors for the PMD market use static approaches, as we will see in the
ARM Cortex-A8; however, future processors (e.g., the new ARM Cortex-A9) are using dynamic
approaches. Aggressive compiler-based approaches have been attempted numerous times, beginning
in the 1980s and most recently in the Intel Itanium series. Despite enormous efforts, such
approaches have not been successful outside of the narrow range of scientific applications.
In the past few years, many of the techniques developed for one approach have been exploited
within a design relying primarily on the other. This chapter introduces the basic concepts
and both approaches. A discussion of the limitations on ILP approaches is included in
this chapter, and it was such limitations that directly led to the movement to multicore.
Understanding the limitations remains important in balancing the use of ILP and thread-level
parallelism.
In this section, we discuss features of both programs and processors that limit the amount of
parallelism that can be exploited among instructions, as well as the critical mapping between
program structure and hardware structure, which is key to understanding whether a program
property will actually limit performance and under what circumstances.
The value of the CPI (cycles per instruction) for a pipelined processor is the sum of the base
CPI and all contributions from stalls:
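Written out, with the stall term split into the three hazard classes usually distinguished
for a simple pipeline (structural, data hazard, and control; this breakdown is an assumption
here, since the sentence above names only "stalls"), the sum is:

Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard stalls + Control stalls

As a minimal sketch of that arithmetic, the Python below adds a base CPI to average stall
cycles per instruction. The function name and all of the numbers are hypothetical and chosen
only for illustration; they are not measurements of any real processor.

```python
def pipeline_cpi(ideal_cpi, structural_stalls, data_hazard_stalls, control_stalls):
    """Return pipeline CPI: the base (ideal) CPI plus the average stall
    cycles per instruction contributed by each hazard class."""
    return ideal_cpi + structural_stalls + data_hazard_stalls + control_stalls


# Hypothetical per-instruction stall averages (illustrative values only).
cpi = pipeline_cpi(
    ideal_cpi=1.0,            # base CPI with no stalls
    structural_stalls=0.05,   # assumed average structural stall cycles
    data_hazard_stalls=0.20,  # assumed average data hazard stall cycles
    control_stalls=0.15,      # assumed average control stall cycles
)
print(f"Pipeline CPI = {cpi:.2f}")
```

Because the terms simply add, reducing any one stall contribution lowers the overall CPI by
the same amount, which is why the techniques in this chapter can each be viewed as attacking
one term of the sum.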