Instruction-Level Parallelism and Its Exploitation - Computer Architecture: A Quantitative Approach

3.1 Instruction-Level Parallelism: Concepts and Challenges

All processors since about 1985 use pipelining to overlap the execution of instructions and improve performance. This potential overlap among instructions is called instruction-level parallelism (ILP), since the instructions can be evaluated in parallel. In this chapter and Appendix H, we look at a wide range of techniques for extending the basic pipelining concepts by increasing the amount of parallelism exploited among instructions.

This chapter is at a considerably more advanced level than the material on basic pipelining in Appendix C. If you are not thoroughly familiar with the ideas in Appendix C, you should review that appendix before venturing into this chapter.

We start this chapter by looking at the limitation imposed by data and control hazards and then turn to the topic of increasing the ability of the compiler and the processor to exploit parallelism. These sections introduce a large number of concepts, which we build on throughout this chapter and the next. While some of the more basic material in this chapter could be understood without all of the ideas in the first two sections, this basic material is important to later sections of this chapter.

There are two largely separable approaches to exploiting ILP: (1) an approach that relies on hardware to help discover and exploit the parallelism dynamically, and (2) an approach that relies on software technology to find parallelism statically at compile time. Processors using the dynamic, hardware-based approach, including the Intel Core series, dominate in the desktop and server markets. In the personal mobile device (PMD) market, where energy efficiency is often the key objective, designers exploit lower levels of instruction-level parallelism. Thus, in 2011, most processors for the PMD market use static approaches, as we will see in the ARM Cortex-A8; however, future processors (e.g., the new ARM Cortex-A9) are using dynamic approaches. Aggressive compiler-based approaches have been attempted numerous times beginning in the 1980s and most recently in the Intel Itanium series. Despite enormous efforts, such approaches have not been successful outside of the narrow range of scientific applications.

In the past few years, many of the techniques developed for one approach have been exploited within a design relying primarily on the other. This chapter introduces the basic concepts and both approaches. A discussion of the limitations on ILP approaches is included in this chapter, and it was such limitations that directly led to the movement to multicore. Understanding the limitations remains important in balancing the use of ILP and thread-level parallelism.

In this section, we discuss features of both programs and processors that limit the amount of parallelism that can be exploited among instructions, as well as the critical mapping between program structure and hardware structure, which is key to understanding whether a program property will actually limit performance and under what circumstances.

The value of the CPI (cycles per instruction) for a pipelined processor is the sum of the base CPI and all contributions from stalls:
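Written out, and assuming the usual three stall categories for a pipelined processor (structural, data hazard, and control), the sum is: Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard stalls + Control stalls. The sketch below is a minimal Python illustration of that arithmetic. The names `StallCycles` and `pipeline_cpi` and every numeric value are hypothetical, chosen only to make the sum concrete; they are not measurements from any real processor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StallCycles:
    """Average stall cycles per instruction, split by cause (illustrative only)."""
    structural: float = 0.0
    data_hazard: float = 0.0
    control: float = 0.0

    def total(self) -> float:
        # All stall contributions simply add up.
        return self.structural + self.data_hazard + self.control


def pipeline_cpi(ideal_cpi: float, stalls: StallCycles) -> float:
    """Pipeline CPI = ideal (base) CPI + stall cycles per instruction."""
    return ideal_cpi + stalls.total()


if __name__ == "__main__":
    # Hypothetical inputs: an ideal CPI of 1.0 and made-up stall rates.
    stalls = StallCycles(structural=0.125, data_hazard=0.25, control=0.125)
    cpi = pipeline_cpi(ideal_cpi=1.0, stalls=stalls)
    print(f"pipeline CPI = {cpi}")
    print(f"instructions per cycle = {1.0 / cpi:.3f}")
```

With these made-up inputs the stall terms add half a cycle per instruction to the ideal CPI of 1.0, so reducing any one term lowers the pipeline CPI directly.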
