Every iteration of the loop can overlap with any other iteration, although within each loop
iteration there is little or no opportunity for overlap.
We will examine a number of techniques for converting such loop-level parallelism into
instruction-level parallelism. Basically, such techniques work by unrolling the loop either stat-
ically by the compiler (as in the next section) or dynamically by the hardware (as in Sections
3.5 and 3.6).
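As a sketch of the static approach, the transformation can be written in C; the loop body x[i] = x[i] + y[i] and the factor-of-four unroll follow the vector-add loop discussed above, though a real compiler would also handle trip counts that are not a multiple of four:

    /* the vector-add loop, in C */
    for (i = 999; i >= 0; i = i - 1)
        x[i] = x[i] + y[i];

    /* statically unrolled by a factor of four; assumes the
       trip count (1000) is a multiple of 4 */
    for (i = 999; i >= 0; i = i - 4) {
        x[i]   = x[i]   + y[i];     /* the four element updates touch */
        x[i-1] = x[i-1] + y[i-1];   /* disjoint elements, so they are */
        x[i-2] = x[i-2] + y[i-2];   /* independent and can be         */
        x[i-3] = x[i-3] + y[i-3];   /* scheduled to overlap           */
    }

Unrolling removes three-quarters of the loop overhead (address updates and branches) and exposes four independent element updates per iteration.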
An important alternative method for exploiting loop-level parallelism is the use of SIMD in
both vector processors and Graphics Processing Units (GPUs), both of which are covered in
Chapter 4. A SIMD instruction exploits data-level parallelism by operating on a small to mod-
erate number of data items in parallel (typically two to eight). A vector instruction exploits
data-level parallelism by operating on many data items in parallel using both parallel execu-
tion units and a deep pipeline. For example, the above code sequence, which in simple form
requires seven instructions per iteration (two loads, an add, a store, two address updates, and
a branch) for a total of 7000 instructions, might execute in one-quarter as many instructions in
some SIMD architecture where four data items are processed per instruction. On some vector
processors, this sequence might take only four instructions: two instructions to load the vec-
tors x and y from memory, one instruction to add the two vectors, and an instruction to store
back the result vector. Of course, these instructions would be pipelined and have relatively
long latencies, but these latencies may be overlapped.
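One possible four-wide realization of this loop uses x86 SSE intrinsics; this is a hedged sketch for illustration, not the code from the example above, and the single-precision elements, the multiple-of-4 trip count, and the 16-byte alignment are all assumptions:

    #include <xmmintrin.h>   /* SSE intrinsics: __m128 holds four floats */

    /* x[i] = x[i] + y[i], four elements at a time */
    void vec_add(float *x, const float *y, int n) {
        for (int i = 0; i < n; i += 4) {
            __m128 vx = _mm_load_ps(&x[i]);   /* one load covers four elements */
            __m128 vy = _mm_load_ps(&y[i]);
            _mm_store_ps(&x[i], _mm_add_ps(vx, vy));   /* one add, one store */
        }
    }

Each pass through this loop performs the work of four scalar iterations, which is where the one-quarter instruction count cited above comes from.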
Data Dependences and Hazards
Determining how one instruction depends on another is critical to determining how much
parallelism exists in a program and how that parallelism can be exploited. In particular, to ex-
ploit instruction-level parallelism we must determine which instructions can be executed in
parallel. If two instructions are parallel, they can execute simultaneously in a pipeline of ar-
bitrary depth without causing any stalls, assuming the pipeline has sufficient resources (and
hence no structural hazards exist). If two instructions are dependent, they are not parallel and
must be executed in order, although they may often be partially overlapped. The key in both
cases is to determine whether an instruction is dependent on another instruction.
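A minimal C illustration of the distinction (the variable names are hypothetical):

    x = a + b;    /* parallel: neither statement reads the other's */
    y = c + d;    /* result, so both can issue in the same cycle   */

    x = a + b;    /* dependent: the second statement reads x, so   */
    y = x + d;    /* it must logically execute after the first     */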
Data Dependences
There are three different types of dependences: data dependences (also called true data depend-
ences), name dependences, and control dependences. An instruction j is data dependent on instruc-
tion i if either of the following holds:
■ Instruction i produces a result that may be used by instruction j .
■ Instruction j is data dependent on instruction k , and instruction k is data dependent on in-
struction i .
The second condition simply states that one instruction is dependent on another if there exists
a chain of dependences of the first type between the two instructions. This dependence chain
can be as long as the entire program. Note that a dependence within a single instruction (such
as ADDD R1,R1,R1) is not considered a dependence.
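To see the chain concretely, consider this C fragment (the variable names are hypothetical):

    a = b + c;    /* instruction i: produces a                    */
    d = a * e;    /* instruction k: data dependent on i (reads a) */
    f = d - g;    /* instruction j: data dependent on k, and so,  */
                  /* by the chain condition above, on i           */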
For example, consider the following MIPS code sequence that increments a vector of values
in memory (starting at 0(R1) and with the last element at 8(R2)) by a scalar in register F2. (For
simplicity, throughout this chapter, our examples ignore the effects of delayed branches.)
Loop:  L.D     F0,0(R1)    ;F0=array element
       ADD.D   F4,F0,F2    ;add scalar in F2
       S.D     F4,0(R1)    ;store result
       DADDUI  R1,R1,#-8   ;decrement pointer 8 bytes (per DW)
       BNE     R1,R2,Loop  ;branch R1!=R2