Every iteration of the loop can overlap with any other iteration, although within each loop
iteration there is little or no opportunity for overlap.
We will examine a number of techniques for converting such loop-level parallelism into
instruction-level parallelism. Basically, such techniques work by unrolling the loop either stat-
ically by the compiler (as in the next section) or dynamically by the hardware (as in Sections
3.5 and 3.6).
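As a sketch of the static approach, the transformation can be written in C; the loop body x[i] = x[i] + y[i] and the factor-of-four unroll follow the vector-add loop discussed above, though a real compiler would also handle trip counts that are not a multiple of four:

    /* the vector-add loop, in C */
    for (i = 999; i >= 0; i = i - 1)
        x[i] = x[i] + y[i];

    /* statically unrolled by a factor of four; assumes the
       trip count (1000) is a multiple of 4 */
    for (i = 999; i >= 0; i = i - 4) {
        x[i]   = x[i]   + y[i];     /* the four element updates touch */
        x[i-1] = x[i-1] + y[i-1];   /* disjoint elements, so they are */
        x[i-2] = x[i-2] + y[i-2];   /* independent and can be         */
        x[i-3] = x[i-3] + y[i-3];   /* scheduled to overlap           */
    }

Unrolling removes three-quarters of the loop overhead (address updates and branches) and exposes four independent element updates per iteration.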
An important alternative method for exploiting loop-level parallelism is the use of SIMD in
both vector processors and Graphics Processing Units (GPUs), both of which are covered in
Chapter 4. A SIMD instruction exploits data-level parallelism by operating on a small to mod-
erate number of data items in parallel (typically two to eight). A vector instruction exploits
data-level parallelism by operating on many data items in parallel using both parallel execu-
tion units and a deep pipeline. For example, the above code sequence, which in simple form
requires seven instructions per iteration (two loads, an add, a store, two address updates, and
a branch) for a total of 7000 instructions, might execute in one-quarter as many instructions in
some SIMD architecture where four data items are processed per instruction. On some vector
processors, this sequence might take only four instructions: two instructions to load the vec-
tors x and y from memory, one instruction to add the two vectors, and an instruction to store
back the result vector. Of course, these instructions would be pipelined and have relatively
long latencies, but these latencies may be overlapped.
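One possible four-wide realization of this loop uses x86 SSE intrinsics; this is a hedged sketch for illustration, not the code from the example above, and the single-precision elements, the multiple-of-4 trip count, and the 16-byte alignment are all assumptions:

    #include <xmmintrin.h>   /* SSE intrinsics: __m128 holds four floats */

    /* x[i] = x[i] + y[i], four elements at a time */
    void vec_add(float *x, const float *y, int n) {
        for (int i = 0; i < n; i += 4) {
            __m128 vx = _mm_load_ps(&x[i]);   /* one load covers four elements */
            __m128 vy = _mm_load_ps(&y[i]);
            _mm_store_ps(&x[i], _mm_add_ps(vx, vy));   /* one add, one store */
        }
    }

Each pass through this loop performs the work of four scalar iterations, which is where the one-quarter instruction count cited above comes from.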
Data Dependences and Hazards
Determining how one instruction depends on another is critical to determining how much
parallelism exists in a program and how that parallelism can be exploited. In particular, to ex-
ploit instruction-level parallelism we must determine which instructions can be executed in
parallel. If two instructions are parallel, they can execute simultaneously in a pipeline of ar-
bitrary depth without causing any stalls, assuming the pipeline has sufficient resources (and
hence no structural hazards exist). If two instructions are dependent, they are not parallel and
must be executed in order, although they may often be partially overlapped. The key in both
cases is to determine whether an instruction is dependent on another instruction.
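A minimal C illustration of the distinction (the variable names are hypothetical):

    x = a + b;    /* parallel: neither statement reads the other's */
    y = c + d;    /* result, so both can issue in the same cycle   */

    x = a + b;    /* dependent: the second statement reads x, so   */
    y = x + d;    /* it must logically execute after the first     */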
Data Dependences
There are three different types of dependences: data dependences (also called true data depend-
ences), name dependences, and control dependences. An instruction j is data dependent on instruc-
tion i if either of the following holds:
■ Instruction i produces a result that may be used by instruction j .
■ Instruction j is data dependent on instruction k , and instruction k is data dependent on in-
struction i .
The second condition simply states that one instruction is dependent on another if there exists
a chain of dependences of the first type between the two instructions. This dependence chain
can be as long as the entire program. Note that a dependence within a single instruction (such
as ADDD R1,R1,R1) is not considered a dependence.
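To see the chain concretely, consider this C fragment (the variable names are hypothetical):

    a = b + c;    /* instruction i: produces a                    */
    d = a * e;    /* instruction k: data dependent on i (reads a) */
    f = d - g;    /* instruction j: data dependent on k, and so,  */
                  /* by the chain condition above, on i           */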
For example, consider the following MIPS code sequence that increments a vector of values
in memory (starting at 0(R1) and with the last element at 8(R2)) by a scalar in register F2. (For
simplicity, throughout this chapter, our examples ignore the effects of delayed branches.)
Loop:  L.D     F0,0(R1)    ;F0=array element
       ADD.D   F4,F0,F2    ;add scalar in F2
       S.D     F4,0(R1)    ;store result
       DADDUI  R1,R1,#-8   ;decrement pointer 8 bytes (per DW)
       BNE     R1,R2,Loop  ;branch R1!=R2