Instruction-Level Parallelism and Its Exploitation - Computer Architecture: A Quantitative Approach


        BEQZ   R12,skip
        DSUBU  R4,R5,R6
        DADDU  R5,R4,R9
skip:   OR     R7,R8,R9

Suppose we knew that the register destination of the DSUBU instruction (R4) was unused after the instruction labeled skip. (The property of whether a value will be used by an upcoming instruction is called liveness.) If R4 were unused, then changing the value of R4 just before the branch would not affect the data flow, since R4 would be dead (rather than live) in the code region after skip. Thus, if R4 were dead and the existing DSUBU instruction could not generate an exception (other than those from which the processor resumes the same process), we could move the DSUBU instruction before the branch, since the data flow cannot be affected by this change.

If the branch is taken, the DSUBU instruction will execute and will be useless, but it will not affect the program results. This type of code scheduling is also a form of speculation, often called software speculation, since the compiler is betting on the branch outcome; in this case, the bet is that the branch is usually not taken. More ambitious compiler speculation mechanisms are discussed in Appendix H. Normally, it will be clear when we say speculation or speculative whether the mechanism is a hardware or software mechanism; when it is not clear, it is best to say “hardware speculation” or “software speculation.”
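The hoisting argument can be checked with a small sketch. The Python model below is an illustration only, not anything from the text: the helper names (`run`, `live_state`, `same_live_results`), the tuple encoding of instructions, and the initial register values are all assumptions, and the model ignores branch delay slots and exceptions. It executes the original sequence and the sequence with DSUBU moved above the branch, then compares every register except R4, which is assumed dead after skip.

```python
MASK = (1 << 64) - 1  # DSUBU/DADDU modeled as 64-bit unsigned wraparound


def run(program, regs):
    """Execute a list of (label, opcode, operands) tuples; return final registers."""
    regs = dict(regs)
    labels = {label: i for i, (label, _, _) in enumerate(program) if label}
    pc = 0
    while pc < len(program):
        _, op, args = program[pc]
        if op == "BEQZ":
            src, target = args
            if regs[src] == 0:          # branch taken: jump to the label
                pc = labels[target]
                continue
        elif op == "DSUBU":
            rd, rs, rt = args
            regs[rd] = (regs[rs] - regs[rt]) & MASK
        elif op == "DADDU":
            rd, rs, rt = args
            regs[rd] = (regs[rs] + regs[rt]) & MASK
        elif op == "OR":
            rd, rs, rt = args
            regs[rd] = regs[rs] | regs[rt]
        else:
            raise ValueError(f"unsupported opcode: {op}")
        pc += 1
    return regs


ORIGINAL = [
    (None, "BEQZ", ("R12", "skip")),
    (None, "DSUBU", ("R4", "R5", "R6")),
    (None, "DADDU", ("R5", "R4", "R9")),
    ("skip", "OR", ("R7", "R8", "R9")),
]

# Same code with DSUBU speculatively moved above the branch.
HOISTED = [
    (None, "DSUBU", ("R4", "R5", "R6")),
    (None, "BEQZ", ("R12", "skip")),
    (None, "DADDU", ("R5", "R4", "R9")),
    ("skip", "OR", ("R7", "R8", "R9")),
]


def initial_registers(r12):
    # Arbitrary illustrative values; only R12 controls the branch.
    return {"R4": 7, "R5": 40, "R6": 15, "R7": 0, "R8": 10, "R9": 6, "R12": r12}


def live_state(regs, dead=("R4",)):
    """Drop registers assumed dead after skip before comparing results."""
    return {r: v for r, v in regs.items() if r not in dead}


def same_live_results(r12):
    init = initial_registers(r12)
    return live_state(run(ORIGINAL, init)) == live_state(run(HOISTED, init))


if __name__ == "__main__":
    for r12 in (0, 1):  # 0: branch taken, 1: branch not taken
        print("R12 =", r12, "live results match:", same_live_results(r12))
```

When the branch is taken, the two versions leave different values in R4, which is exactly why the transformation is legal only if R4 is dead after skip.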

Control dependence is preserved by implementing control hazard detection that causes control stalls. Control stalls can be eliminated or reduced by a variety of hardware and software techniques, which we examine in Section 3.3.

3.2 Basic Compiler Techniques for Exposing ILP

This section examines the use of simple compiler technology to enhance a processor's ability to exploit ILP. These techniques are crucial for processors that use static issue or static scheduling. Armed with this compiler technology, we will shortly examine the design and performance of processors using static issuing. Appendix H will investigate more sophisticated compiler and associated hardware schemes designed to enable a processor to exploit more instruction-level parallelism.

Basic Pipeline Scheduling and Loop Unrolling

To keep a pipeline full, parallelism among instructions must be exploited by finding sequences of unrelated instructions that can be overlapped in the pipeline. To avoid a pipeline stall, the execution of a dependent instruction must be separated from the source instruction by a distance in clock cycles equal to the pipeline latency of that source instruction. A compiler's ability to perform this scheduling depends both on the amount of ILP available in the program and on the latencies of the functional units in the pipeline. Figure 3.2 shows the FP unit latencies we assume in this chapter, unless different latencies are explicitly stated. We assume the standard five-stage integer pipeline, so that branches have a delay of one clock cycle. We assume that the functional units are fully pipelined or replicated (as many times as the pipeline depth), so that an operation of any type can be issued on every clock cycle and there are no structural hazards.
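The separation rule above can be sketched as a small stall counter. Everything in the sketch is an assumption made for illustration: the `Instr` record, the function names, the three-instruction sequence, and in particular the latency table, whose values are placeholders rather than the contents of Figure 3.2. It models an in-order, single-issue pipeline with no structural hazards and treats a latency of n as n intervening clock cycles required between a producer and its consumer.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str               # assembly text, for display only
    kind: str               # functional-unit class, e.g. "load", "fp_alu"
    dest: Optional[str]     # register written, or None
    srcs: Tuple[str, ...]   # registers read


def issue_cycles(program, latency):
    """Return the cycle in which each instruction issues (first issues in cycle 1)."""
    last_writer = {}  # register -> (issue cycle, kind) of its most recent producer
    cycles = []
    prev = 0
    for ins in program:
        earliest = prev + 1  # in-order, one instruction per cycle
        for src in ins.srcs:
            if src in last_writer:
                when, kind = last_writer[src]
                gap = latency.get((kind, ins.kind), 0)  # intervening cycles needed
                earliest = max(earliest, when + gap + 1)
        cycles.append(earliest)
        if ins.dest is not None:
            last_writer[ins.dest] = (earliest, ins.kind)
        prev = earliest
    return cycles


def stall_count(program, latency):
    """Stall cycles = cycles used beyond one per instruction."""
    cycles = issue_cycles(program, latency)
    return cycles[-1] - len(program) if cycles else 0


# Placeholder latencies (producer kind, consumer kind) -> intervening cycles.
# Substitute the values from Figure 3.2 to model the chapter's assumptions.
LATENCY = {("load", "fp_alu"): 1, ("fp_alu", "store"): 2}

NAIVE = [
    Instr("L.D   F0,0(R1)", "load", "F0", ("R1",)),
    Instr("ADD.D F4,F0,F2", "fp_alu", "F4", ("F0", "F2")),
    Instr("S.D   F4,0(R1)", "store", None, ("F4", "R1")),
]

# Same work with an unrelated integer instruction placed after the load.
SCHEDULED = [
    NAIVE[0],
    Instr("DADDU R10,R11,R12", "int_alu", "R10", ("R11", "R12")),
    NAIVE[1],
    NAIVE[2],
]

if __name__ == "__main__":
    for name, prog in (("naive", NAIVE), ("scheduled", SCHEDULED)):
        print(name, issue_cycles(prog, LATENCY), "stalls:", stall_count(prog, LATENCY))
```

Under these placeholder latencies, the unrelated instruction absorbs the load-use stall, which is the effect a scheduling compiler is looking for when it searches for independent instructions to overlap.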
