BEQZ R12,skip
DSUBU R4,R5,R6
DADDU R5,R4,R9
skip: OR R7,R8,R9
Suppose we knew that the register destination of the DSUBU instruction (R4) was unused after
the instruction labeled skip. (The property of whether a value will be used by an upcoming
instruction is called liveness.) If R4 were unused, then changing the value of R4 just before the
branch would not affect the data flow since R4 would be dead (rather than live) in the code
region after skip. Thus, if R4 were dead and the existing DSUBU instruction could not generate an
exception (other than those from which the processor resumes the same process), we could
move the DSUBU instruction before the branch, since the data flow cannot be affected by this
change.
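The argument above can be checked with a small simulation. The following is a minimal sketch in Python, not a model of any real processor: the helper names (run, initial_regs, same_live_results) and the initial register values are assumptions made for illustration, and exceptions are ignored entirely. It executes the original sequence and a version with DSUBU moved above the branch, then compares every register except R4, which is assumed dead after skip.

```python
# Hypothetical mini-interpreter for the four instructions in the text.
# Register names come from the text; all helper names and initial
# values are assumptions chosen for illustration.

MASK = 2**64 - 1  # DSUBU/DADDU are modeled as 64-bit wraparound arithmetic

def run(program, regs):
    """Execute a list of (label, opcode, operands) tuples; return final registers."""
    regs = dict(regs)
    labels = {label: i for i, (label, _, _) in enumerate(program) if label}
    pc = 0
    while pc < len(program):
        _, op, args = program[pc]
        if op == "BEQZ":
            src, target = args
            if regs[src] == 0:          # branch taken: jump to the label
                pc = labels[target]
                continue
        elif op == "DSUBU":
            dst, a, b = args
            regs[dst] = (regs[a] - regs[b]) & MASK
        elif op == "DADDU":
            dst, a, b = args
            regs[dst] = (regs[a] + regs[b]) & MASK
        elif op == "OR":
            dst, a, b = args
            regs[dst] = regs[a] | regs[b]
        pc += 1
    return regs

ORIGINAL = [
    (None,   "BEQZ",  ("R12", "skip")),
    (None,   "DSUBU", ("R4", "R5", "R6")),
    (None,   "DADDU", ("R5", "R4", "R9")),
    ("skip", "OR",    ("R7", "R8", "R9")),
]

# Same code with DSUBU moved before the branch (software speculation).
HOISTED = [
    (None,   "DSUBU", ("R4", "R5", "R6")),
    (None,   "BEQZ",  ("R12", "skip")),
    (None,   "DADDU", ("R5", "R4", "R9")),
    ("skip", "OR",    ("R7", "R8", "R9")),
]

def initial_regs(r12):
    """Arbitrary starting state; only R12 (the branch condition) varies."""
    return {"R4": 1, "R5": 20, "R6": 7, "R7": 0, "R8": 0b1010, "R9": 0b0110, "R12": r12}

def same_live_results(r12, dead=("R4",)):
    """Compare final states, ignoring registers assumed dead after skip."""
    a = run(ORIGINAL, initial_regs(r12))
    b = run(HOISTED, initial_regs(r12))
    return all(a[r] == b[r] for r in a if r not in dead)
```

With these assumed values, same_live_results(0) and same_live_results(1) both hold, so the move is invisible to any code that does not read R4. On the taken path (R12 equal to 0) the final value of R4 itself does differ between the two versions, which is exactly why the transformation is only legal when R4 is dead.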
If the branch is taken, the DSUBU instruction will execute and will be useless, but it will not
affect the program results. This type of code scheduling is also a form of speculation, often called
software speculation, since the compiler is betting on the branch outcome; in this case, the bet
is that the branch is usually not taken. More ambitious compiler speculation mechanisms are
discussed in Appendix H. Normally, it will be clear when we say speculation or speculative
whether the mechanism is a hardware or software mechanism; when it is not clear, it is best to
say “hardware speculation” or “software speculation.”
Control dependence is preserved by implementing control hazard detection that causes
control stalls. Control stalls can be eliminated or reduced by a variety of hardware and software
techniques, which we examine in Section 3.3.
3.2 Basic Compiler Techniques for Exposing ILP
This section examines the use of simple compiler technology to enhance a processor's ability
to exploit ILP. These techniques are crucial for processors that use static issue or static
scheduling. Armed with this compiler technology, we will shortly examine the design and
performance of processors using static issuing. Appendix H will investigate more sophisticated
compiler and associated hardware schemes designed to enable a processor to exploit more
instruction-level parallelism.
Basic Pipeline Scheduling and Loop Unrolling
To keep a pipeline full, parallelism among instructions must be exploited by finding sequences
of unrelated instructions that can be overlapped in the pipeline. To avoid a pipeline stall, the
execution of a dependent instruction must be separated from the source instruction by a
distance in clock cycles equal to the pipeline latency of that source instruction. A compiler's
ability to perform this scheduling depends both on the amount of ILP available in the program
and on the latencies of the functional units in the pipeline. Figure 3.2 shows the FP unit
latencies we assume in this chapter, unless different latencies are explicitly stated. We assume the
standard five-stage integer pipeline, so that branches have a delay of one clock cycle. We
assume that the functional units are fully pipelined or replicated (as many times as the pipeline
depth), so that an operation of any type can be issued on every clock cycle and there are no
structural hazards.
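The separation rule above can be made concrete with a small stall counter. This is a rough sketch in Python under stated assumptions, not the chapter's own method: it models a single-issue, in-order pipeline with no structural hazards, and the LATENCY table holds assumed values for illustration (consult Figure 3.2 for the latencies the chapter actually uses). The function name stall_cycles and the instruction encoding are hypothetical. A latency of n here means n intervening cycles are needed between a producer and a dependent consumer.

```python
# Assumed producer -> consumer latencies, in intervening clock cycles.
# Illustrative values only; pairs not listed are treated as latency 0.
LATENCY = {
    ("fp_alu", "fp_alu"): 3,
    ("fp_alu", "store"): 2,
    ("load", "fp_alu"): 1,
    ("load", "store"): 0,
}

def stall_cycles(instrs):
    """Return the stall cycles inserted before each instruction.

    instrs is a list of (kind, dest_register_or_None, source_registers).
    Instructions issue in order, at most one per clock cycle.
    """
    last_writer = {}   # register -> (issue cycle, kind) of its latest producer
    cycle = 0          # issue cycle of the previous instruction
    stalls = []
    for kind, dest, sources in instrs:
        earliest = cycle + 1            # next slot if there were no hazards
        ready = earliest
        for src in sources:
            if src in last_writer:
                p_cycle, p_kind = last_writer[src]
                needed = p_cycle + 1 + LATENCY.get((p_kind, kind), 0)
                ready = max(ready, needed)
        stalls.append(ready - earliest)
        cycle = ready
        if dest is not None:
            last_writer[dest] = (cycle, kind)
    return stalls

# Load, FP add that uses the loaded value, then a store of the sum.
unscheduled = [
    ("load",   "F0", ["R1"]),
    ("fp_alu", "F4", ["F0", "F2"]),
    ("store",  None, ["F4", "R1"]),
]

# Same sequence with an independent integer instruction placed after the load.
scheduled = [
    ("load",   "F0", ["R1"]),
    ("int",    "R3", ["R3"]),
    ("fp_alu", "F4", ["F0", "F2"]),
    ("store",  None, ["F4", "R1"]),
]
```

Under these assumed latencies, stall_cycles(unscheduled) gives [0, 1, 2] and stall_cycles(scheduled) gives [0, 0, 0, 2]: the independent instruction fills the slot that the load-to-add dependence would otherwise leave empty, which is the kind of reordering a scheduling compiler looks for.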
 