tions to execute out of order when there are sufficient resources and no data dependences; it
is named after the CDC 6600 scoreboard, which developed this capability.
Before we see how scoreboarding could be used in the MIPS pipeline, it is important to ob-
serve that WAR hazards, which did not exist in the MIPS floating-point or integer pipelines,
may arise when instructions execute out of order. For example, consider the following code
sequence:
DIV.D F0,F2,F4
ADD.D F10,F0,F8
SUB.D F8,F8,F14
There is an antidependence between the ADD.D and the SUB.D: If the pipeline executes the SUB.D
before the ADD.D, it will violate the antidependence, yielding incorrect execution. Likewise, to
avoid violating output dependences, WAW hazards (e.g., as would occur if the destination of
the SUB.D were F10) must also be detected. As we will see, both these hazards are avoided in a
scoreboard by stalling the later instruction involved in the antidependence.
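To make the dependences in this sequence concrete, the following is a minimal sketch in Python that classifies the hazard between an earlier and a later instruction by comparing register names. The Instr record and the hazards function are hypothetical names introduced only for illustration; they model the three instructions above and are not part of any scoreboard hardware description.

```python
from typing import NamedTuple, Tuple, List


class Instr(NamedTuple):
    """Illustrative record: opcode, destination register, source registers."""
    op: str
    dest: str
    srcs: Tuple[str, ...]


def hazards(earlier: Instr, later: Instr) -> List[Tuple[str, str]]:
    """List the hazards that could occur if `later` ran ahead of `earlier`."""
    found = []
    # RAW: the later instruction reads what the earlier one writes.
    if earlier.dest in later.srcs:
        found.append(("RAW", earlier.dest))
    # WAR (antidependence): the later instruction writes a register
    # that the earlier one still needs to read.
    if later.dest in earlier.srcs:
        found.append(("WAR", later.dest))
    # WAW (output dependence): both write the same register.
    if earlier.dest == later.dest:
        found.append(("WAW", earlier.dest))
    return found


div = Instr("DIV.D", "F0", ("F2", "F4"))
add = Instr("ADD.D", "F10", ("F0", "F8"))
sub = Instr("SUB.D", "F8", ("F8", "F14"))
# Variant from the text: SUB.D with destination F10 instead of F8.
sub_f10 = Instr("SUB.D", "F10", ("F8", "F14"))

print(hazards(div, add))      # true dependence through F0
print(hazards(add, sub))      # antidependence through F8
print(hazards(add, sub_f10))  # output dependence through F10
```

The sketch only compares register names; it says nothing about when a stall would begin or end, which is the job of the scoreboard itself.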
The goal of a scoreboard is to maintain an execution rate of one instruction per clock cycle
(when there are no structural hazards) by executing an instruction as early as possible. Thus,
when the next instruction to execute is stalled, other instructions can be issued and executed
if they do not depend on any active or stalled instruction. The scoreboard takes full
responsibility for instruction issue and execution, including all hazard detection. Taking advantage
of out-of-order execution requires multiple instructions to be in their EX stage simultan-
eously. This can be achieved with multiple functional units, with pipelined functional units,
or with both. Since these two capabilities—pipelined functional units and multiple functional
units—are essentially equivalent for the purposes of pipeline control, we will assume the pro-
cessor has multiple functional units.
The CDC 6600 had 16 separate functional units, including 4 floating-point units, 5 units for
memory references, and 7 units for integer operations. On a processor for the MIPS archi-
tecture, scoreboards make sense primarily on the floating-point unit since the latency of the
other functional units is very small. Let's assume that there are two multipliers, one adder,
one divide unit, and a single integer unit for all memory references, branches, and integer op-
erations. Although this example is simpler than the CDC 6600, it is sufficiently powerful to
demonstrate the principles without having a mass of detail or needing very long examples.
Because both MIPS and the CDC 6600 are load-store architectures, the techniques are nearly
identical for the two processors. Figure C.54 shows what the processor looks like.
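The assumed set of functional units can be written down as a small table, together with the structural-hazard check that decides whether an instruction can be given a unit. This is only a sketch in Python: the unit counts come from the text above, but the opcode-to-unit mapping (for example, SUB.D sharing the adder) and the names UnitPool, try_issue, and release are assumptions made for illustration.

```python
from collections import Counter

# Unit counts taken from the text: two multipliers, one adder,
# one divide unit, and a single integer unit.
UNIT_COUNTS = {"mult": 2, "add": 1, "divide": 1, "integer": 1}

# Assumed mapping from opcode to unit type (illustrative only).
OP_TO_UNIT = {
    "MUL.D": "mult",
    "ADD.D": "add",
    "SUB.D": "add",
    "DIV.D": "divide",
    "L.D": "integer",   # memory references use the integer unit
    "S.D": "integer",
}


class UnitPool:
    """Tracks how many units of each type are currently busy."""

    def __init__(self, counts):
        self.capacity = dict(counts)
        self.busy = Counter()

    def try_issue(self, op: str) -> bool:
        """Claim a unit for `op`; return False on a structural hazard."""
        unit = OP_TO_UNIT[op]
        if self.busy[unit] >= self.capacity[unit]:
            return False  # no free unit of this type: the instruction waits
        self.busy[unit] += 1
        return True

    def release(self, op: str) -> None:
        """Free the unit held by a completed `op`."""
        unit = OP_TO_UNIT[op]
        if self.busy[unit] == 0:
            raise ValueError(f"no busy {unit} unit to release")
        self.busy[unit] -= 1


pool = UnitPool(UNIT_COUNTS)
print(pool.try_issue("MUL.D"))  # first multiplier is free
print(pool.try_issue("MUL.D"))  # second multiplier is free
print(pool.try_issue("MUL.D"))  # both multipliers busy
```

Only the structural check is modeled here; a real scoreboard would also check data hazards before letting an instruction proceed.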