FIGURE C.58 Required checks and bookkeeping actions for each step in
instruction execution. FU stands for the functional unit used by the instruction,
D is the destination register name, S1 and S2 are the source register
names, and op is the operation to be done. To access the scoreboard entry
named Fj for functional unit FU we use the notation Fj[FU]. Result[D] is the
name of the functional unit that will write register D. The test on the write result
case prevents the write when there is a WAR hazard, which exists if another
instruction has this instruction's destination (Fi[FU]) as a source (Fj[f] or Fk[f])
and if some other instruction has written the register (Rj = Yes or Rk = Yes).
The variable f is used for any functional unit.
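
The write-result test described in the caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CDC 6600 logic itself: the FUStatus class, the war_hazard function, and the unit names "Add" and "Divide" are hypothetical, and Rj/Rk are modeled as booleans where True stands for Yes (the source operand is available but has not yet been read).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FUStatus:
    """Hypothetical scoreboard entry for one functional unit."""
    Fi: Optional[str] = None  # destination register name
    Fj: Optional[str] = None  # first source register name
    Fk: Optional[str] = None  # second source register name
    Rj: bool = False          # True (Yes): Fj is available and not yet read
    Rk: bool = False          # True (Yes): Fk is available and not yet read


def war_hazard(status: Dict[str, FUStatus], fu: str) -> bool:
    """Return True if unit `fu` must delay its write because of a WAR hazard.

    A hazard exists if some other unit f still names Fi[fu] as a source
    (Fj[f] or Fk[f]) and the matching ready flag (Rj[f] or Rk[f]) is Yes.
    """
    dest = status[fu].Fi
    for name, f in status.items():
        if name == fu:
            continue  # the caption speaks of *another* instruction
        if (f.Fj == dest and f.Rj) or (f.Fk == dest and f.Rk):
            return True
    return False


# Illustrative state: the divide still has to read F6, and the add wants
# to write F6, so the add's write must wait.
status = {
    "Add": FUStatus(Fi="F6", Fj="F8", Fk="F2"),
    "Divide": FUStatus(Fi="F10", Fj="F0", Fk="F6", Rj=False, Rk=True),
}
blocked_before = war_hazard(status, "Add")

# Once the divide has read its operands, its ready flags are reset to No
# and the add may write.
status["Divide"].Rk = False
blocked_after = war_hazard(status, "Add")
```

In this sketch blocked_before is True and blocked_after is False: the write is held back only while a reader of the old value of F6 is still pending.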
The costs and benefits of scoreboarding are interesting considerations. The
CDC 6600 designers measured a performance improvement of a factor of 1.7 for
FORTRAN programs and 2.5 for hand-coded assembly language. However, this
was measured in the days before software pipeline scheduling, semiconductor
main memory, and caches (which lower memory access time). The scoreboard
on the CDC 6600 had about as much logic as one of the functional units, which
is surprisingly low. The main cost was in the large number of buses—about four
times as many as would be required if the CPU only executed instructions in
order (or if it only initiated one instruction per execute cycle). The recently
increasing interest in dynamic scheduling is motivated by attempts to issue more
instructions per clock (so the cost of more buses must be paid anyway) and by
ideas like speculation (explored in Section 4.7) that naturally build on dynamic
scheduling.
A scoreboard uses the available ILP to minimize the number of stalls arising
from the program's true data dependences. In eliminating stalls, a scoreboard is
limited by several factors:
1. The amount of parallelism available among the instructions: This determines
whether independent instructions can be found to execute. If each
instruction depends on its predecessor, no dynamic scheduling scheme can
reduce stalls. If the instructions that are in the pipeline at the same time
must be chosen from the same basic block (as was true in the 6600), this limit
is likely to be quite severe.
 