Hardware Reference
Based on its own data structure, the scoreboard controls the instruction progression from
one step to the next by communicating with the functional units. There is a small complication,
however. There are only a limited number of source operand buses and result buses to the
register file, which represents a structural hazard. The scoreboard must guarantee that the
number of functional units allowed to proceed into steps 2 and 4 does not exceed the number of
buses available. We will not go into further detail on this, other than to mention that the CDC
6600 solved this problem by grouping the 16 functional units together into four groups and
supplying a set of buses, called data trunks, for each group. Only one unit in a group could
read its operands or write its result during a clock.
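The one-unit-per-group rule can be sketched as a small arbiter. This is only an illustration: the unit names, their assignment to groups, and the rule that earlier requests win are assumptions made for the example, not the CDC 6600's actual arrangement.

```python
def grant_trunks(requests, group_of):
    """Return the units allowed to use their group's data trunk this clock.

    requests: names of units that want to read operands or write a result,
              listed in priority order (the priority rule is an assumption).
    group_of: mapping from unit name to the id of the group it belongs to.
    """
    granted = []
    busy_groups = set()
    for unit in requests:
        group = group_of[unit]
        if group not in busy_groups:
            # The group's trunk is still free this clock, so claim it.
            busy_groups.add(group)
            granted.append(unit)
        # Otherwise the unit waits and must request again next clock.
    return granted


# Hypothetical grouping: the two multipliers share one trunk.
group_of = {"Add": 0, "Mult1": 1, "Mult2": 1, "Divide": 2}
print(grant_trunks(["Mult1", "Mult2", "Add"], group_of))  # ['Mult1', 'Add']
```

Here Mult2 is held back for a clock because Mult1 has already claimed the trunk they share, while Add proceeds on its own group's trunk.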
Now let's look at the detailed data structure maintained by a MIPS scoreboard with five
functional units. Figure C.55 shows what the scoreboard's information looks like partway
through the execution of this simple sequence of instructions:
L.D F6,34(R2)
L.D F2,45(R3)
MUL.D F0,F2,F4
SUB.D F8,F6,F2
DIV.D F10,F0,F6
ADD.D F6,F8,F2
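To see why this sequence is a useful test case, it helps to list the register dependences it contains. The sketch below is a simple pairwise check written for this example; the `Instr` tuple and `find_hazards` helper are hypothetical names, and the check ignores intervening redefinitions of a register, which is safe here because only F6 is written twice and its second write is the last instruction.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: tuple


# The six instructions above; loads list their base register as the source.
PROGRAM = [
    Instr("L.D", "F6", ("R2",)),
    Instr("L.D", "F2", ("R3",)),
    Instr("MUL.D", "F0", ("F2", "F4")),
    Instr("SUB.D", "F8", ("F6", "F2")),
    Instr("DIV.D", "F10", ("F0", "F6")),
    Instr("ADD.D", "F6", ("F8", "F2")),
]


def find_hazards(program):
    """Return (kind, earlier_index, later_index, register) for each pair."""
    hazards = []
    for j, later in enumerate(program):
        for i in range(j):
            earlier = program[i]
            if earlier.dest in later.srcs:
                hazards.append(("RAW", i, j, earlier.dest))
            if later.dest in earlier.srcs:
                hazards.append(("WAR", i, j, later.dest))
            if later.dest == earlier.dest:
                hazards.append(("WAW", i, j, later.dest))
    return hazards


for kind, i, j, reg in find_hazards(PROGRAM):
    print(f"{kind} on {reg}: {PROGRAM[i].op} -> {PROGRAM[j].op}")
```

The check reports seven RAW dependences, one WAW pair on F6 (the first L.D and ADD.D), and two WAR pairs on F6 (SUB.D and DIV.D each read F6 before ADD.D writes it). The SUB.D pair cannot cause trouble in practice, because ADD.D also needs SUB.D's result in F8.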
There are three parts to the scoreboard:
1. Instruction status —Indicates which of the four steps the instruction is in.
2. Functional unit status —Indicates the state of the functional unit (FU). There are nine fields
for each functional unit:
■ Busy—Indicates whether the unit is busy or not.
■ Op—Operation to perform in the unit (e.g., add or subtract).
■ Fi—Destination register.
■ Fj, Fk—Source-register numbers.
■ Qj, Qk—Functional units producing source registers Fj, Fk.
■ Rj, Rk—Flags indicating when Fj, Fk are ready and not yet read. Set to No after
operands are read.
3. Register result status —Indicates which functional unit will write each register, if an active
instruction has the register as its destination. This field is set to blank whenever there are
no pending instructions that will write that register.
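The three parts can be sketched as plain data records. This is a minimal model, not a description of the hardware: the class names, the `record_issue` helper, and the functional unit names (Integer, Mult1, Mult2, Add, Divide) are assumptions for illustration, and the four step names are the usual issue, read operands, execute, and write result. The helper only fills in the fields as defined above; it does not check whether the instruction is allowed to issue.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InstructionStatus:
    # One flag per step; an instruction sets them in order.
    issued: bool = False
    operands_read: bool = False
    execution_complete: bool = False
    result_written: bool = False


@dataclass
class FunctionalUnitStatus:
    busy: bool = False
    op: Optional[str] = None   # operation to perform
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None   # units producing Fj, Fk (None if no producer)
    qk: Optional[str] = None
    rj: bool = False           # Fj, Fk ready and not yet read
    rk: bool = False


@dataclass
class Scoreboard:
    instruction_status: list = field(default_factory=list)
    fu_status: dict = field(default_factory=dict)        # unit name -> status
    register_result: dict = field(default_factory=dict)  # register -> unit


def record_issue(sb, fu, op, dest, src_j, src_k=None):
    """Fill in the tables for an instruction issued to unit `fu`."""
    unit = sb.fu_status[fu]
    unit.busy, unit.op = True, op
    unit.fi, unit.fj, unit.fk = dest, src_j, src_k
    # Look up producers before claiming the destination register.
    unit.qj = sb.register_result.get(src_j)
    unit.qk = sb.register_result.get(src_k)
    unit.rj = src_j is not None and unit.qj is None
    unit.rk = src_k is not None and unit.qk is None
    sb.register_result[dest] = fu
    sb.instruction_status.append(InstructionStatus(issued=True))


units = ["Integer", "Mult1", "Mult2", "Add", "Divide"]
sb = Scoreboard(fu_status={name: FunctionalUnitStatus() for name in units})
record_issue(sb, "Integer", "L.D", "F2", "R3")
record_issue(sb, "Mult1", "MUL.D", "F0", "F2", "F4")
mult = sb.fu_status["Mult1"]
print(mult.qj, mult.rj, mult.rk)  # Integer False True
```

After these two issues, Mult1 records that F2 will come from the Integer unit (Qj is Integer, Rj is No), while F4 has no pending writer and is ready to read.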
Now let's look at how the code sequence begun in Figure C.55 continues execution. After
that, we will be able to examine in detail the conditions that the scoreboard uses to control
execution.
Example
Assume the following EX cycle latencies (chosen to illustrate the behavior and
not representative) for the floating-point functional units: Add is 2 clock cycles,
multiply is 10 clock cycles, and divide is 40 clock cycles. Using the code segment
in Figure C.55 and beginning with the point indicated by the instruction status
in Figure C.55, show what the status tables look like when MUL.D and DIV.D are
each ready to go to the write result state.
Answer