THE MICROARCHITECTURE LEVEL - Structured Computer Organization

Hardware Reference

In-Depth Information

In an attempt to get around these problems and produce better performance,

some CPUs allow dependent instructions to be skipped over, to get to future in-

structions that are not dependent. Needless to say, the internal instruction-schedul-

ing algorithm used must deliver the same effect as if the program were executed in

the order written. We will now demonstrate how instruction reordering works

using a detailed example.

To illustrate the nature of the problem, we will start with a machine that always

issues instructions in program order and also requires them to complete execution

in program order. The significance of the latter will become clear later.

Our example machine has eight registers visible to the programmer, R0

through R7 . All arithmetic instructions use three registers: two for the operands

and one for the result, the same as the Mic-4. We will assume that if an instruction

is decoded in cycle n , execution starts in cycle n

1. For a simple instruction,

such as an addition or subtraction, the writeback to the destination register occurs

at the end of cycle n

+

2. For a more complicated instruction, such as a multiplica-

tion, the writeback occurs at the end of cycle n

+

3. To make the example realistic,

we will allow the decode unit to issue up to two instructions per clock cycle. Com-

mercial superscalar CPUs often can issue four or even six instructions per clock

cycle.

Our example execution sequence is shown in Fig. 4-43. Here the first column

gives the number of the cycle and the second one gives the instruction number.

The third column lists the instruction decoded. The fourth one tells which instruc-

tion is being issued (with a maximum of two per clock cycle). The fifth one tells

which instruction has been retired (completed). Remember that in this example we

are requiring both in-order issue and in-order completion, so instruction k

+

1 can-

not be issued until instruction k has been issued, and instruction k

1 cannot be

retired (meaning the writeback to the destination register is performed) until in-

struction k has been retired. The other 16 columns are discussed below.

After decoding an instruction, the decode unit has to decide whether or not it

can be issued immediately. To make this decision, the decode unit needs to know

the status of all the registers. If, for example, the current instruction needs a regis-

ter whose value has not yet been computed, the current instruction cannot be issued

and the CPU must stall.

We will keep track of register use with a device called a scoreboard , which

was first present in the CDC 6600. The scoreboard has a small counter for each

register telling how many times that register is in use as a source by currently ex-

ecuting instructions. If a maximum of, say, 15 instructions may be executing at

once, then a 4-bit counter will do. When an instruction is issued, the scoreboard

entries for its operand registers are incremented. When an instruction is retired,

the entries are decremented.

The scoreboard also has counters to keep track of registers being used as desti-

nations. Since only one write at a time is allowed, these counters can be 1-bit

wide. The rightmost 16 columns in Fig. 4-43 show the scoreboard.

+

Structured Computer Organization

Search WWH ::

Custom Search

Home