THE MICROARCHITECTURE LEVEL - Structured Computer Organization


    Swap1          Swap2    Swap3     Swap4    Swap5          Swap6
Cy  MAR=SP − 1;rd  MAR=SP   H=MDR;wr  MDR=TOS  MAR=SP − 1;wr  TOS=H;goto (MBR1)
1   B=SP
2   C=B − 1        B=SP
3   MAR=C; rd      C=B
4   MDR=Mem        MAR=C
5                           B=MDR
6                           C=B       B=TOS
7                           H=C; wr   C=B      B=SP
8                           Mem=MDR   MDR=C    C=B − 1        B=H
9                                              MAR=C; wr      C=B
10                                             Mem=MDR        TOS=C
11                                                            goto (MBR1)

Figure 4-33. The implementation of SWAP on the Mic-3.

Stopping to wait for a needed value is called stalling. After that, we can continue starting microinstructions every cycle as there are no more dependences, although swap6 just barely makes it, since it reads H in the cycle after swap3 writes it. If swap5 had tried to read H, it would have stalled for one cycle.
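
The stall behaviour described above can be sketched as a small scheduling model. This is a simplified illustration, not the Mic-3 control logic: it assumes every microinstruction reads its source registers in its first cycle, writes its ALU result two cycles later, and (for a memory read) receives data in MDR one cycle after that. Only read-after-write dependences are tracked. The names SWAP, schedule, and ready are hypothetical.

```python
# Simplified model (an assumption, not the book's hardware description):
#   cycle s     : source registers are loaded into the latches
#   cycle s + 1 : ALU and shifter produce the result
#   cycle s + 2 : result is written to the destination register
#   cycle s + 3 : a memory read started in cycle s + 2 delivers data to MDR
# A value written in cycle w can first be read in cycle w + 1.

# (name, registers read, registers written, issues a memory read?)
SWAP = [
    ("swap1", {"SP"},  {"MAR"}, True),   # MAR=SP - 1; rd
    ("swap2", {"SP"},  {"MAR"}, False),  # MAR=SP
    ("swap3", {"MDR"}, {"H"},   False),  # H=MDR; wr
    ("swap4", {"TOS"}, {"MDR"}, False),  # MDR=TOS
    ("swap5", {"SP"},  {"MAR"}, False),  # MAR=SP - 1; wr
    ("swap6", {"H"},   {"TOS"}, False),  # TOS=H; goto (MBR1)
]


def schedule(program):
    """Return (start cycle per microinstruction, total cycles)."""
    ready = {}        # register -> first cycle in which its new value is readable
    starts = {}
    prev_start = 0
    for name, reads, writes, mem_read in program:
        # At most one microinstruction starts per cycle, in program order.
        start = prev_start + 1
        # Stall until every source register holds the value we need.
        for reg in reads:
            start = max(start, ready.get(reg, 1))
        for reg in writes:
            ready[reg] = start + 3    # written in start + 2, readable afterwards
        if mem_read:
            ready["MDR"] = start + 4  # MDR filled in start + 3
        starts[name] = start
        prev_start = start
    total = prev_start + 3            # last microinstruction finishes its 4th step
    return starts, total


starts, total = schedule(SWAP)
for name, cycle in starts.items():
    print(f"{name} starts in cycle {cycle}")
print("total cycles:", total)
```

Under these assumptions the model starts swap3 in cycle 5 rather than cycle 3, because MDR is not readable earlier, and lets swap6 start in cycle 8, the cycle after H is written. Changing swap5 to read H would push its start from cycle 7 to cycle 8.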

Although the Mic-3 program takes more cycles than the Mic-2 program, it still runs faster. If we call the Mic-3 cycle time ΔT nsec, then the Mic-3 requires 11 ΔT nsec to execute SWAP. In contrast, the Mic-2 takes 6 cycles at 3 ΔT each, for a total of 18 ΔT. Pipelining has made the machine faster, even though we had to stall once to avoid a dependence.
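
The comparison is simple arithmetic, worked through below in units of ΔT. The function name total_time is hypothetical; the cycle counts and cycle lengths are the ones stated in the text.

```python
def total_time(cycles, cycle_length_in_dt):
    """Execution time in units of delta-T: number of cycles times cycle length."""
    return cycles * cycle_length_in_dt


mic3 = total_time(11, 1)   # 11 cycles of 1 delta-T each
mic2 = total_time(6, 3)    # 6 cycles of 3 delta-T each
speedup = mic2 / mic3      # how many times faster the Mic-3 runs SWAP

print(f"Mic-3: {mic3} dT, Mic-2: {mic2} dT, speedup: {speedup:.2f}x")
```

The ratio 18/11 is about 1.64, so for this one instruction the pipelined design gains well under the factor of 3 that the shorter cycle alone would suggest; the extra cycles spent filling the pipeline and stalling account for the difference.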

Pipelining is a key technique in all modern CPUs, so it is important to understand it well. In Fig. 4-34 we see the data path of Fig. 4-31 graphically illustrated as a pipeline. The first column represents what is going on during cycle 1, the second column represents cycle 2, and so on (assuming no stalls). The shaded region in cycle 1 for instruction 1 indicates that the IFU is busy fetching instruction 1. One clock tick later, during cycle 2, the registers required by instruction 1 are being loaded into the A and B latches while at the same time the IFU is busy fetching instruction 2, again shown by the two shaded rectangles in cycle 2.

During cycle 3, instruction 1 is using the ALU and shifter to do its operation, the A and B latches are being loaded for instruction 2, and instruction 3 is being fetched. Finally, during cycle 4, four instructions are being worked on at the same time. The results from instruction 1 are being stored, the ALU work for instruction 2 is being performed, the A and B latches for instruction 3 are being loaded, and instruction 4 is being fetched.
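
The cycle-by-cycle description above follows a simple rule: with no stalls, the stage at depth d is working on instruction (cycle − d) whenever that number is at least 1. A minimal sketch of that rule follows; the stage labels and the function name occupancy are illustrative choices, not names from the figure.

```python
# The four parts of the data path, in the order an instruction passes through them.
STAGES = ["IFU fetch", "load A/B latches", "ALU and shifter", "store results"]


def occupancy(cycle):
    """Map each busy stage to the instruction it handles in a 1-based cycle.

    Assumes one instruction enters the pipeline per cycle and nothing stalls.
    """
    busy = {}
    for depth, stage in enumerate(STAGES):
        instruction = cycle - depth
        if instruction >= 1:          # stage is idle until the pipeline fills
            busy[stage] = instruction
    return busy


for cycle in range(1, 6):
    print(f"cycle {cycle}: {occupancy(cycle)}")
```

From cycle 4 onward every stage is busy, each with a different instruction.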

If we had shown cycle 5 and subsequent cycles, the pattern would have been the same as in cycle 4: all four parts of the data path that can run independently
