Pipelining Design Techniques - Fundamentals of Computer Organization and Architecture

Information Technology Reference

In-Depth Information

branch address is known. Consider, for example, the execution of the following

program loop on a pipeline consisting of two stages: Fetch (F) and Execute (E).

I 1 ! Again: Load 5, R 1 ;

R 1

I 2 !

Sub R 2 ;

R 2 R 2 2

I 3 !

Bnn Again;

Branch to Again if result is Not Negative;

I 4 !

Add R 4 , R 5 , R 3 ;

R 3 R 4 þ R 5 ;

It should be noted that at the end of the first loop, either instruction I 1 or instruction

I 4 will have to be fetched depending on the result of executing instruction I 3 .The

way with which such a situation has been dealt will delay fetching of the next

instruction until the result of executing instruction I 3 is known. This will lead to

incurring extra delay in the pipeline. However, this extra delay may be avoided if

the sequence of instructions has been reordered to become as follows.

Again:

Sub R 2 ;

R 2 R 2 2

Load 5, R 1 ;

R 1 5;

Bnn Again;

Branch to Again if result is Not Negative;

Add R 4 , R 5 , R 3 ;

R 3 R 4 þ R 5 ;

Figure 9.13 shows the Gantt's chart for executing the modified piece of code for the

case R 2 ¼

3 before entering the loop.

The figure indicates that branching takes place one instruction later than the

actual place where the branch instruction appears in the original instruction

sequence, hence the name “delayed branch.” It is also clear from Figure 9.13 that

by reordering the sequence of instructions, it was possible to fill the branch delay

time slot with a useful instruction, thus eliminating any extra delay in the pipeline.

It has been shown in a number of studies that “smart” compilers were able to make

use of one branch delay time slot in more than 80% of the cases. The use of branch

delay time slots has led to the improvement of both the speed-up and the throughput

of those processors using “smart” compilers.

PREDICTION OF THE NEXT INSTRUCTION TO FETCH This method tries to reduce the

time unit(s) that can potentially be lost due to instruction dependency by predicting

the next instruction to fetch after fetching a conditional branch instruction. The basis

is that if the branch outcomes are random, then it would be possible to save about

50% of the otherwise lost time. A simple way to carry out such a technique is to

I 2

I 1

I 3

I 2

I 1

I 3

I 2

I 1

I 3

I 2

I 1

I 3

I 4

I 2

I 1

I 3

I 2

I 1

I 3

I 2

I 1

I 3

I 2

I 1

I 3

I 4

Figure 9.13 Delayed branch

Fundamentals of Computer Organization and Architecture

Search WWH ::

Custom Search

Home