Pipelining Design Techniques - Fundamentals of Computer Organization and Architecture


If no record can be found for a branch, a static branch prediction procedure is used instead. The static procedure examines the branch to determine whether it goes backwards or forwards. A backward branch is assumed to be part of a loop and is therefore predicted taken; a forward branch is predicted not taken. The ARM 11 employs an eight-stage pipeline, and each correctly predicted branch typically saves five processor clock cycles. Around 80% of branches are correctly predicted by this dynamic/static combination in the ARM 11 architecture. The pipeline features of the ARM 11 are introduced in the next subsection.
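The dynamic/static combination described above can be sketched as follows. This is a minimal illustration, not the ARM 11 mechanism itself: the history record is assumed to be a plain dictionary (the hypothetical `history`) mapping a branch address to the outcome it had the last time it executed, and a branch is treated as backward when its target address is lower than its own address.

```python
def static_predict(branch_pc: int, target_pc: int) -> bool:
    """Static rule: a backward branch (target below the branch) is assumed
    to close a loop and is predicted taken; a forward branch is predicted
    not taken."""
    return target_pc < branch_pc


def predict(branch_pc: int, target_pc: int, history: dict[int, bool]) -> bool:
    """Use the dynamic record when one exists; otherwise fall back to the
    static rule. `history` (hypothetical) maps a branch address to the
    outcome seen the last time that branch executed."""
    if branch_pc in history:
        return history[branch_pc]
    return static_predict(branch_pc, target_pc)


def update(branch_pc: int, taken: bool, history: dict[int, bool]) -> None:
    """Record the actual outcome so that later predictions can be dynamic."""
    history[branch_pc] = taken


history: dict[int, bool] = {}
print(predict(0x100, 0x0F0, history))  # backward, no record -> True
print(predict(0x200, 0x240, history))  # forward, no record -> False
update(0x200, True, history)
print(predict(0x200, 0x240, history))  # record found -> True
```

Keeping the static rule as a separate function makes the fallback explicit: it is consulted only when the dynamic lookup misses.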

The UltraSPARC III RISC processor, which has a 14-stage pipeline, employs a branch prediction technique based on a 16K-entry branch history record. In addition, it reduces the impact of a misprediction, measured as the number of cycles lost, as follows. When a branch is predicted taken, the “fall-through” instructions are prepared for issue in parallel with fetching the branch target instructions, using a four-entry branch miss queue (BMQ). If the prediction turns out to be wrong, the fall-through instructions are already prepared, which reduces the misprediction penalty to two cycles. The UltraSPARC III achieves 95% branch prediction accuracy. The pipeline features of the UltraSPARC III are introduced in the next subsection.
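The figures quoted for the two processors can be turned into rough per-branch averages. The sketch below assumes, as a simplification, that a correctly predicted branch costs nothing extra and that every misprediction costs exactly the stated penalty; the function names are illustrative only.

```python
def expected_cycles_saved(accuracy: float, saving_per_hit: float) -> float:
    """Average cycles saved per branch when each correct prediction saves
    `saving_per_hit` cycles (e.g. the typical ARM 11 figure of five)."""
    return accuracy * saving_per_hit


def expected_cycles_lost(accuracy: float, mispredict_penalty: float) -> float:
    """Average cycles lost per branch when each misprediction costs
    `mispredict_penalty` cycles (e.g. two on the UltraSPARC III with the BMQ)."""
    return (1.0 - accuracy) * mispredict_penalty


arm11 = expected_cycles_saved(0.80, 5)       # about 4 cycles saved per branch
ultrasparc = expected_cycles_lost(0.95, 2)   # about 0.1 cycle lost per branch
print(f"ARM 11: {arm11:.1f} cycles saved per branch")
print(f"UltraSPARC III: {ultrasparc:.2f} cycles lost per branch")
```

The second function shows why the BMQ matters: with accuracy fixed, the average loss per branch scales directly with the misprediction penalty.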


Methods Used to Reduce Pipeline Stall Due to Data Dependency

Hardware Operand Forwarding

Hardware operand forwarding allows the result of one ALU operation to be available to another ALU operation in the cycle that immediately follows. Consider the following two instructions.

ADD R1, R2, R3 ; R3 ← R1 + R2

SUB R3, 1, R4 ; R4 ← R3 - 1

It is easy to see that a read-after-write (RAW) data dependency exists between these two instructions: SUB reads R3, which ADD writes. Correct execution of this sequence on a five-stage pipeline (IF, ID, OF, IE, IS) requires the second instruction to stall after it is decoded, until the result of the first instruction has been stored in R3. Only then can the second instruction fetch its operand, that is, the new value stored in R3. However, if the result of the first instruction can be forwarded to the ALU during the same time unit in which it is being stored in R3, the stall time is reduced. This is illustrated in Figure 9.14.
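To make the timing concrete, the sketch below builds a simplified schedule for the ADD/SUB pair. Its assumptions are ours, not taken from Figure 9.14: each stage takes one time unit, SUB enters IF one unit after ADD, the register file cannot be written and read in the same unit, and a forwarded result reaches SUB's operand-fetch (OF) stage during ADD's IS unit. Under these assumptions forwarding cuts the stall from two time units to one.

```python
STAGES = ["IF", "ID", "OF", "IE", "IS"]


def schedule(forwarding: bool) -> dict[str, dict[str, int]]:
    """Time unit (starting at 1) in which each stage of ADD and SUB runs.

    Simplified model, assumed for illustration:
      * without forwarding, SUB's OF can start only after ADD's IS has
        finished writing R3;
      * with forwarding, SUB's OF receives R3 over the feedback path
        during ADD's IS time unit.
    """
    add = {stage: t for t, stage in enumerate(STAGES, start=1)}
    sub = {"IF": add["IF"] + 1, "ID": add["ID"] + 1}
    earliest_of = add["IS"] if forwarding else add["IS"] + 1
    sub["OF"] = max(sub["ID"] + 1, earliest_of)  # SUB stalls here if needed
    sub["IE"] = sub["OF"] + 1
    sub["IS"] = sub["IE"] + 1
    return {"ADD": add, "SUB": sub}


def stall_units(forwarding: bool) -> int:
    """Time units SUB waits between ID and OF because of the RAW hazard."""
    sub = schedule(forwarding)["SUB"]
    return sub["OF"] - (sub["ID"] + 1)


for fwd in (False, True):
    label = "with forwarding" if fwd else "without forwarding"
    print(f"{label}: {stall_units(fwd)} lost time unit(s)")
    for name, stages in schedule(fwd).items():
        print(f"  {name}: " + "  ".join(f"{s}@{t}" for s, t in stages.items()))
```

Moving `earliest_of` one unit earlier is what the extra feedback path buys in this model; SUB's later stages simply shift with it.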

Forwarding the operand of the second instruction as soon as it is available, while it is still being stored in R3, requires a modification of the data path: an additional feedback path must be created to carry the forwarded operand. This modification is shown using dotted lines in Figure 9.15. Note that the modification needed for hardware operand forwarding is expensive and requires careful issuing of control signals. Note also that if instruction decoding and operand fetching can both be performed during the same time unit, then no time units will be lost.
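The control decision behind the feedback path can be sketched as an operand multiplexer in front of the ALU. This is a generic illustration, not the data path of Figure 9.15: `select_operand`, `fwd_dest`, and `fwd_value` are hypothetical names, and the register contents are made up for the example.

```python
from typing import Optional


def select_operand(src: str, regfile: dict[str, int],
                   fwd_dest: Optional[str] = None, fwd_value: int = 0) -> int:
    """Hypothetical model of the ALU-input multiplexer.

    If the instruction ahead is about to write `src` (its destination
    register matches), take the value from the feedback (forwarding) path;
    otherwise read the register file.
    """
    if fwd_dest is not None and src == fwd_dest:
        return fwd_value
    return regfile[src]


regfile = {"R1": 5, "R2": 7, "R3": 0, "R4": 0}  # R3 still holds a stale value

# ADD R1, R2, R3: the ALU result exists before it is stored in R3.
add_result = regfile["R1"] + regfile["R2"]

# SUB R3, 1, R4: the control logic compares SUB's source with ADD's destination.
with_fwd = select_operand("R3", regfile, "R3", add_result) - 1
without_fwd = select_operand("R3", regfile) - 1  # reads the stale R3
print(with_fwd, without_fwd)  # 11 -1
```

The wrong result in `without_fwd` is why the hardware must either forward the value or stall; the register comparison is the kind of control signal the text says must be issued carefully.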
