If no record can be found for a branch, a static branch prediction procedure
is used instead. The static procedure checks whether the branch goes backwards
or forwards. A backward branch is assumed to be part of a loop and is predicted
as taken; a forward branch is predicted as not taken. The ARM 11 employs an
eight-stage pipeline, and every correctly predicted branch typically saves five
processor clock cycles. Around 80% of branches are correctly predicted using the
dynamic/static combination in the ARM 11 architecture. The pipeline features of
the ARM 11 are introduced in the next subsection.
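The static fallback rule described above can be sketched in a few lines. This is only an illustration of the backward-taken, forward-not-taken rule, not the ARM 11 hardware logic: the function names, the use of plain integer addresses, and the dictionary standing in for the dynamic branch record are all assumptions made for the example.

```python
def static_predict_taken(branch_address: int, target_address: int) -> bool:
    """Static rule: a backward branch is predicted taken, a forward one not taken.

    Assumption: a branch whose target is at or below its own address counts
    as backward (it is treated as closing a loop).
    """
    return target_address <= branch_address


def predict_taken(branch_address: int, target_address: int, history: dict) -> bool:
    """Use the dynamic record when one exists; otherwise fall back to the static rule.

    `history` is a hypothetical stand-in for the dynamic branch record,
    mapping a branch address to its last recorded prediction (True = taken).
    """
    if branch_address in history:
        return history[branch_address]
    return static_predict_taken(branch_address, target_address)


# A loop-closing branch at 0x120 jumping back to 0x100, with no record yet.
print(predict_taken(0x120, 0x100, history={}))  # True: backward, predicted taken
# A forward branch at 0x120 jumping ahead to 0x140, with no record yet.
print(predict_taken(0x120, 0x140, history={}))  # False: forward, predicted not taken
```

When a record does exist, it takes priority over the static rule, which mirrors the order in which the two mechanisms are consulted in the text.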
A branch prediction technique based on a 16K-entry branch history record is
employed in the UltraSPARC III RISC processor, which has a 14-stage pipeline.
However, the number of cycles lost to a branch misprediction is reduced by the
following approach. When a branch is predicted taken, and while the branch
target instructions are being fetched, the “fall-through” instructions are
prepared for issue in parallel through the use of a four-entry branch miss
queue (BMQ). This reduces the misprediction penalty to two cycles. The
UltraSPARC III has achieved 95% success in branch prediction.
The pipeline features of the UltraSPARC III are introduced in the next subsection.
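To see what the two figures quoted above imply together, the average cost of mispredictions per branch can be estimated as the misprediction rate multiplied by the penalty. The sketch below is a back-of-the-envelope calculation only; it assumes that every misprediction costs exactly the two-cycle penalty and that the 95% success rate applies uniformly to all branches. The function name is chosen for this example.

```python
def expected_misprediction_cycles(accuracy: float, penalty_cycles: float) -> float:
    """Average cycles lost per branch = misprediction rate x penalty per misprediction."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    return (1.0 - accuracy) * penalty_cycles


# Figures quoted in the text for the UltraSPARC III: 95% success, 2-cycle penalty.
lost = expected_misprediction_cycles(accuracy=0.95, penalty_cycles=2)
print(f"{lost:.2f} cycles lost per branch on average")  # prints "0.10 cycles lost per branch on average"
```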
Methods Used to Reduce Pipeline Stall Due to Data Dependency
Hardware Operand Forwarding
Hardware operand forwarding allows the
result of one ALU operation to be available to another ALU operation in the
cycle that immediately follows. Consider the following two instructions.
ADD R1, R2, R3 ;  R3 ← R1 + R2
SUB R3, 1, R4 ;   R4 ← R3 − 1
A read-after-write data dependency clearly exists between these two
instructions. Correct execution of this sequence on a five-stage pipeline (IF,
ID, OF, IE, IS) stalls the second instruction after it is decoded and until
the result of the first instruction is stored in R3. Only then can the second
instruction fetch its operand, that is, the new value stored in R3. However, if
the result of the first instruction can be forwarded to the ALU during the same
time unit in which it is being stored in R3, the stall time can be reduced.
This is illustrated in Figure 9.14.
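The read-after-write dependency between the ADD and SUB instructions can be checked mechanically: it exists when the register written by the earlier instruction is read by the later one. The following is a minimal sketch under that definition; the `Instruction` record and the function name are hypothetical and are introduced only for this example.

```python
from typing import NamedTuple, Tuple


class Instruction(NamedTuple):
    opcode: str
    sources: Tuple[str, ...]  # registers read (immediate operands are omitted)
    dest: str                 # register written


def has_raw_dependency(first: Instruction, second: Instruction) -> bool:
    """True if `second` reads the register that the earlier `first` writes."""
    return first.dest in second.sources


add = Instruction("ADD", ("R1", "R2"), "R3")  # R3 <- R1 + R2
sub = Instruction("SUB", ("R3",), "R4")       # R4 <- R3 - 1

print(has_raw_dependency(add, sub))  # True: SUB reads R3, which ADD writes
```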
Forwarding the operand of the second instruction as soon as it is available,
while it is being stored in R3, requires a modification of the data path: a
feedback path is added to allow such operand forwarding. This modification is
shown using dotted lines in Figure 9.15. Note that the modification needed for
hardware operand forwarding is expensive and requires careful issuing of
control signals. Note also that if instruction decoding and operand fetching
can both be performed during the same time unit, no time units are lost.
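The effect of the added feedback path can be illustrated at the level of values rather than timing. In the sketch below, an operand is taken from the forwarded ALU result when the preceding instruction writes the register being read, and from the register file otherwise. This is a simplified software model, not the data path of Figure 9.15; the function name, the register contents, and the `(register, value)` tuple used for the forwarded result are assumptions made for the example.

```python
def fetch_operand(reg, register_file, forwarded=None):
    """Return the operand value for `reg`.

    `forwarded` is an optional (dest_register, value) pair holding the result
    of the preceding ALU operation that has not yet been stored. When it
    targets `reg`, the forwarded value bypasses the register file.
    """
    if forwarded is not None and forwarded[0] == reg:
        return forwarded[1]
    return register_file[reg]


# Hypothetical register contents; R3 still holds its old value.
register_file = {"R1": 5, "R2": 7, "R3": 0, "R4": 0}

# ADD R1, R2, R3: the result is computed but not yet stored in R3.
add_result = register_file["R1"] + register_file["R2"]

# SUB R3, 1, R4 without forwarding would read the old R3.
stale = fetch_operand("R3", register_file) - 1
# With forwarding, SUB receives the ADD result directly.
correct = fetch_operand("R3", register_file, ("R3", add_result)) - 1

print(stale, correct)  # -1 11
```

Reading the old value is exactly the error that the stall prevents when no forwarding path exists; the forwarding path removes the need to wait for the register file to be updated.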