Instruction Buffer (Queues)
The UltraSPARC III instruction issue unit (IIU)
incorporates two instruction buffering queues: the branch instruction queue (BIQ)
and the branch miss queue (BMQ). These are introduced below.
BRANCH INSTRUCTION QUEUE (BIQ) This is a 20-entry queue that allows the fetch
unit and the execution unit to operate independently. The fetch unit predicts the
execution path and continuously fills the BIQ. When a taken branch is encountered,
two fetch cycles are lost to fill the BIQ.
BRANCH MISS QUEUE (BMQ) During the two lost cycles, the sequential instructions
that have already been accessed are buffered in a four-entry BMQ. If the branch
is then found to have been mispredicted, the instructions in the BMQ are sent
directly to the execution unit.
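The interplay of the two queues can be sketched as follows. This is a minimal illustrative model, not a description of the actual UltraSPARC III hardware: the class name, its methods, and the choice to flush the whole BIQ on a misprediction are assumptions made for the example. Only the queue sizes (20 and 4) come from the text.

```python
from collections import deque

BIQ_ENTRIES = 20  # size given in the text
BMQ_ENTRIES = 4   # size given in the text


class InstructionQueues:
    """Toy model of the BIQ and BMQ (hypothetical interface)."""

    def __init__(self):
        self.biq = deque()  # instructions from the predicted path
        self.bmq = deque()  # sequential instructions saved after a taken branch

    def fetch_predicted(self, instr):
        """Fetch unit adds a predicted-path instruction; False if the BIQ is full."""
        if len(self.biq) >= BIQ_ENTRIES:
            return False
        self.biq.append(instr)
        return True

    def buffer_sequential(self, instr):
        """Save a fall-through instruction fetched during the lost cycles."""
        if len(self.bmq) >= BMQ_ENTRIES:
            return False
        self.bmq.append(instr)
        return True

    def issue(self):
        """Execution unit takes the oldest BIQ entry, or None if the BIQ is empty."""
        return self.biq.popleft() if self.biq else None

    def resolve_branch(self, mispredicted):
        """Return the instructions sent straight to execution after a misprediction.

        Assumption: on a misprediction everything still in the BIQ belongs to
        the wrong path and is discarded.
        """
        recovered = list(self.bmq) if mispredicted else []
        if mispredicted:
            self.biq.clear()
        self.bmq.clear()
        return recovered


queues = InstructionQueues()
queues.fetch_predicted("target+0")      # predicted (taken) path
queues.buffer_sequential("branch+4")    # fall-through path, kept just in case
queues.buffer_sequential("branch+8")
print(queues.resolve_branch(mispredicted=True))
```

In this model a correct prediction simply discards the BMQ contents, while a misprediction lets the saved sequential instructions bypass the BIQ.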
9.4. INSTRUCTION-LEVEL PARALLELISM
Contrary to pipeline techniques, instruction-level parallelism (ILP) is based on the
idea of multiple issue processors (MIP). An MIP has multiple pipelined datapaths
for instruction execution. Each of these pipelines can issue and execute one
instruction per cycle. Figure 9.17 shows the case of a processor having three
pipes. For comparison purposes, we also show in the same figure the sequential
and the single pipeline case. It is clear from the figure that while the limit on
the number of cycles per instruction in the case of a single pipeline is CPI = 1,
the MIP can achieve CPI < 1.
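The cycle counts behind this comparison can be sketched with a small calculation. It assumes an idealized machine with no stalls or dependencies, a pipeline depth of k stages, and n instructions; the function names and the sample values (n = 300, k = 4) are illustrative and do not come from the text or from Figure 9.17.

```python
import math


def cycles_sequential(n, k):
    """Each instruction passes through all k stages before the next one starts."""
    return n * k


def cycles_pipelined(n, k, pipes=1):
    """Ideal pipelines: k cycles until the first results, then `pipes` per cycle."""
    return k + math.ceil(n / pipes) - 1


def cpi(cycles, n):
    """Average number of cycles per instruction."""
    return cycles / n


n, k = 300, 4  # assumed sample values
for label, cycles in [
    ("sequential", cycles_sequential(n, k)),
    ("single pipeline", cycles_pipelined(n, k)),
    ("three pipes", cycles_pipelined(n, k, pipes=3)),
]:
    print(f"{label:16s} cycles={cycles:4d}  CPI={cpi(cycles, n):.3f}")
```

With these sample values the sequential case takes 1200 cycles (CPI of 4), the single pipeline 303 cycles (CPI just above 1), and the three-pipe case 103 cycles (CPI of about 0.34). As n grows, the single pipeline approaches CPI = 1 from above, while the three-pipe machine approaches 1/3.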
In order to make full use of ILP, an analysis should be made to identify the
instruction and data dependencies that exist in a given program. This analysis
should lead to the appropriate scheduling of the group of instructions that can be
issued simultaneously while retaining the program correctness. Static scheduling
results in the use of very long instruction word (VLIW) architectures, while dynamic
scheduling results in the use of superscalar architectures.
In VLIW, an instruction represents a bundle of many operations to be issued
simultaneously. The compiler is responsible for checking all dependencies and
making the appropriate grouping and scheduling of operations. This is in contrast
with superscalar architectures, which rely entirely on the hardware for scheduling
of instructions.
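The kind of compile-time grouping described for VLIW can be illustrated with a short sketch. It is a simplified greedy packer, not the algorithm of any particular compiler: the `Op` record, the function names, and the rule that any register dependence (read after write, write after write, or write after read) forces the later operation into a later bundle are all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical operation record: text for display, one destination register,
# and a tuple of source registers.
Op = namedtuple("Op", "text dest srcs")


def depends(later, earlier):
    """True if `later` must be placed in a later bundle than `earlier`."""
    raw = earlier.dest in later.srcs   # read after write
    waw = earlier.dest == later.dest   # write after write
    war = later.dest in earlier.srcs   # write after read (treated conservatively)
    return raw or waw or war


def pack_bundles(ops, width):
    """Greedily place each operation in the earliest legal bundle with a free slot."""
    bundles = []  # each bundle is a list of at most `width` operations
    placed = []   # bundle index chosen for each operation, in program order
    for i, op in enumerate(ops):
        earliest = 0
        for j in range(i):
            if depends(op, ops[j]):
                earliest = max(earliest, placed[j] + 1)
        slot = earliest
        while slot < len(bundles) and len(bundles[slot]) >= width:
            slot += 1
        while len(bundles) <= slot:
            bundles.append([])
        bundles[slot].append(op)
        placed.append(slot)
    return bundles


program = [
    Op("r1 = r2 + r3", "r1", ("r2", "r3")),
    Op("r4 = r5 * r6", "r4", ("r5", "r6")),
    Op("r7 = r1 - r4", "r7", ("r1", "r4")),
    Op("r8 = r2 + r5", "r8", ("r2", "r5")),
    Op("r2 = r7 + r8", "r2", ("r7", "r8")),
]
for index, bundle in enumerate(pack_bundles(program, width=3)):
    print(index, [op.text for op in bundle])
```

For this sample program the first, second, and fourth operations are independent and share bundle 0, while the third and fifth each wait for results and occupy bundles 1 and 2. A superscalar processor would have to discover the same groupings in hardware at run time.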
Superscalar Architectures
A scalar machine is able to perform only one arithmetic operation at a time. A
superscalar architecture (SPA) is able to fetch, decode, execute, and store the
results of several instructions at the same time. It does so by transforming a
static, sequential instruction stream into a dynamic, parallel one, in order to
execute a number of instructions simultaneously. Upon completion, the SPA
re-enforces the original sequential instruction stream so that instructions are
completed in the original order.
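One common way to restore program order after parallel execution is a reorder buffer. The text does not name the mechanism, so the sketch below is an assumption used only to illustrate the idea; the class and method names are hypothetical.

```python
from collections import deque


class ReorderBuffer:
    """Toy reorder buffer: out-of-order completion, in-order commit."""

    def __init__(self):
        self.entries = deque()  # instruction tags in program order, oldest first
        self.done = set()       # tags whose execution has finished

    def dispatch(self, tag):
        """Record an instruction in program order as it enters execution."""
        self.entries.append(tag)

    def complete(self, tag):
        """Mark an instruction as finished; this may happen in any order."""
        self.done.add(tag)

    def commit(self):
        """Retire finished instructions from the head only, preserving order."""
        committed = []
        while self.entries and self.entries[0] in self.done:
            tag = self.entries.popleft()
            self.done.remove(tag)
            committed.append(tag)
        return committed


rob = ReorderBuffer()
for tag in ("i0", "i1", "i2"):
    rob.dispatch(tag)

rob.complete("i2")
print(rob.commit())  # i2 has finished but cannot retire before i0 and i1
rob.complete("i0")
print(rob.commit())
rob.complete("i1")
print(rob.commit())
```

Here `i2` finishes first but is held until `i0` and `i1` have retired, so the committed sequence matches the original instruction stream.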
In an SPA, instruction processing consists of the fetch, decode, issue, and commit
stages. During the fetch stage, multiple instructions are fetched simultaneously.