Hardware Reference
The Fe1 (Fetch #1) stage is at the beginning of the pipeline. It is here that the address of the next
instruction to be fetched is used to index the instruction cache and start a branch
prediction. Normally, this address is the one following the previous instruction.
However, this sequential order can be broken for a variety of reasons, such as when
a previous instruction is a branch that has been predicted to be taken, or a trap or
interrupt needs to be serviced. Because instruction fetch and branch prediction
take more than one cycle, the Fe2 (Fetch #2) stage provides extra time to carry
out these operations. In the Fe3 (Fetch #3) stage, the instructions fetched (up to
four) are pushed into the instruction queue.
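The choice of the next fetch address described above can be sketched in a few lines of Python. This is only an illustration: the function name, the parameter names, the priority order (trap or interrupt first, then a predicted-taken branch, then sequential fetch), and the fixed 4-byte instruction size are assumptions, not details given in the text.

```python
INSTRUCTION_BYTES = 4  # assumption: fixed-width 32-bit instructions


def next_fetch_address(current_pc, predicted_taken_target=None, trap_vector=None):
    """Pick the address the first fetch stage would use next.

    Assumed priority: a pending trap or interrupt wins, then a branch
    predicted to be taken, and otherwise fetch continues sequentially.
    """
    if trap_vector is not None:
        return trap_vector
    if predicted_taken_target is not None:
        return predicted_taken_target
    return current_pc + INSTRUCTION_BYTES
```

For simplicity the sketch advances by a single instruction, although the fetch stages can bring in up to four instructions at a time.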
The De1 and De2 (Decode) stages decode the instructions. This step determines
what inputs instructions will need (registers and memory) and what resources
they will require to execute (functional units). Once decode is completed,
the instructions enter the Re (Rename) stage where the registers accessed are
renamed to eliminate WAR and WAW hazards during out-of-order execution. This
stage contains the rename table, which records which physical register currently
holds each architectural register. Using this table, any input register can be easily
renamed. The output register must be given a new physical register, which is taken
from a pool of unused physical registers. The assigned physical register will be in
use by the instruction until it retires.
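A minimal sketch of a rename table may make the mechanism concrete. The class name, the register counts, and the initial identity mapping are assumptions chosen for illustration; returning physical registers to the free pool at retirement is deliberately left out.

```python
from collections import deque


class RenameTable:
    """Toy rename table: maps architectural to physical registers."""

    def __init__(self, num_arch=16, num_phys=32):
        # Assumption: architectural register i starts in physical register i.
        self.mapping = list(range(num_arch))
        # The remaining physical registers form the pool of unused registers.
        self.free = deque(range(num_arch, num_phys))

    def rename(self, dest, sources):
        # Look up sources first, so "r1 = r1 + r2" reads the old r1.
        phys_sources = [self.mapping[s] for s in sources]
        if not self.free:
            raise RuntimeError("no free physical register; rename would stall")
        # The output register always gets a fresh physical register.
        new_phys = self.free.popleft()
        self.mapping[dest] = new_phys
        return new_phys, phys_sources
```

Because every write to an architectural register receives a fresh physical register, two instructions that both write r1 no longer share a destination (no WAW hazard), and a later write cannot overwrite a value that an earlier instruction still has to read (no WAR hazard). True read-after-write dependences are preserved, since a source lookup returns the physical register assigned to the most recent writer.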
Next, instructions enter the Iss (Instruction Issue) stage, where they are
dropped into the instruction issue queue. The issue queue watches for instructions
whose inputs are all ready. When ready, their register inputs are acquired (from the
physical register file or the bypass bus), and then the instruction is sent to the
execution stages. Like the Core i7, the Cortex A9 potentially issues instructions out
of program order. Up to four instructions can be issued each cycle. The choice of
instructions is constrained by the functional units available.
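The selection performed by the issue queue can be sketched as follows. The data layout (a list of dictionaries), the unit names, and the oldest-first scan order are assumptions made for this illustration; the text states only that inputs must be ready, that at most four instructions issue per cycle, and that the available functional units constrain the choice.

```python
def select_for_issue(queue, ready_regs, free_units, width=4):
    """Return the names of the instructions that would issue this cycle.

    queue:      instructions in program order, each a dict with
                'name', 'srcs' (physical registers) and 'unit'
    ready_regs: set of physical registers whose values are available
    free_units: dict mapping a unit name to how many are free this cycle
    """
    issued = []
    available = dict(free_units)  # copy so the caller's dict is untouched
    for instr in queue:  # assumption: older instructions are considered first
        if len(issued) == width:
            break
        inputs_ready = all(src in ready_regs for src in instr["srcs"])
        if inputs_ready and available.get(instr["unit"], 0) > 0:
            available[instr["unit"]] -= 1
            issued.append(instr["name"])
    return issued
```

An instruction whose inputs are not yet ready is simply skipped, so a younger instruction behind it may issue first. This is what allows instructions to issue out of program order.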
The Ex (Execute) stages are where instructions are actually executed. Most
arithmetic, Boolean, and shift instructions use the integer ALUs and complete in
one cycle. Loads and stores take two cycles (if they hit in the L1 cache), and
multiplies take three cycles. The Ex stages contain multiple functional units, which are:
1. Integer ALU 1.
2. Integer ALU 2.
3. Multiply unit.
4. Floating-point and SIMD vector ALU (optional with VFP and NEON
support).
5. Load and store unit.
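The latencies quoted above can be collected in a small table. The sketch below uses hypothetical names and assumes full bypassing, so that a dependent instruction can use a result in the cycle right after the producer leaves the Ex stages. The text gives no latency for floating-point and SIMD operations, so they are omitted.

```python
# Cycles spent in the Ex stages, as stated in the text.
# The load/store figure assumes a hit in the L1 cache.
EX_LATENCY = {"alu": 1, "load_store": 2, "multiply": 3}


def result_ready_cycle(ex_start_cycle, kind):
    """First cycle in which a dependent instruction could use the result."""
    return ex_start_cycle + EX_LATENCY[kind]
```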
Conditional branch instructions are also processed in the first Ex stage and their
direction (branch/no branch) is determined. In the event of a misprediction, a signal
is sent back to the Fe1 stage and the pipeline is voided.
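Recovery from a misprediction can be sketched as follows. The sequence numbers, the function name, and the choice to discard only the instructions younger than the branch are assumptions made for illustration; the text says only that a signal is sent back to the first fetch stage and that the pipeline is voided.

```python
def resolve_branch(in_flight, branch_seq, predicted_taken, actual_taken,
                   taken_target, fallthrough):
    """Return (surviving instructions, redirect address or None).

    in_flight:  list of dicts with a 'seq' field; larger means younger
    branch_seq: sequence number of the branch being resolved
    """
    if predicted_taken == actual_taken:
        return in_flight, None  # prediction was correct, nothing to undo
    # Assumption: only work fetched after the branch is thrown away.
    survivors = [i for i in in_flight if i["seq"] <= branch_seq]
    # Fetch restarts on the path the branch really takes.
    redirect = taken_target if actual_taken else fallthrough
    return survivors, redirect
```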