THE MICROARCHITECTURE LEVEL - Structured Computer Organization


The Fe1 (Fetch #1) stage is at the beginning of the pipeline. It is here that the
address of the next instruction to be fetched is used to index the instruction
cache and start a branch prediction. Normally, this address is the one following
the previous instruction. However, this sequential order can be broken for a
variety of reasons, such as when a previous instruction is a branch that has been
predicted to be taken, or when a trap or interrupt needs to be serviced. Because
instruction fetch and branch prediction take more than one cycle, the Fe2
(Fetch #2) stage provides extra time to carry out these operations. In the Fe3
(Fetch #3) stage, the instructions fetched (up to four) are pushed into the
instruction queue.
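The choice of the next fetch address described above can be sketched in a few lines of Python. This is an illustrative model only, not the Cortex A9 logic: the function name, the 4-byte instruction size, and the priority order (trap first, then predicted branch, then sequential) are assumptions made for the example.

```python
INSTR_BYTES = 4  # assumption: fixed-size 4-byte instructions


def next_fetch_address(pc, predicted_target=None, trap_vector=None):
    """Pick the address that the first fetch stage would use next.

    The priority order is an assumption of this sketch: a pending trap or
    interrupt wins, then a branch predicted taken, then sequential order.
    """
    if trap_vector is not None:
        return trap_vector           # trap or interrupt must be serviced
    if predicted_target is not None:
        return predicted_target      # previous branch predicted taken
    return pc + INSTR_BYTES          # normal case: the following instruction


print(hex(next_fetch_address(0x1000)))                           # 0x1004
print(hex(next_fetch_address(0x1000, predicted_target=0x2000)))  # 0x2000
```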

The De1 and De2 (Decode) stages decode the instructions. This step determines
what inputs instructions will need (registers and memory) and what resources they
will require to execute (functional units). Once decode is completed, the
instructions enter the Re (Rename) stage, where the registers accessed are renamed
to eliminate WAR (write-after-read) and WAW (write-after-write) hazards during
out-of-order execution. This stage contains the rename table, which records which
physical register currently holds each architectural register. Using this table,
any input register can be renamed with a simple lookup. The output register must
be given a new physical register, which is taken from a pool of unused physical
registers. The assigned physical register will be in use by the instruction until
it retires.
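A minimal sketch of the renaming step may make this concrete. The class name `Renamer`, the register counts, and the initial identity mapping are assumptions for illustration; returning physical registers to the pool at retirement is deliberately left out.

```python
from collections import deque


class Renamer:
    """Toy rename table: architectural register -> physical register."""

    def __init__(self, num_arch, num_phys):
        # Assumption: architectural register i starts in physical register i.
        self.table = list(range(num_arch))
        # The remaining physical registers form the pool of unused registers.
        self.free = deque(range(num_arch, num_phys))

    def rename(self, dest, srcs):
        # Inputs are renamed by looking up the current mapping. This must
        # happen before the destination is remapped.
        phys_srcs = [self.table[s] for s in srcs]
        if not self.free:
            raise RuntimeError("no unused physical register; rename would stall")
        # The output register always receives a fresh physical register.
        phys_dest = self.free.popleft()
        self.table[dest] = phys_dest
        return phys_dest, phys_srcs


r = Renamer(num_arch=8, num_phys=12)
print(r.rename(1, [2, 3]))  # r1 = r2 + r3         -> (8, [2, 3])
print(r.rename(2, [4, 5]))  # r2 = r4 + r5 (WAR)   -> (9, [4, 5])
print(r.rename(1, [6, 7]))  # r1 = r6 + r7 (WAW)   -> (10, [6, 7])
```

Because the second and third instructions write to fresh physical registers (9 and 10), they no longer conflict with the first instruction's read of r2 or its write of r1, so they can execute in any order.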

Next, instructions enter the Iss (Instruction Issue) stage, where they are
dropped into the instruction issue queue. The issue queue watches for instructions
whose inputs are all ready. When an instruction is ready, its register inputs are
acquired (from the physical register file or the bypass bus), and the instruction
is then sent to the execution stages. Like the Core i7, the Cortex A9 potentially
issues instructions out of program order. Up to four instructions can be issued
each cycle. The choice of instructions is constrained by the functional units
available.
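The selection performed by the issue queue can be modeled roughly as follows. This is a sketch under stated assumptions, not the actual Cortex A9 policy: the oldest-first scan, the `Entry` record, and the unit class names ("alu", "mul", "ls") are all hypothetical. Only the issue width of four comes from the text.

```python
from dataclasses import dataclass

ISSUE_WIDTH = 4  # from the text: up to four instructions issued per cycle


@dataclass
class Entry:
    name: str
    unit: str     # functional-unit class required (hypothetical labels)
    srcs: tuple   # physical source registers


def select(queue, ready_regs, free_units):
    """Return the entries issued this cycle, scanning oldest first."""
    units = dict(free_units)  # copy so the caller's counts are untouched
    issued = []
    for e in queue:           # queue is assumed to be in program order
        if len(issued) == ISSUE_WIDTH:
            break
        inputs_ready = all(s in ready_regs for s in e.srcs)
        if inputs_ready and units.get(e.unit, 0) > 0:
            units[e.unit] -= 1
            issued.append(e)
    return issued


queue = [
    Entry("mul", "mul", (1, 2)),
    Entry("add1", "alu", (3,)),
    Entry("add2", "alu", (9,)),   # input 9 is not ready yet
    Entry("add3", "alu", (4,)),
    Entry("add4", "alu", (5,)),   # ready, but both ALUs are already taken
    Entry("ldr", "ls", (6,)),
]
issued = select(queue, {1, 2, 3, 4, 5, 6}, {"alu": 2, "mul": 1, "ls": 1})
print([e.name for e in issued])  # ['mul', 'add1', 'add3', 'ldr']
```

Here `add3` issues ahead of the older `add2`, which illustrates issue out of program order, and `add4` waits even though its input is ready because no integer ALU is left in that cycle.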

The Ex (Execute) stages are where instructions are actually executed. Most
arithmetic, Boolean, and shift instructions use the integer ALUs and complete in
one cycle. Loads and stores take two cycles (if they hit in the L1 cache), and
multiplies take three cycles. The Ex stages contain multiple functional units,
which are:

1. Integer ALU 1.
2. Integer ALU 2.
3. Multiply unit.
4. Floating-point and SIMD vector ALU (optional with VFP and NEON support).
5. Load and store unit.
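The latencies quoted above can be collected into a small lookup to estimate when a result becomes available. The dictionary keys and the function name are hypothetical labels chosen for this sketch. The floating-point and SIMD unit is omitted because the text gives no latency for it.

```python
# Latencies in cycles, as stated in the text. The load/store figure assumes
# a hit in the L1 cache.
LATENCY = {"alu": 1, "load_store": 2, "multiply": 3}


def result_ready_cycle(start_cycle, kind):
    """Cycle in which the result is available, given the cycle execution starts."""
    return start_cycle + LATENCY[kind]


print(result_ready_cycle(10, "alu"))         # 11
print(result_ready_cycle(10, "load_store"))  # 12
print(result_ready_cycle(10, "multiply"))    # 13
```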

Conditional branch instructions are also processed in the first Ex stage, where
their direction (branch/no branch) is determined. In the event of a misprediction,
a signal is sent back to the Fe1 stage and the pipeline is voided.
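Branch resolution and the resulting pipeline void can be sketched as below. The function and parameter names are hypothetical, and representing the younger in-flight instructions as a plain list is a simplification made for illustration.

```python
def resolve_branch(predicted_taken, actual_taken, target, fallthrough, younger):
    """Compare the predicted and actual direction of a conditional branch.

    Returns (redirect_address, surviving_instructions). A redirect address of
    None means the prediction was correct and fetch continues undisturbed.
    """
    if predicted_taken == actual_taken:
        return None, younger
    # Misprediction: tell the fetch stage where to restart and discard every
    # instruction fetched after the branch.
    redirect = target if actual_taken else fallthrough
    return redirect, []


# Predicted taken, actually not taken: restart at the fall-through address.
print(resolve_branch(True, False, 0x2000, 0x1004, ["i1", "i2"]))  # (4100, [])
```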
