Pipelining: Basic and Intermediate Concepts - Computer Architecture: A Quantitative Approach

specifiers are at a fixed location in a RISC architecture. This technique is known as fixed-

field decoding. Note that we may read a register we don't use, which doesn't help but also

doesn't hurt performance. (It does waste energy to read an unneeded register, and power-

sensitive designs might avoid this.) Because the immediate portion of an instruction is also

located in an identical place, the sign-extended immediate is also calculated during this

cycle in case it is needed.
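
As a rough illustration of fixed-field decoding, the sketch below extracts every field from an instruction word before the opcode has been interpreted, and sign-extends the immediate whether or not it will be used. The 32-bit layout (opcode in bits 31-26, two register specifiers in bits 25-21 and 20-16, a 16-bit immediate in bits 15-0) is an assumption made for this example and is not stated in the text; the function names are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def decode_fixed_fields(word):
    """Pull out all fields unconditionally; nothing here depends on the opcode.

    Assumed layout (hypothetical, for illustration only):
        opcode[31:26]  rs[25:21]  rt[20:16]  imm[15:0]
    """
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,          # first register specifier
        "rt": (word >> 16) & 0x1F,          # second register specifier
        "imm": sign_extend(word & 0xFFFF, 16),
    }


# Example word built from the assumed layout: rs=2, rt=1, immediate 0xFFFC.
word = (0x23 << 26) | (2 << 21) | (1 << 16) | 0xFFFC
fields = decode_fixed_fields(word)
```

Because the shifts and masks are the same for every instruction, the register reads and the sign extension can proceed in parallel with decoding the opcode, which is the point the text makes.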

3. Execution/effective address cycle (EX):

The ALU operates on the operands prepared in the prior cycle, performing one of three

functions depending on the instruction type.

■ Memory reference—The ALU adds the base register and the offset to form the effective

address.

■ Register-Register ALU instruction—The ALU performs the operation specified by the

ALU opcode on the values read from the register file.

■ Register-Immediate ALU instruction—The ALU performs the operation specified by

the ALU opcode on the first value read from the register file and the sign-extended

immediate.

In a load-store architecture the effective address and execution cycles can be combined into a

single clock cycle, since no instruction needs to simultaneously calculate a data address and

perform an operation on the data.
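
The three uses of the ALU in the EX cycle can be sketched as a single function. This is a minimal model, not the hardware datapath: the instruction-kind strings and the `alu_op` callable are hypothetical names chosen for the example.

```python
import operator


def ex_stage(kind, a, b, imm, alu_op=None):
    """Model the one ALU operation performed in the EX cycle.

    a, b   -- values read from the register file in the previous cycle
    imm    -- sign-extended immediate prepared in the previous cycle
    alu_op -- two-argument callable standing in for the ALU opcode
    """
    if kind == "mem":        # load or store: base register + offset
        return a + imm
    if kind == "reg_reg":    # operate on the two register values
        return alu_op(a, b)
    if kind == "reg_imm":    # operate on the first register value and the immediate
        return alu_op(a, imm)
    raise ValueError(f"unknown instruction kind: {kind}")


address = ex_stage("mem", a=100, b=0, imm=-4)
difference = ex_stage("reg_reg", a=7, b=5, imm=0, alu_op=operator.sub)
total = ex_stage("reg_imm", a=7, b=5, imm=3, alu_op=operator.add)
```

Each call performs exactly one ALU operation, which reflects why a load-store architecture can fold address calculation and execution into the same cycle: no single instruction needs both.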

4. Memory access (MEM):

If the instruction is a load, the memory does a read using the effective address computed in

the previous cycle. If it is a store, then the memory writes the data from the second register

read from the register file using the effective address.

5. Write-back cycle (WB):

■ Register-Register ALU instruction or load instruction:

Write the result into the register file, whether it comes from the memory system (for a load) or

from the ALU (for an ALU instruction).
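
The MEM and WB cycles described above might be modeled as follows. This is a sketch under simplifying assumptions: memory is a Python dictionary keyed by address, unwritten locations read as 0, and the instruction-kind strings are hypothetical.

```python
def mem_stage(kind, effective_address, store_value, memory):
    """Loads read memory at the effective address; stores write to it."""
    if kind == "load":
        return memory.get(effective_address, 0)  # assumption: unwritten memory reads as 0
    if kind == "store":
        memory[effective_address] = store_value  # data from the second register read
    return None                                  # other instructions do not touch memory


def wb_stage(kind, dest, alu_out, load_data, regs):
    """Write the result into the register file for loads and ALU instructions."""
    if kind == "load":
        regs[dest] = load_data      # result comes from the memory system
    elif kind == "alu":
        regs[dest] = alu_out        # result comes from the ALU
    # stores write nothing back


memory = {}
regs = [0] * 32

mem_stage("store", 96, 42, memory)              # store the value 42 at address 96
loaded = mem_stage("load", 96, None, memory)    # load it back
wb_stage("load", 5, None, loaded, regs)         # write the loaded value to register 5
wb_stage("alu", 6, 10, None, regs)              # write an ALU result to register 6
```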

In this implementation, branch instructions require 2 cycles, store instructions require 4

cycles, and all other instructions require 5 cycles. Assuming a branch frequency of 12% and a

store frequency of 10%, a typical instruction distribution leads to an overall CPI of 4.54. This

implementation, however, is not optimal either in achieving the best performance or in using

the minimal amount of hardware given the performance level; we leave the improvement of

this design as an exercise for you and instead focus on pipelining this version.
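
The CPI figure quoted above is a frequency-weighted average of the per-class cycle counts. The short check below reproduces it from the numbers in the text; the dictionary names are chosen for this example only.

```python
# Cycle counts and frequencies as given in the text.
cycles = {"branch": 2, "store": 4, "other": 5}
freq = {"branch": 0.12, "store": 0.10}
freq["other"] = 1.0 - sum(freq.values())   # remaining 78% of instructions

# CPI = sum over instruction classes of (frequency x cycles)
#     = 0.12*2 + 0.10*4 + 0.78*5
cpi = sum(freq[k] * cycles[k] for k in cycles)
print(f"{cpi:.2f}")  # prints 4.54
```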

The Classic Five-Stage Pipeline For A RISC Processor

We can pipeline the execution described above with almost no changes by simply starting a

new instruction on each clock cycle. (See why we chose this design?) Each of the clock cycles

from the previous section becomes a pipe stage (a cycle in the pipeline). This results in the

execution pattern shown in Figure C.1, which is the typical way a pipeline structure is drawn.

Although each instruction takes 5 clock cycles to complete, during each clock cycle the hardware

will initiate a new instruction and will be executing some part of five different instructions.
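
The staggered execution pattern can be generated with a few lines of code. This sketch assumes the usual stage abbreviations IF, ID, EX, MEM, and WB and an ideal pipeline with no stalls; it approximates the kind of diagram the text refers to and does not reproduce Figure C.1 itself.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_table(n_instructions):
    """Return one row per instruction and one column per clock cycle.

    Instruction i enters IF in cycle i (0-based) and advances one stage per cycle.
    """
    n_cycles = n_instructions + len(STAGES) - 1
    table = []
    for i in range(n_instructions):
        row = [""] * n_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name
        table.append(row)
    return table


table = pipeline_table(5)
for i, row in enumerate(table):
    print(f"i+{i}: " + " ".join(f"{cell:<3}" for cell in row))

# Stages occupied in the fifth clock cycle, one per in-flight instruction.
fifth_cycle = [row[4] for row in table]
```

Reading down the fifth column shows five different instructions, each in a different stage, which is the steady state the paragraph describes.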
