specifiers are at a fixed location in a RISC architecture. This technique is known as
fixed-field decoding. Note that we may read a register we don't use, which doesn't help but also
doesn't hurt performance. (It does waste energy to read an unneeded register, and power-
sensitive designs might avoid this.) Because the immediate portion of an instruction is also
located in an identical place, the sign-extended immediate is also calculated during this
cycle in case it is needed.
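
As an illustration of fixed-field decoding, the sketch below pulls the register specifiers and the immediate out of a 32-bit instruction word before the opcode has been interpreted. The field positions (opcode in bits 31-26, source registers in bits 25-21 and 20-16, immediate in bits 15-0) are an assumption modeled on a MIPS-style I-type format; the text above does not fix a specific layout, and the names decode and sign_extend are hypothetical.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def decode(instr: int, regs: list) -> dict:
    """Read both registers and the immediate unconditionally.

    Assumed layout (MIPS-style I-type, not taken from the text):
    opcode[31:26] rs[25:21] rt[20:16] imm[15:0].
    """
    rs = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F
    return {
        "opcode": (instr >> 26) & 0x3F,
        "a": regs[rs],                   # first register value, maybe unused
        "b": regs[rt],                   # second register value, maybe unused
        "imm": sign_extend(instr, 16),   # computed in case it is needed
    }
```

Because every field sits at the same bit position, the register reads and the sign extension can proceed in parallel with opcode decoding; a later stage simply ignores the values it does not need.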
3. Execution/effective address cycle (EX):
The ALU operates on the operands prepared in the prior cycle, performing one of three
functions depending on the instruction type.
■ Memory reference—The ALU adds the base register and the offset to form the effective
address.
■ Register-Register ALU instruction—The ALU performs the operation specified by the
ALU opcode on the values read from the register file.
■ Register-Immediate ALU instruction—The ALU performs the operation specified by
the ALU opcode on the first value read from the register file and the sign-extended
immediate.
In a load-store architecture the effective address and execution cycles can be combined into a
single clock cycle, since no instruction needs to simultaneously calculate a data address and
perform an operation on the data.
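
The three ALU functions of the EX cycle can be sketched as a single dispatch on the instruction type. This is a minimal model, not a description of real hardware: the type labels ("mem", "reg_reg", "reg_imm"), the function name execute, and the small table of ALU operations are all assumptions made for illustration.

```python
import operator

# Hypothetical subset of ALU opcodes, chosen only for the example.
ALU_OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "and": operator.and_,
    "or": operator.or_,
}


def execute(kind: str, a: int, b: int, imm: int, op: str = "add") -> int:
    """Return the ALU output for one instruction.

    a, b -- the two values read from the register file in the prior cycle
    imm  -- the sign-extended immediate prepared in the prior cycle
    """
    if kind == "mem":        # effective address = base register + offset
        return a + imm
    if kind == "reg_reg":    # operate on the two register values
        return ALU_OPS[op](a, b)
    if kind == "reg_imm":    # operate on first register and the immediate
        return ALU_OPS[op](a, imm)
    raise ValueError(f"unknown instruction type: {kind}")
```

Each call uses the ALU exactly once, which mirrors the observation that a load-store instruction never needs an address calculation and a data operation in the same cycle.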
4. Memory access (MEM):
If the instruction is a load, the memory does a read using the effective address computed in
the previous cycle. If it is a store, then the memory writes the data from the second register
read from the register file using the effective address.
5. Write-back cycle (WB):
■ Register-Register ALU instruction or load instruction:
Write the result into the register file, whether it comes from the memory system (for a load) or
from the ALU (for an ALU instruction).
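
The MEM and WB cycles can be modeled in the same style. In this sketch memory is a dictionary keyed by address and the register file is a list; both representations, and the names mem_stage and wb_stage, are assumptions for illustration only.

```python
def mem_stage(kind: str, address: int, store_value, memory: dict):
    """Loads read memory at the effective address; stores write to it."""
    if kind == "load":
        return memory[address]
    if kind == "store":
        memory[address] = store_value   # data from the second register read
    return None                         # other instructions do nothing here


def wb_stage(kind: str, dest: int, alu_result: int, loaded, regs: list) -> None:
    """Write the result into the register file for loads and ALU operations."""
    if kind == "load":
        regs[dest] = loaded             # result comes from the memory system
    elif kind == "alu":
        regs[dest] = alu_result         # result comes from the ALU
    # stores and branches write nothing back
```

A store does its work in mem_stage and has nothing left to do in wb_stage, which is consistent with stores finishing one cycle earlier than loads and ALU instructions in this implementation.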
In this implementation, branch instructions require 2 cycles, store instructions require 4
cycles, and all other instructions require 5 cycles. Assuming a branch frequency of 12% and a
store frequency of 10%, a typical instruction distribution leads to an overall CPI of 4.54. This
implementation, however, is not optimal either in achieving the best performance or in using
the minimal amount of hardware given the performance level; we leave the improvement of
this design as an exercise for you and instead focus on pipelining this version.
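
The CPI figure quoted above is a frequency-weighted average of the per-class cycle counts. The short sketch below reproduces the arithmetic; the 78% share for "other" instructions is derived as the remainder after branches and stores, and the function name average_cpi is hypothetical.

```python
def average_cpi(mix: dict) -> float:
    """mix maps an instruction class to (frequency, cycles)."""
    total = sum(freq for freq, _ in mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("frequencies must sum to 1")
    return sum(freq * cycles for freq, cycles in mix.values())


mix = {
    "branch": (0.12, 2),
    "store": (0.10, 4),
    "other": (0.78, 5),   # remainder: 1 - 0.12 - 0.10
}
print(round(average_cpi(mix), 2))  # 0.24 + 0.40 + 3.90 -> 4.54
```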
The Classic Five-Stage Pipeline for a RISC Processor
We can pipeline the execution described above with almost no changes by simply starting a
new instruction on each clock cycle. (See why we chose this design?) Each of the clock cycles
from the previous section becomes a pipe stage, a cycle in the pipeline. This results in the
execution pattern shown in Figure C.1, which is the typical way a pipeline structure is drawn.
Although each instruction takes 5 clock cycles to complete, during each clock cycle the hardware
initiates a new instruction and executes some part of five different instructions.
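
The overlap can be made concrete with a small sketch that builds a stage-by-cycle table for a pipeline with no stalls. The stage abbreviations IF and ID for the first two cycles follow the usual convention, and the helper names pipeline_diagram and in_flight are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(n_instructions: int) -> list:
    """Row i holds the stage occupied by instruction i in each clock cycle.

    Instruction i starts in cycle i (0-based); empty strings mark cycles
    in which that instruction is not in the pipeline. Assumes no stalls.
    """
    total_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        row = [""] * total_cycles
        for offset, stage in enumerate(STAGES):
            row[i + offset] = stage
        rows.append(row)
    return rows


def in_flight(rows: list, cycle: int) -> int:
    """Count the instructions occupying some stage in the given cycle."""
    return sum(1 for row in rows if row[cycle])


if __name__ == "__main__":
    for i, row in enumerate(pipeline_diagram(5)):
        print(f"i+{i}: " + " ".join(f"{s:>3}" for s in row))
```

With five instructions the table spans nine clock cycles, and in the fifth cycle all five stages are occupied, one by each instruction.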