ate between 32-bit and 64-bit versions. Immediate versions of these instructions use the
same mnemonics with a suffix of I. In MIPS, there are both signed and unsigned forms
of the arithmetic instructions; the unsigned forms, which do not generate overflow exceptions
(and thus are the same in 32-bit and 64-bit mode), have a U at the end (e.g., DADDU,
DSUBU, DADDIU).
2. Load and store instructions: These instructions take a register source, called the base register,
and an immediate field (16-bit in MIPS), called the offset, as operands. The sum of the
contents of the base register and the sign-extended offset, called the effective address, is used
as a memory address. In the case of a load instruction, a second register operand acts as
the destination for the data loaded from memory. In the case of a store, the second register
operand is the source of the data that is stored into memory. The instructions load doubleword
(LD) and store doubleword (SD) load or store the entire 64-bit register contents.
3. Branches and jumps: Branches are conditional transfers of control. There are usually two
ways of specifying the branch condition in RISC architectures: with a set of condition bits
(sometimes called a condition code) or by a limited set of comparisons between a pair of
registers or between a register and zero. MIPS uses the latter. For this appendix, we consider
only comparisons for equality between two registers. In all RISC architectures, the branch
destination is obtained by adding a sign-extended offset (16 bits in MIPS) to the current
PC. Unconditional jumps are provided in many RISC architectures, but we will not cover
jumps in this appendix.
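The address arithmetic described for loads, stores, and branches can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of any MIPS specification: the function names and the 64-bit wraparound mask are hypothetical, and the branch target follows the simplified description above (PC plus sign-extended offset), ignoring the offset scaling that real MIPS encodings apply.

```python
MASK64 = (1 << 64) - 1  # assumption: 64-bit registers and addresses


def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(base, offset16):
    """Base register contents plus sign-extended offset, wrapped to 64 bits."""
    return (base + sign_extend16(offset16)) & MASK64


def branch_target(pc, offset16):
    """Simplified branch destination: PC plus sign-extended offset."""
    return (pc + sign_extend16(offset16)) & MASK64
```

For example, a base register holding 0x1000 with an offset field of 0xFFFC (that is, -4 after sign extension) gives an effective address of 0xFFC.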
A Simple Implementation of a RISC Instruction Set
To understand how a RISC instruction set can be implemented in a pipelined fashion, we
need to understand how it is implemented without pipelining. This section shows a simple
implementation where every instruction takes at most 5 clock cycles. We will extend this basic
implementation to a pipelined version, resulting in a much lower CPI. Our unpipelined
implementation is not the most economical or the highest-performance implementation without
pipelining. Instead, it is designed to lead naturally to a pipelined implementation. Implementing
the instruction set requires the introduction of several temporary registers that are not part
of the architecture; these are introduced in this section to simplify pipelining. Our implementation
will focus only on a pipeline for an integer subset of a RISC architecture that consists of
load-store word, branch, and integer ALU operations.
Every instruction in this RISC subset can be implemented in at most 5 clock cycles. The 5
clock cycles are as follows.
1. Instruction fetch cycle (IF):
Send the program counter (PC) to memory and fetch the current instruction from memory.
Update the PC to the next sequential PC by adding 4 (since each instruction is 4 bytes) to
the PC.
2. Instruction decode/register fetch cycle (ID):
Decode the instruction and read the registers corresponding to register source specifiers
from the register file. Do the equality test on the registers as they are read, for a possible
branch. Sign-extend the offset field of the instruction in case it is needed. Compute the
possible branch target address by adding the sign-extended offset to the incremented PC. In
an aggressive implementation, which we explore later, the branch can be completed at the
end of this stage by storing the branch-target address into the PC, if the condition test
yielded true.
Decoding is done in parallel with reading registers, which is possible because the register