Under these conditions, the speedup from pipelining equals the number of pipe stages, just
as an assembly line with n stages can ideally produce cars n times as fast. Usually, however,
the stages will not be perfectly balanced; furthermore, pipelining does involve some overhead.
Thus, the time per instruction on the pipelined processor will not have its minimum possible
value, yet it can be close.
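The effect of imbalance and overhead can be sketched numerically. The following is a minimal illustration, not a model of any real processor: the stage latencies and the per-stage overhead are hypothetical numbers, and it assumes the unpipelined machine takes the sum of the stage times per instruction while the pipelined machine completes one instruction per clock in steady state, with the clock set by the slowest stage plus the overhead.

```python
def pipeline_speedup(stage_times_ns, overhead_ns=0.0):
    """Return (unpipelined time per instruction, pipelined cycle time, speedup).

    Assumptions (hypothetical model): the unpipelined time is the sum of the
    stage times; the pipelined clock equals the slowest stage plus a fixed
    overhead; one instruction completes per clock in steady state.
    """
    unpipelined = sum(stage_times_ns)
    cycle = max(stage_times_ns) + overhead_ns
    return unpipelined, cycle, unpipelined / cycle


# Five perfectly balanced stages, no overhead: speedup equals the stage count.
balanced = pipeline_speedup([1.0] * 5)

# Unbalanced stages plus overhead: speedup falls short of five.
unbalanced = pipeline_speedup([1.0, 1.5, 1.0, 2.0, 1.0], overhead_ns=0.25)
```

With these made-up numbers the balanced case reaches the ideal speedup of 5, while the unbalanced case is limited by its 2.0 ns stage and the added overhead.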
Pipelining yields a reduction in the average execution time per instruction. Depending on
what you consider as the baseline, the reduction can be viewed as decreasing the number of
clock cycles per instruction (CPI), as decreasing the clock cycle time, or as a combination. If the
starting point is a processor that takes multiple clock cycles per instruction, then pipelining is
usually viewed as reducing the CPI. This is the primary view we will take. If the starting point
is a processor that takes 1 (long) clock cycle per instruction, then pipelining decreases the clock
cycle time.
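The two views describe the same improvement from different baselines. As a rough sketch with hypothetical figures (a five-step instruction and a 1 ns step, neither taken from the text), the average time per instruction is the product of CPI and clock cycle time, and pipelining lowers one factor or the other depending on the starting point:

```python
def time_per_instruction(cpi, cycle_ns):
    """Average time per instruction = CPI x clock cycle time."""
    return cpi * cycle_ns


# Baseline A (hypothetical): multicycle processor, 5 short cycles per instruction.
multicycle = time_per_instruction(cpi=5, cycle_ns=1.0)

# Baseline B (hypothetical): single-cycle processor, 1 long cycle per instruction.
single_cycle = time_per_instruction(cpi=1, cycle_ns=5.0)

# Ideal pipelined processor: one instruction completes every short cycle.
pipelined = time_per_instruction(cpi=1, cycle_ns=1.0)

cpi_view_speedup = multicycle / pipelined      # CPI fell from 5 to 1
clock_view_speedup = single_cycle / pipelined  # cycle time fell from 5 ns to 1 ns
```

Under these idealized assumptions both baselines give the same speedup; only the factor credited with the improvement differs.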
Pipelining is an implementation technique that exploits parallelism among the instructions
in a sequential instruction stream. It has the substantial advantage that, unlike some speedup
techniques (see Chapter 4), it is not visible to the programmer. In this appendix we will first
cover the concept of pipelining using a classic five-stage pipeline; other chapters investigate
the more sophisticated pipelining techniques in use in modern processors. Before we say more
about pipelining and its use in a processor, we need a simple instruction set, which we intro-
duce next.
The Basics Of A RISC Instruction Set
Throughout this topic we use a RISC (reduced instruction set computer) architecture, or
load-store architecture, to illustrate the basic concepts, although nearly all the ideas we
introduce in this topic are applicable to other processors. In this section we introduce the
core of a typical RISC architecture. In this appendix, and throughout the topic, our default
RISC architecture is MIPS. In many places, the concepts are sufficiently similar that they
will apply to any RISC. RISC architectures are characterized by a few key properties, which
dramatically simplify their implementation:
■ All operations on data apply to data in registers and typically change the entire register (32
or 64 bits per register).
■ The only operations that affect memory are load and store operations that move data from
memory to a register or to memory from a register, respectively. Load and store operations
that load or store less than a full register (e.g., a byte, 16 bits, or 32 bits) are often available.
■ The instruction formats are few in number, with all instructions typically being one size.
These simple properties lead to dramatic simplifications in the implementation of pipelin-
ing, which is why these instruction sets were designed this way.
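The three properties above can be illustrated with a toy model. This is a sketch under stated assumptions rather than a description of MIPS: the class name `LoadStoreMachine`, the register count, the memory size, little-endian byte order, and zero-extension on partial loads are all choices made for the example.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit registers


class LoadStoreMachine:
    """Toy model: ALU ops see only registers; only load/store touch memory."""

    def __init__(self, num_regs=32, mem_bytes=64):
        self.regs = [0] * num_regs
        self.mem = bytearray(mem_bytes)

    def add(self, rd, rs, rt):
        # ALU operation: reads two registers and overwrites the entire
        # destination register; memory is never involved.
        self.regs[rd] = (self.regs[rs] + self.regs[rt]) & MASK64

    def load(self, rd, addr, size=8):
        # Memory -> register. Partial loads zero-extend here (an assumption).
        self.regs[rd] = int.from_bytes(self.mem[addr:addr + size], "little")

    def store(self, rs, addr, size=8):
        # Register -> memory. Only the low `size` bytes are written.
        low = self.regs[rs] & ((1 << (8 * size)) - 1)
        self.mem[addr:addr + size] = low.to_bytes(size, "little")


m = LoadStoreMachine()
m.regs[1], m.regs[2] = 40, 2
m.add(3, 1, 2)             # r3 = r1 + r2, registers only
m.store(3, addr=0)         # full 8-byte store
m.load(4, addr=0, size=1)  # byte load of the low-order byte
```

Because every operation either works purely on registers or performs a single memory transfer, each instruction in the model breaks into the same few simple steps, which is the regularity a pipeline relies on.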
For consistency with the rest of the text, we use MIPS64, the 64-bit version of the MIPS
instruction set. The extended 64-bit instructions are generally designated by having a D at the
start or end of the mnemonic. For example, DADD is the 64-bit version of an add instruction, while
LD is the 64-bit version of a load instruction.
Like other RISC architectures, the MIPS instruction set provides 32 registers, although
register 0 always has the value 0. Most RISC architectures, like MIPS, have three classes of
instructions (see Appendix A for more detail):
1. ALU instructions: These instructions take either two registers or a register and a
sign-extended immediate (called ALU immediate instructions, they have a 16-bit offset in MIPS),
operate on them, and store the result into a third register. Typical operations include add
(DADD), subtract (DSUB), and logical operations (such as AND or OR), which do not differenti-