Pipelining: Basic and Intermediate Concepts - Computer Architecture: A Quantitative Approach

Under these conditions, the speedup from pipelining equals the number of pipe stages, just as an assembly line with n stages can ideally produce cars n times as fast. Usually, however, the stages will not be perfectly balanced; furthermore, pipelining does involve some overhead. Thus, the time per instruction on the pipelined processor will not have its minimum possible value, yet it can be close.
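To make the effect of imbalance and overhead concrete, here is a minimal sketch in Python. The function name `pipeline_speedup`, the stage latencies, and the overhead value are all illustrative assumptions, not figures from the text. It assumes a steady state with no stalls, so that one instruction completes every clock cycle and the clock is set by the slowest stage plus a fixed per-stage overhead.

```python
def pipeline_speedup(stage_times_ns, overhead_ns=0.0):
    """Return (unpipelined_time, pipelined_cycle, speedup).

    Assumes a stall-free steady state: one instruction finishes per cycle.
    """
    unpipelined = sum(stage_times_ns)            # all stages run back to back
    cycle = max(stage_times_ns) + overhead_ns    # slowest stage sets the clock
    return unpipelined, cycle, unpipelined / cycle


# Perfectly balanced stages, no overhead: speedup equals the stage count.
print(pipeline_speedup([10, 10, 10, 10, 10]))   # (50, 10.0, 5.0)

# Unbalanced stages plus 1 ns of overhead (hypothetical numbers):
# 45 ns / 11 ns per cycle, so the speedup falls below 5.
print(pipeline_speedup([10, 8, 10, 10, 7], overhead_ns=1.0))
```

With five balanced stages the speedup is exactly 5; in the second case the speedup drops to roughly 4.1, close to the ideal but not equal to it.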

Pipelining yields a reduction in the average execution time per instruction. Depending on what you consider as the baseline, the reduction can be viewed as decreasing the number of clock cycles per instruction (CPI), as decreasing the clock cycle time, or as a combination. If the starting point is a processor that takes multiple clock cycles per instruction, then pipelining is usually viewed as reducing the CPI. This is the primary view we will take. If the starting point is a processor that takes 1 (long) clock cycle per instruction, then pipelining decreases the clock cycle time.
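The two viewpoints can be compared with a small calculation. This is a sketch under stated assumptions: the helper `time_per_instruction` and every number below are hypothetical, and the pipeline is taken to be ideal (no stalls, no overhead). The same pipelined design appears as a CPI reduction against a multicycle baseline and as a cycle-time reduction against a single-cycle baseline.

```python
def time_per_instruction(cpi, cycle_ns):
    """Average time per instruction = CPI x clock cycle time."""
    return cpi * cycle_ns


# Ideal five-stage pipeline (hypothetical): CPI of 1 at a 10 ns clock.
pipelined = time_per_instruction(cpi=1, cycle_ns=10)

# Baseline A: multicycle design, 5 short cycles per instruction.
multicycle = time_per_instruction(cpi=5, cycle_ns=10)

# Baseline B: single-cycle design, 1 long cycle per instruction.
single_cycle = time_per_instruction(cpi=1, cycle_ns=50)

print(multicycle / pipelined)    # 5.0, viewed as CPI going from 5 to 1
print(single_cycle / pipelined)  # 5.0, viewed as cycle time going from 50 to 10
```

Both baselines take 50 ns per instruction, so the speedup is the same; only the accounting differs.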

Pipelining is an implementation technique that exploits parallelism among the instructions in a sequential instruction stream. It has the substantial advantage that, unlike some speedup techniques (see Chapter 4), it is not visible to the programmer. In this appendix we will first cover the concept of pipelining using a classic five-stage pipeline; other chapters investigate the more sophisticated pipelining techniques in use in modern processors. Before we say more about pipelining and its use in a processor, we need a simple instruction set, which we introduce next.

The Basics of a RISC Instruction Set

Throughout this book we use a RISC (reduced instruction set computer) architecture or load-store architecture to illustrate the basic concepts, although nearly all the ideas we introduce in this book are applicable to other processors. In this section we introduce the core of a typical RISC architecture. In this appendix, and throughout the book, our default RISC architecture is MIPS. In many places, the concepts are sufficiently similar that they will apply to any RISC. RISC architectures are characterized by a few key properties, which dramatically simplify their implementation:

■ All operations on data apply to data in registers and typically change the entire register (32 or 64 bits per register).

■ The only operations that affect memory are load and store operations that move data from memory to a register or to memory from a register, respectively. Load and store operations that load or store less than a full register (e.g., a byte, 16 bits, or 32 bits) are often available.

■ The instruction formats are few in number, with all instructions typically being one size.

These simple properties lead to dramatic simplifications in the implementation of pipelining, which is why these instruction sets were designed this way.

For consistency with the rest of the text, we use MIPS64, the 64-bit version of the MIPS instruction set. The extended 64-bit instructions are generally designated by having a D at the start or end of the mnemonic. For example, DADD is the 64-bit version of an add instruction, while LD is the 64-bit version of a load instruction.
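The load-store discipline described above can be sketched as a toy machine model in Python. This is an illustrative assumption, not a MIPS simulator: the class `TinyLoadStoreMachine` and its methods are hypothetical, memory is modeled as a dictionary of 64-bit values, and overflow handling is omitted (results are simply masked to 64 bits). The mnemonics DADD and LD come from the text; SD is the MIPS64 store-doubleword counterpart of LD. The point is only that the arithmetic method never touches memory, while the load and store methods are the sole paths between memory and registers.

```python
MASK64 = (1 << 64) - 1  # keep register values within 64 bits


class TinyLoadStoreMachine:
    """Toy model: 32 registers, memory reachable only via ld/sd."""

    def __init__(self):
        self.regs = [0] * 32
        self.mem = {}  # address -> 64-bit value (simplified memory)

    def _write(self, rd, value):
        # MIPS register 0 always holds 0, so writes to it are dropped.
        if rd != 0:
            self.regs[rd] = value & MASK64

    def dadd(self, rd, rs, rt):
        # ALU operation: reads two registers, writes a third; no memory access.
        self._write(rd, self.regs[rs] + self.regs[rt])

    def ld(self, rt, offset, base):
        # Load: memory -> register.
        self._write(rt, self.mem.get(self.regs[base] + offset, 0))

    def sd(self, rt, offset, base):
        # Store: register -> memory.
        self.mem[self.regs[base] + offset] = self.regs[rt]


m = TinyLoadStoreMachine()
m.mem[0] = 40
m.mem[8] = 2
m.ld(1, 0, 0)     # LD   R1, 0(R0)
m.ld(2, 8, 0)     # LD   R2, 8(R0)
m.dadd(3, 1, 2)   # DADD R3, R1, R2
m.sd(3, 16, 0)    # SD   R3, 16(R0)
print(m.mem[16])  # 42
```

Adding two values that live in memory takes four instructions here: two loads, one register-to-register add, and one store.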

Like other RISC architectures, the MIPS instruction set provides 32 registers, although register 0 always has the value 0. Most RISC architectures, like MIPS, have three classes of instructions (see Appendix A for more detail):

1. ALU instructions: These instructions take either two registers or a register and a sign-extended immediate (called ALU immediate instructions; they have a 16-bit offset in MIPS), operate on them, and store the result into a third register. Typical operations include add (DADD), subtract (DSUB), and logical operations (such as AND or OR), which do not differenti-
