Hardware Reference
In-Depth Information
Getting back to our pipeline of Fig. 2-4, suppose that the cycle time of this ma-
chine is 2 nsec. Then it takes 10 nsec for an instruction to progress all the way
through the five-stage pipeline. At first glance, with an instruction taking 10 nsec,
it might appear that the machine can run at 100 MIPS, but in fact it does much bet-
ter than this. At every clock cycle (2 nsec), one new instruction is completed, so
the actual rate of processing is 500 MIPS, not 100 MIPS.
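The gap between the 10-nsec latency and the 500-MIPS rate can be checked with a small sketch. It assumes an idealized pipeline with no stalls, and the function name and the 1000-instruction run are illustrative choices, not part of the text.

```python
def completion_times_nsec(num_instructions, stages=5, cycle_nsec=2):
    """Return the time at which each instruction leaves an ideal pipeline.

    Assumption: instruction i (0-based) enters the first stage at cycle i
    and leaves the last stage at the end of cycle i + stages - 1.
    Stalls and conflicts are ignored.
    """
    return [(i + stages) * cycle_nsec for i in range(num_instructions)]


times = completion_times_nsec(1000)
first_latency = times[0]        # time for one instruction to traverse all stages
gap = times[1] - times[0]       # time between successive completions
total = times[-1]               # time to finish all 1000 instructions

# 1 instruction/nsec equals 1000 MIPS, so scale the average rate by 1000.
average_mips = len(times) / total * 1000
```

Under these assumptions the first instruction finishes after 10 nsec, later ones follow 2 nsec apart, and the average rate over a long run approaches 500 MIPS once the cost of filling the pipeline is amortized.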
Pipelining allows a trade-off between latency (how long it takes to execute an
instruction) and processor bandwidth (how many MIPS the CPU has). With a
cycle time of T nsec, and n stages in the pipeline, the latency is nT nsec because
each instruction passes through n stages, each of which takes T nsec.
Since one instruction completes every clock cycle and there are $10^9/T$ clock
cycles/second, the number of instructions executed per second is $10^9/T$. For
example, if $T = 2$ nsec, 500 million instructions are executed each second. To
get the number of MIPS, we have to divide the instruction execution rate by
1 million to get $(10^9/T)/10^6 = 1000/T$ MIPS. Theoretically, we could measure
instruction execution rate in BIPS instead of MIPS, but nobody does that, so we
will not either.
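The two formulas above, latency of nT nsec and bandwidth of 1000/T MIPS, can be written as a short sketch. The function names are hypothetical, and the bandwidth figure assumes one instruction completes on every cycle.

```python
def latency_nsec(n_stages, cycle_nsec):
    """Latency of one instruction: n stages, each taking T nsec."""
    return n_stages * cycle_nsec


def bandwidth_mips(cycle_nsec):
    """Bandwidth in MIPS, assuming one instruction completes per cycle."""
    instructions_per_second = 1e9 / cycle_nsec   # 10^9 / T clock cycles/second
    return instructions_per_second / 1e6         # divide by 1 million for MIPS


# The five-stage pipeline from the text with T = 2 nsec.
example_latency = latency_nsec(5, 2)
example_bandwidth = bandwidth_mips(2)
```

Note that the number of stages appears only in the latency: with T fixed, a deeper pipeline makes each instruction take longer but does not change the 1000/T MIPS rate.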
Superscalar Architectures
If one pipeline is good, then surely two pipelines are better. One possible de-
sign for a dual pipeline CPU, based on Fig. 2-4, is shown in Fig. 2-5. Here a single
instruction fetch unit fetches pairs of instructions together and puts each one into
its own pipeline, complete with its own ALU for parallel operation. To be able to
run in parallel, the two instructions must not conflict over resource usage (e.g., reg-
isters), and neither must depend on the result of the other. As with a single
pipeline, either the compiler must guarantee this situation to hold (i.e., the hard-
ware does not check and gives incorrect results if the instructions are not compati-
ble), or conflicts must be detected and eliminated during execution using extra
hardware.
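The pairing condition can be sketched as a simple register check. The `Instr` record and `can_issue_together` function are hypothetical, not how any particular compiler or CPU does it. The sketch models only register conflicts and conservatively rejects any pair that shares a written register.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: one destination, several sources."""
    dest: Optional[str]
    srcs: Tuple[str, ...]


def can_issue_together(first, second):
    """Return True if two instructions may run in parallel pipelines.

    Assumption: only register usage is modeled. 'first' precedes 'second'
    in program order.
    """
    # second needs the result of first
    if first.dest is not None and first.dest in second.srcs:
        return False
    # both write the same register
    if first.dest is not None and first.dest == second.dest:
        return False
    # second overwrites a register that first still has to read
    if second.dest is not None and second.dest in first.srcs:
        return False
    return True


add = Instr(dest="r1", srcs=("r2", "r3"))
independent = Instr(dest="r4", srcs=("r5", "r6"))
dependent = Instr(dest="r4", srcs=("r1", "r5"))   # reads r1, written by add

pair_ok = can_issue_together(add, independent)
pair_blocked = can_issue_together(add, dependent)
```

A compiler could apply a check like this when scheduling instruction pairs, while hardware that detects conflicts during execution would perform an equivalent comparison on the fly.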
[Figure: a single instruction fetch unit (S1) feeds two parallel pipelines, each consisting of an instruction decode unit (S2), an operand fetch unit (S3), an instruction execution unit (S4), and a write back unit (S5).]
Figure 2-5. Dual five-stage pipelines with a common instruction fetch unit.
Although pipelines, single or double, were originally used on RISC machines
(the 386 and its predecessors did not have any), starting with the 486 Intel began
 