Pipelining: Basic and Intermediate Concepts - Computer Architecture: A Quantitative Approach


Pitfall: Extensive pipelining can impact other aspects of a design, leading to overall worse cost-performance.

The best example of this phenomenon comes from two implementations of the VAX, the 8600 and the 8700. When the 8600 was initially delivered, it had a cycle time of 80 ns. Subsequently, a redesigned version, called the 8650, with a 55 ns clock was introduced. The 8700 has a much simpler pipeline that operates at the microinstruction level, yielding a smaller CPU with a faster clock cycle of 45 ns. The end result is that the 8650 has a CPI advantage of about 20%, but the 8700 has a clock rate that is about 20% faster. Because execution time is the product of instruction count, CPI, and clock cycle time, and both machines run the same instruction set, these two effects roughly cancel. Thus, the 8700 achieves the same performance with much less hardware.
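The arithmetic behind this comparison can be checked with a minimal sketch. The cycle times come from the text; the CPI values are an assumption, normalized so that the 8650 is 1.0 and the 8700 is 1.2 (one reading of "a CPI advantage of about 20%"), since the text gives only the ratio and no absolute CPIs.

```python
# Execution time = instruction count x CPI x clock cycle time.
# Both machines implement the VAX instruction set, so the instruction
# count is the same and cancels out of the comparison.

def time_per_instruction_ns(cpi: float, cycle_ns: float) -> float:
    """Average time per instruction in nanoseconds."""
    return cpi * cycle_ns

# Cycle times from the text.
CYCLE_8650_NS = 55.0
CYCLE_8700_NS = 45.0

# ASSUMPTION: normalized CPIs. Only the ~20% ratio is stated in the text.
CPI_8650 = 1.0
CPI_8700 = 1.2

time_8650 = time_per_instruction_ns(CPI_8650, CYCLE_8650_NS)
time_8700 = time_per_instruction_ns(CPI_8700, CYCLE_8700_NS)

clock_ratio = CYCLE_8650_NS / CYCLE_8700_NS  # 8700 clock-rate advantage
perf_ratio = time_8650 / time_8700           # > 1 means the 8700 is faster

print(f"8700 clock-rate advantage: {clock_ratio:.2f}x")
print(f"8650 time per instruction: {time_8650:.1f} (normalized ns)")
print(f"8700 time per instruction: {time_8700:.1f} (normalized ns)")
print(f"8700 performance relative to 8650: {perf_ratio:.2f}x")
```

Under these assumed CPIs the two machines come out within about 2% of each other, which is consistent with the claim that the clock-rate and CPI differences offset one another.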

Pitfall: Evaluating dynamic or static scheduling on the basis of unoptimized code.

Unoptimized code (containing redundant loads, stores, and other operations that might be eliminated by an optimizer) is much easier to schedule than “tight” optimized code. This holds for scheduling both control delays (with delayed branches) and delays arising from RAW hazards. In gcc running on an R3000, which has a pipeline almost identical to that of Section C.1, the frequency of idle clock cycles increases by 18% from the unoptimized and scheduled code to the optimized and scheduled code. Of course, the optimized program is much faster, since it has fewer instructions. To fairly evaluate a compile-time scheduler or runtime dynamic scheduling, you must use optimized code, since in the real system you will derive good performance from other optimizations in addition to scheduling.
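A toy model can illustrate why redundant instructions make scheduling look better than it is. The sketch below assumes a simple five-stage pipeline with forwarding, in which the only stall is a one-cycle load-use interlock when an instruction reads a register loaded by the instruction immediately before it. The `Instr` type, the register names, and both instruction sequences are hypothetical; this is not the gcc/R3000 measurement cited above, and it ignores pipeline fill and special cases such as load-to-store forwarding.

```python
from typing import List, NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    op: str                  # mnemonic, e.g. "LD", "DADD", "SD"
    dest: Optional[str]      # destination register, or None for stores
    srcs: Tuple[str, ...]    # source registers

def load_use_stalls(code: List[Instr]) -> int:
    """Count one stall for each instruction that reads the result of the
    load immediately preceding it (simplified load-use interlock)."""
    stalls = 0
    for prev, cur in zip(code, code[1:]):
        if prev.op == "LD" and prev.dest in cur.srcs:
            stalls += 1
    return stalls

def total_cycles(code: List[Instr]) -> int:
    """Steady-state cycles: one per instruction plus stall cycles."""
    return len(code) + load_use_stalls(code)

def idle_fraction(code: List[Instr]) -> float:
    """Fraction of cycles that are stalls."""
    return load_use_stalls(code) / total_cycles(code)

# Hypothetical "tight" optimized code: nothing independent is available
# to place between the load and its use, so the stall remains.
optimized = [
    Instr("LD", "R1", ("R2",)),
    Instr("DADD", "R3", ("R1", "R4")),
    Instr("SD", None, ("R3", "R2")),
]

# Hypothetical unoptimized code after scheduling: a redundant reload has
# been moved into the load delay slot, hiding the stall.
unoptimized_scheduled = [
    Instr("LD", "R1", ("R2",)),
    Instr("LD", "R5", ("R2",)),        # redundant reload an optimizer would remove
    Instr("DADD", "R3", ("R1", "R4")),
    Instr("SD", None, ("R3", "R2")),
    Instr("SD", None, ("R5", "R2")),   # redundant store
]

for name, code in [("optimized", optimized),
                   ("unoptimized+scheduled", unoptimized_scheduled)]:
    print(f"{name}: {total_cycles(code)} cycles, "
          f"idle fraction {idle_fraction(code):.2f}")
```

In this toy case the unoptimized sequence shows no idle cycles while the optimized one shows a quarter of its cycles idle, yet the optimized sequence still finishes in fewer cycles. Judging a scheduler by idle-cycle frequency on the unoptimized code would therefore overstate its benefit.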

C.9 Concluding Remarks

At the beginning of the 1980s, pipelining was a technique reserved primarily for supercomputers and large multimillion-dollar mainframes. By the mid-1980s, the first pipelined microprocessors appeared and helped transform the world of computing, allowing microprocessors to bypass minicomputers in performance and eventually to take on and outperform mainframes. By the early 1990s, high-end embedded microprocessors embraced pipelining, and desktops were headed toward the use of the sophisticated dynamically scheduled, multiple-issue approaches discussed in Chapter 3. The material in this appendix, which was considered reasonably advanced for graduate students when this text first appeared in 1990, is now considered basic undergraduate material and can be found in processors costing less than $2!

C.10 Historical Perspective and References

Section L.5 (available online) features a discussion on the development of pipelining and instruction-level parallelism covering both this appendix and the material in Chapter 3. We provide numerous references for further reading and exploration of these topics.

Updated Exercises by Diana Franklin

C.1 [15/15/15/15/25/10/15] <A.2> Use the following code fragment:

Loop:   LD      R1,0(R2)        ;load R1 from address 0+R2
