Pipelining Design Techniques - Fundamentals of Computer Organization and Architecture

Carry-save addition can be used to realize a pipelined multiplication building
block. Consider, for example, the multiplication of two n-bit operands A and B.
The multiplication can be transformed into the addition of shifted partial
products, as shown in Figure 9.21, which illustrates the case of two 8-bit
operands A and B. Figure 9.22 shows a carry-save based multiplication scheme
built on this principle: the scheme first produces the set of partial products
needed and then adds them up using carry-save addition.
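
The idea of generating partial products and then summing them with carry-save adders can be sketched in software. The following is a minimal illustration, not a reproduction of the arrangement in Figure 9.22: the function names `carry_save_add` and `csa_multiply` are hypothetical, the operands are assumed to be unsigned, and the way operands are grouped into levels is one possible choice among several.

```python
def carry_save_add(x, y, z):
    """3:2 carry-save step: turn three operands into a (sum, carry) pair.

    No carry ripples between bit positions; each bit is handled independently.
    The invariant x + y + z == partial_sum + carry always holds.
    """
    partial_sum = x ^ y ^ z                      # bitwise sum without carries
    carry = ((x & y) | (x & z) | (y & z)) << 1   # carries, moved one position left
    return partial_sum, carry


def csa_multiply(a, b, n=8):
    """Multiply two unsigned n-bit operands using carry-save reduction."""
    if not (0 <= a < (1 << n) and 0 <= b < (1 << n)):
        raise ValueError("operands must be unsigned n-bit values")

    # One partial product per bit of b: a shifted left by i if bit i is set.
    operands = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]

    # Each pass models one level of carry-save adders (one pipeline stage
    # in a hardware realization): every group of three becomes two.
    while len(operands) > 2:
        grouped = len(operands) - len(operands) % 3
        next_level = []
        for i in range(0, grouped, 3):
            next_level.extend(carry_save_add(*operands[i:i + 3]))
        next_level.extend(operands[grouped:])    # leftovers pass through unchanged
        operands = next_level

    # Final stage: a single ordinary (carry-propagating) addition.
    return sum(operands)


if __name__ == "__main__":
    print(csa_multiply(13, 11))    # 8-bit operands, as in the 8-bit case above
```

Only the last step performs a carry-propagating addition; every earlier level is a layer of independent bitwise operations, which is what makes the structure a natural fit for pipelining.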

9.6. SUMMARY

In this chapter, we have considered the basic principles involved in designing
pipelined architectures. Our coverage started with a discussion of a number of
metrics that can be used to assess the performance of a pipeline. We then
presented a general discussion of the main problems that need to be considered
in designing a pipelined architecture. In particular, we considered two main
problems: instruction dependency and data dependency. The effect of these two
problems on the performance of a pipeline has been elaborated. Some techniques
that can be used to reduce the effect of instruction and data dependency have
been introduced and illustrated. Two examples of recent pipeline architectures,
the ARM11 microarchitecture and the UltraSPARC III processor, have been
presented. The chapter ended with an introduction to some of the ideas that can
be used in realizing pipelined arithmetic architectures.

EXERCISES

1. Consider the execution of 500 instructions on a five-stage pipeline machine.
Compute the speed-up due to the use of pipelining given that the probability
of an instruction being a branch is p = 0.3. What must be the value of p and
the expected number of branch instructions such that a speed-up of at least 4 is
possible? What must be the value of p such that a speed-up of at least 5 is
possible? Assume that each stage takes one cycle to perform its task.

2. Assume that a RISC machine executes one instruction per clock cycle if no
branches are executed. Delayed branch is used with three delay clock
cycles. Consider the execution of 1000 instructions, 30% of which are
branch instructions, on such a machine in two cases. In the first case, a
novice compiler is used that is not able to reduce the extra clock cycles
wasted due to branch instructions. In the second case, a smart compiler is
used that is able to utilize 85% of the extra clock cycles. Compute the average
number of instructions per cycle in each case. Compute also the percentage
performance gain due to the use of the smart compiler.
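
As a way to check hand calculations for Exercise 2, the arithmetic can be set up as a small script. This is only a sketch under one reading of the exercise: it assumes every branch costs the full number of delay cycles unless the compiler fills them, and it measures the performance gain as the relative increase in instructions per cycle. The function name `delayed_branch_ipc` and its parameters are hypothetical.

```python
def delayed_branch_ipc(instructions, branch_fraction, delay_cycles, slots_filled):
    """Average instructions per cycle on a delayed-branch machine.

    Assumption: each branch adds `delay_cycles` extra cycles, of which the
    compiler fills a fraction `slots_filled` (0.0 to 1.0) with useful work.
    """
    branches = instructions * branch_fraction
    wasted_cycles = branches * delay_cycles * (1.0 - slots_filled)
    total_cycles = instructions + wasted_cycles
    return instructions / total_cycles


if __name__ == "__main__":
    novice = delayed_branch_ipc(1000, 0.30, 3, slots_filled=0.0)
    smart = delayed_branch_ipc(1000, 0.30, 3, slots_filled=0.85)
    gain_percent = (smart - novice) / novice * 100.0  # assumed definition of gain
    print(f"novice compiler IPC: {novice:.3f}")
    print(f"smart compiler IPC:  {smart:.3f}")
    print(f"performance gain:    {gain_percent:.1f}%")
```

If the gain is defined instead as the reduction in total cycles, only the `gain_percent` line needs to change.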
