Carry-save addition can be used to realize a pipelined multiplication building
block. Consider, for example, the multiplication of two n-bit operands A and B.
The multiplication operation can be transformed into an addition as shown in
Figure 9.21. The figure illustrates the case of multiplying two 8-bit operands A
and B. A carry-save based multiplication scheme using the principle shown in
Figure 9.21 is shown in Figure 9.22. The scheme is based on the idea of producing
the set of partial products needed and then adding them up using a carry-save
addition scheme.
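The idea of forming the partial products and then summing them with carry-save adders can be sketched in a few lines of Python. This is only an illustrative software model, not the circuit of Figure 9.22: the function names, the grouping of operands into threes, and the default width n = 8 are assumptions made for the example. Each pass of the reduction loop corresponds to one level of carry-save adders, which is the natural place to put a pipeline register.

```python
def partial_products(a, b, n=8):
    """Return the n shifted partial products of a * b for n-bit unsigned a, b."""
    # Bit i of b selects either a shifted left by i positions, or zero.
    return [(a << i) if (b >> i) & 1 else 0 for i in range(n)]


def carry_save_add(x, y, z):
    """Reduce three operands to a sum word and a carry word (x + y + z == s + c)."""
    s = x ^ y ^ z                             # bitwise sum, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1    # carries, shifted to the next bit position
    return s, c


def csa_multiply(a, b, n=8):
    """Multiply by reducing the partial products with levels of carry-save adders."""
    operands = partial_products(a, b, n)
    # Each iteration models one level of carry-save adders (one pipeline stage).
    while len(operands) > 2:
        full = len(operands) - len(operands) % 3
        reduced = []
        for i in range(0, full, 3):
            reduced.extend(carry_save_add(*operands[i:i + 3]))
        reduced.extend(operands[full:])       # leftover operands pass through unchanged
        operands = reduced
    # The final step stands in for a conventional carry-propagate adder.
    return sum(operands)


if __name__ == "__main__":
    print(csa_multiply(13, 11))
```

Only the last addition propagates carries across the full word; every earlier level produces its sum and carry words independently per bit position, which is what makes the scheme attractive for pipelining.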
9.6. SUMMARY
In this chapter, we have considered the basic principles involved in designing
pipeline architectures. Our coverage started with a discussion of a number of
metrics that can be used to assess the performance of a pipeline. We then presented
a general discussion of the main problems that need to be considered in designing
a pipelined architecture. In particular, we considered two main problems: instruction
and data dependency. The effect of these two problems on the performance of
a pipeline has been elaborated. Some possible techniques that can be used to
reduce the effect of instruction and data dependency have been introduced
and illustrated. Two examples of recent pipeline architectures, the ARM 11
microarchitecture and the UltraSPARC III processor, have been presented. Our
discussion in the chapter ended with an introduction to some of the ideas that can be
used in realizing pipeline arithmetic architectures.
EXERCISES
1. Consider the execution of 500 instructions on a five-stage pipeline machine.
Compute the speed-up due to the use of pipelining given that the probability
of an instruction being a branch is p = 0.3. What must be the value of p and
the expected number of branch instructions such that a speed-up of at least 4 is
possible? What must be the value of p such that a speed-up of at least 5 is
possible? Assume that each stage takes one cycle to perform its task.
2. Assume that a RISC machine executes one instruction per clock cycle if no
branches are executed. Delayed branch is used with three delay clock
cycles. Consider the execution of 1000 instructions, 30% of which are
branch instructions, on such a machine in two cases. The first case is the
use of a novice compiler that is not able to reduce the extra clock cycles
wasted due to branch instructions. In the second case, a smart compiler that
is able to utilize 85% of the extra clock cycles is used. Compute the average
number of instructions per cycle in each case. Compute also the percentage of
performance gain due to the use of the smart compiler.
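As a starting point for Exercise 2, the cycle accounting can be sketched as below. This is one possible reading of the problem, not a stated solution: it assumes that every branch adds the full three delay cycles unless the compiler fills them, and that filled cycles do useful work already counted among the 1000 instructions. The function and parameter names are hypothetical.

```python
def instructions_per_cycle(n_instr, branch_frac, delay_cycles, utilized_frac):
    """Average instructions per cycle under the assumptions stated above."""
    branches = n_instr * branch_frac
    # Delay cycles the compiler could not fill with useful instructions.
    wasted = branches * delay_cycles * (1.0 - utilized_frac)
    total_cycles = n_instr + wasted
    return n_instr / total_cycles


if __name__ == "__main__":
    novice = instructions_per_cycle(1000, 0.30, 3, 0.0)   # no delay slots filled
    smart = instructions_per_cycle(1000, 0.30, 3, 0.85)   # 85% of delay slots filled
    gain_percent = (smart / novice - 1.0) * 100.0         # relative throughput gain
    print(novice, smart, gain_percent)
```

Here the gain is expressed as the relative increase in instructions per cycle; expressing it as the relative reduction in total cycles is an equally reasonable reading and gives a different percentage.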