Digital Signal Processing Reference
3.1.1 Multiplier and ALU
As traditional DSP processors are used for numerically intensive tasks, a fast
multiplier is an essential unit in a DSP processor. Although multipliers are also
included in general-purpose microprocessors, the multipliers in fixed-point DSP
processors differ in one major respect. In general-purpose fixed-point computations,
the integer data type is used and multipliers operate such that integer operands yield
an integer product (b-bit operands produce a b-bit product, i.e., only the least
significant bits of the product are saved). In DSP processors, however, the fractional
data type is used, for which the least significant bits of the product are not
sufficient; the product is therefore obtained at full precision, so that multiplication
of b-bit operands results in a 2b-bit product (law of conservation of bits). This
indicates that the integer multipliers present in standard microprocessors and
microcontrollers are not well suited to signal processing. In some DSP processors,
multipliers may produce narrower results to gain speed or save silicon area. The
multiplier may also be pipelined, which implies latency, i.e., the product may not be
available to the next instruction.
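The difference between keeping the low half and keeping the full product can be sketched in a few lines. The example below assumes 16-bit operands in the common Q15 fractional format (value = integer / 2^15); the helper names `to_q15` and `wrap16` are hypothetical and only serve the illustration.

```python
def to_q15(x: float) -> int:
    """Quantize a real value in [-1, 1) to a 16-bit Q15 integer (assumed format)."""
    return max(-32768, min(32767, round(x * 32768)))


def wrap16(v: int) -> int:
    """Keep the 16 least significant bits, read as a two's complement number."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


a = to_q15(0.5)    # 16384
b = to_q15(0.25)   # 8192

full = a * b              # full-precision 32-bit product, in Q30 format
low_half = wrap16(full)   # what a b-bit integer multiply would keep
q15_result = full >> 15   # rescale Q30 -> Q15, keeping the significant bits

print(full / 2**30)        # 0.125
print(low_half)            # 0
print(q15_result / 2**15)  # 0.125
```

For fractional operands the information sits in the upper part of the 2b-bit product, so the low half alone (here 0) says nothing about the result 0.125.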
Multiplication is also involved in one of the most characteristic operations in DSP,
multiply-accumulate (MAC), and DSP performance is often even characterized in MAC/s.
This measure is often practical as DSP algorithms are, in general, data-independent
and, therefore, deterministic in behavior. A DSP processor may contain an additional
adder to be used in the MAC operation; the multiplier and this adder together form a
MAC unit. A processor may also contain parallel units to further boost performance in
DSP applications.
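As a rough illustration of why MAC/s is a convenient measure, the sketch below computes one output sample of an FIR filter as a chain of MAC operations. The function names and the 64-tap, 48 kHz figures are assumptions chosen for the example, not values from the text.

```python
def fir_mac(samples, coeffs):
    """Compute one FIR output sample using one multiply-accumulate per tap."""
    acc = 0
    macs = 0
    for x, h in zip(samples, coeffs):
        acc += x * h   # one MAC operation
        macs += 1
    return acc, macs


def required_mac_rate(num_taps, sample_rate_hz):
    """MAC operations per second needed to run the filter in real time."""
    return num_taps * sample_rate_hz


print(fir_mac([1, 2, 3], [4, 5, 6]))   # (32, 3)
print(required_mac_rate(64, 48_000))   # 3072000
```

The MAC count depends only on the number of taps, never on the sample values, which is what makes the workload deterministic and the MAC/s figure meaningful.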
As in general-purpose processors, DSP processors contain an arithmetic-logic unit
(ALU), which performs the basic operations: addition, subtraction, increment, negate,
and, or, not, etc. Often the addition in the MAC operation is carried out in the ALU.
The ALU in a DSP processor operates in a similar fashion as in general-purpose
processors, but the arithmetic operations are carried out with extended word width
operands. This is because consecutive MAC operations tend to increase the word width
of the result. In order to avoid overflow in accumulation, additional guard bits are
used, i.e., additional bits are used when performing the accumulation and storing the
accumulation results. In general, $\log_2(N)$ additional bits are required for
carrying out N additions without overflow. The more guard bits, the more headroom
against overflow.
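The guard-bit rule can be checked numerically. The sketch below assumes two's complement terms of 32 bits and rounds the bit count up to an integer; `guard_bits` and `fits_signed` are hypothetical helper names introduced for the illustration.

```python
def guard_bits(n_terms: int) -> int:
    """Smallest g with 2**g >= n_terms, i.e. ceil(log2(n_terms)) for n_terms >= 1."""
    return (n_terms - 1).bit_length()


def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a two's complement number of the given width."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


WORD = 32                       # assumed width of each accumulated term
worst = -(1 << (WORD - 1))      # most negative 32-bit value
n = 256
g = guard_bits(n)
total = n * worst               # worst-case sum of n terms

print(g)                                  # 8
print(fits_signed(total, WORD + g))       # True
print(fits_signed(total, WORD + g - 1))   # False
```

Under these assumptions, 8 guard bits on top of a 32-bit word (a 40-bit accumulator) are enough for 256 worst-case terms, and one bit fewer is not.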
In early DSP processors, the multiplier was a separate unit, which stored its result
in a specific “product” register as seen in Fig. 1. Such an arrangement implies that
an additional instruction is needed to perform the accumulation and that there is a
latency of one instruction in the MAC operation. A similar behavior can be seen in the
Freescale DSP56300 illustrated in Fig. 2.
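The effect of a separate product register can be sketched with a toy model. This is a simplified, hypothetical model and not the instruction set of any particular processor: each loop iteration stands for one instruction slot in which the accumulator adds the product left in the register by the previous slot while the multiplier computes the next one.

```python
def mac_with_product_register(samples, coeffs):
    """Toy model of a multiplier that writes a separate product register.

    Returns the accumulated sum and the number of instruction slots used.
    """
    acc = 0
    p = 0        # product register, written by the multiplier
    slots = 0
    for x, h in zip(samples, coeffs):
        acc += p     # accumulate the product from the previous slot
        p = x * h    # multiply; the result is usable only in the next slot
        slots += 1
    acc += p         # additional instruction to add the final product
    slots += 1
    return acc, slots


print(mac_with_product_register([1, 2, 3], [4, 5, 6]))   # (32, 4)
```

In this model three products need four slots: the accumulation always trails the multiplication by one instruction, which is the latency described above.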
In order to increase performance in DSP applications, a processor can contain several
multipliers. For example, the TI TMS320C55x family of processors contains two MAC
units [39]. Each unit can perform a multiplication with 17-bit operands and a 40-bit
addition. TI TMS320C64x processors contain parallel functional units as shown in
Fig. 3. Two of the eight functional units can perform several types of multiplication:
32 × 32 multiplication, 16 × 16 multiplication, 16 × 32 multiplication, quad 8 × 8