System Design Flow and Fixed-point Arithmetic - Digital Design of Signal Processing Systems: A Practical Approach


Figure 3.11 Rounding followed by truncation: 0111_0111 in Q4.4 is 7.4375; adding 1 at the bit to the right of the truncation point gives 0111_1001, which truncated to Q4.2 gives 0111_10 = 7.5.

3.5.5.1 Simple Truncation

In multiplication of two Q-format numbers, the number of bits in the product increases. Precision is sacrificed by dropping some of the least significant bits of the product: $Qn_1.m_1$ is truncated to $Qn_1.m_2$, where $m_2 < m_1$.
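The following is a minimal sketch of this bit growth and truncation. It assumes signed Q4.4 operands stored as plain Python integers holding the raw two's complement bit pattern. Under that assumption the full product is a 16-bit Q8.8 number. The helper names `q_to_float` and `truncate_frac` are hypothetical and are used only for illustration.

```python
def q_to_float(raw: int, frac_bits: int) -> float:
    """Interpret a raw two's complement integer as a Q-format value
    with frac_bits fractional bits (hypothetical helper)."""
    return raw / (1 << frac_bits)


def truncate_frac(raw: int, m1: int, m2: int) -> int:
    """Truncate a value with m1 fractional bits to m2 fractional bits.

    Python's >> on ints is an arithmetic (floor) shift, so the dropped
    low-order bits are simply discarded, as in a hardware truncation.
    """
    assert m2 < m1
    return raw >> (m1 - m2)


# Two Q4.4 operands (raw bit patterns).
a = 0b0010_1001   # 41 -> 41/16 = 2.5625
b = 0b0001_1100   # 28 -> 28/16 = 1.75

# The product carries 4 + 4 = 8 fractional bits (Q8.8): twice the width.
p = a * b         # 1148 -> 1148/256 = 4.484375
print(q_to_float(p, 8))      # 4.484375

# Truncate Q8.8 to Q8.4 by dropping the 4 least significant bits.
p_t = truncate_frac(p, 8, 4)  # 71 -> 71/16 = 4.4375
print(q_to_float(p_t, 4))     # 4.4375
```

The exact product 4.484375 becomes 4.4375 after truncation. The precision lost is the value of the four discarded bits.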

Example: 0111_0111 in Q4.4 is 7.4375. Truncated to Q4.2, it gives 0111_01 = 7.25.
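The Q4.4 to Q4.2 truncation in the example above can be reproduced with a single right shift. This is a sketch in which the raw bit pattern is held in a Python integer; the function name `truncate` is hypothetical.

```python
def truncate(raw: int, drop_bits: int) -> int:
    # Discard the drop_bits least significant bits (arithmetic/floor shift).
    return raw >> drop_bits


x = 0b0111_0111         # 7.4375 in Q4.4 (raw value 119)
y = truncate(x, 2)      # Q4.2 pattern 0111_01 (raw value 29)
print(bin(y), y / 4)    # 0b11101 7.25
```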

3.5.5.2 Rounding Followed by Truncation

Direct truncation of numbers biases the results, because truncating a two's complement number always rounds it toward minus infinity. In many applications it is therefore preferred to round before trimming the number to the desired size. For this, 1 is added to the bit immediately to the right of the truncation point, and the resulting number is then truncated to the desired number of bits. This is shown in the example in Figure 3.11. Rounding before truncation gives a better approximation. In the example, simple truncation to Q4.2 results in a number with value 7.25, whereas rounding before truncation gives 7.5, which is closer to the actual value 7.4375.
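A minimal sketch of rounding followed by truncation is given below, assuming raw two's complement bit patterns held in Python integers. The hypothetical `round_then_truncate` adds 1 at the bit just right of the truncation point, which has weight $2^{d-1}$ for $d$ dropped bits, and then shifts those $d$ bits out.

```python
def round_then_truncate(raw: int, drop_bits: int) -> int:
    """Add 1 at the bit just right of the truncation point, then truncate."""
    if drop_bits == 0:
        return raw
    return (raw + (1 << (drop_bits - 1))) >> drop_bits


x = 0b0111_0111                        # 7.4375 in Q4.4
print(round_then_truncate(x, 2) / 4)   # 7.5  (0111_1001 -> 0111_10)
print((x >> 2) / 4)                    # 7.25 (simple truncation)
```

Python integers never overflow, but a fixed-width hardware adder does. For values near the maximum positive number, for example 0111_1111 in Q4.4, the rounding addition can overflow. A practical design therefore needs either a guard bit or saturation on the rounded result.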

3.5.6 Overflow and Saturation

Overflow is a serious consequence of fixed-point arithmetic. Overflow occurs if two positive or two negative numbers are added and the sum requires more than the available number of bits. For example, in a 3-bit two's complement representation, if 1 is added to 3 (3'b011), the sum is 4 (4'b0100). The number 4 thus requires four bits and cannot be represented as a 3-bit two's complement signed number; the 3-bit result 3'b100 is interpreted as -4. This causes an error equal to the full dynamic range of the number (here $2^3 = 8$) and so adversely affects subsequent computation that uses this number.
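The sketch below emulates the 3-bit wrap-around described above. The hypothetical `wrap` keeps only the low `bits` bits, as a fixed-width adder does when it discards the carry out. The hypothetical `add_overflows` applies the standard sign rule: an addition overflows exactly when both operands have the same sign and the result's sign differs from it.

```python
def wrap(value: int, bits: int) -> int:
    """Reduce an integer to a bits-wide two's complement value."""
    value &= (1 << bits) - 1
    # If the sign bit is set, reinterpret the pattern as negative.
    return value - (1 << bits) if value >> (bits - 1) else value


def add_overflows(a: int, b: int, bits: int) -> bool:
    """Sign rule: same-sign operands producing a result of the other sign."""
    s = wrap(a + b, bits)
    return (a < 0) == (b < 0) and (s < 0) != (a < 0)


print(wrap(3 + 1, 3))             # -4 (the pattern 3'b100)
print(add_overflows(3, 1, 3))     # True
print((3 + 1) - wrap(3 + 1, 3))   # 8, the full range 2**3
```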

Figure 3.12 shows the case of an overflow for a 3-bit number, adding an error equal to the dynamic range of the number. It is therefore imperative to check the overflow condition after performing arithmetic operations that can cause an overflow. If an overflow results, the designer should set an overflow flag. In many circumstances, it is better to clamp (saturate) the result to the maximum positive or minimum negative value that the defined word length can represent. In the above example, the value should be limited to 3'b011.

Thus, the following computation is in 3-bit precision, with an overflow flag set to indicate this abnormal result:

$$3 + 1 = 3, \quad \text{overflow flag} = 1$$

Similarly, a subtraction that saturates also sets the overflow flag to 1:

$$-4 - 1 = -4, \quad \text{overflow flag} = 1$$
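Both computations above can be reproduced with a small saturating adder and subtractor. This is a minimal sketch under two assumptions: the exact result is formed at a wider precision, which Python integers give for free, and it is then clamped to the `bits`-wide two's complement range. The functions `saturate`, `sat_add` and `sat_sub` are hypothetical names, not taken from the text.

```python
def saturate(value: int, bits: int):
    """Clamp an exact result to the bits-wide two's complement range.

    Returns (result, overflow_flag).
    """
    hi = (1 << (bits - 1)) - 1   # maximum positive, 3 for bits = 3
    lo = -(1 << (bits - 1))      # minimum negative, -4 for bits = 3
    if value > hi:
        return hi, 1
    if value < lo:
        return lo, 1
    return value, 0


def sat_add(a: int, b: int, bits: int):
    return saturate(a + b, bits)


def sat_sub(a: int, b: int, bits: int):
    return saturate(a - b, bits)


print(sat_add(3, 1, 3))    # (3, 1)
print(sat_sub(-4, 1, 3))   # (-4, 1)
print(sat_add(1, 1, 3))    # (2, 0)
```

In hardware, the same effect is usually obtained by detecting overflow from the operand and result sign bits, then selecting the maximum or minimum constant instead of the wrapped sum.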
