Digital Signal Processing Reference
[Figure: three panels (Truncation, Rounding, Magnitude Truncation), each plotting the error density p(e) against the error e]
Fig. 6 Error distributions for fixed-point arithmetic
1.6.2 Truncation
Quantizing a binary number, $X$, with infinite word length to a number, $X_Q$, with finite word length yields an error

$$e = X_Q - X. \tag{6}$$
Truncation of the binary number is performed by removing the bits with index $i > W_f$. The variance is $\sigma^2 = Q^2/12$ and the mean value is $-Q/2$, where $Q$ refers to the weight of the last bit position.
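The mean and variance quoted for truncation can be checked numerically. The following is a minimal sketch, not a definitive implementation: it assumes that dropping the low-order bits of a two's complement number behaves like a floor operation, and it approximates the infinite word length input by a fine uniform grid on [-1, 1). The names `truncate`, `truncation_error_stats`, `wf`, and `extra_bits` are hypothetical and introduced only for illustration.

```python
import math


def truncate(x: float, wf: int) -> float:
    """Keep wf fractional bits by discarding the lower-order bits (floor)."""
    q = 2.0 ** -wf                      # Q, the weight of the last retained bit
    return math.floor(x / q) * q


def truncation_error_stats(wf: int, extra_bits: int = 8):
    """Mean and variance of e = X_Q - X over a uniform grid on [-1, 1).

    The grid carries wf + extra_bits fractional bits, which stands in
    (as an assumption) for the infinite word length input.
    """
    step = 2.0 ** -(wf + extra_bits)
    n = 2 ** (wf + extra_bits + 1)      # number of grid points in [-1, 1)
    errors = []
    for k in range(n):
        x = -1.0 + k * step
        errors.append(truncate(x, wf) - x)
    mean = sum(errors) / n
    var = sum((e - mean) ** 2 for e in errors) / n
    return mean, var
```

For a grid that is fine compared with Q, the returned mean should lie close to -Q/2 and the variance close to Q^2/12. The small remaining deviation comes from the finite grid, not from the quantizer.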
1.6.3 Rounding
Rounding is, in practice, performed by adding $2^{-(W_f+1)}$ to the non-quantized number before truncation. Hence, the quantized number is the nearest approximation to the original number. However, if the word length of $X$ is $W_f + 1$, the quantized number should, in principle, be rounded upwards if the last bit is 1 and downwards if it is 0, in order to make the mean error zero. This special case is often neglected. The variance is $\sigma^2 = Q^2/12$ and the mean value is zero.
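The add-then-truncate procedure, and the special case for inputs with word length W_f + 1, can be illustrated with a short sketch. It assumes two's complement truncation acts as a floor operation and that inputs are drawn uniformly from a grid on [-1, 1). The names `round_to_nearest`, `mean_error`, `wf`, and `in_bits` are hypothetical.

```python
import math


def round_to_nearest(x: float, wf: int) -> float:
    """Round by adding half the last-bit weight, 2^-(wf+1), then truncating."""
    q = 2.0 ** -wf                      # Q, the weight of the last retained bit
    return math.floor((x + q / 2) / q) * q


def mean_error(wf: int, in_bits: int) -> float:
    """Mean of e = X_Q - X over all inputs in [-1, 1) with in_bits fractional bits."""
    step = 2.0 ** -in_bits
    n = 2 ** (in_bits + 1)              # number of grid points in [-1, 1)
    total = 0.0
    for k in range(n):
        x = -1.0 + k * step
        total += round_to_nearest(x, wf) - x
    return total / n
```

With many extra input bits, for example `mean_error(4, 12)`, the mean error is close to zero. With exactly one extra bit, `mean_error(4, 5)`, every second input lies halfway between two levels and is always rounded upwards, so the mean error becomes Q/4 rather than zero. That is the special case the text says is often neglected.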
1.6.4 Magnitude Truncation
Magnitude truncation quantizes the number so that

$$|X_Q| \le |X|. \tag{7}$$

Hence, $e$ is $\le 0$ if $X$ is positive and $\ge 0$ if $X$ is negative. This operation can be performed by adding $2^{-(W_f+1)}$ before truncation if $X$ is negative and 0 otherwise. That is, in two's complement representation, by adding the sign bit to the last position.
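The sign-dependent correction can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: two's complement truncation is modeled as a floor operation, and the input is assumed to carry at most W_f + 1 fractional bits, so that the "last position" has weight 2^-(W_f+1). The names `magnitude_truncate` and `wf` are hypothetical.

```python
import math


def magnitude_truncate(x: float, wf: int) -> float:
    """Quantize toward zero so that |X_Q| <= |X|.

    Assumes x has at most wf + 1 fractional bits; for longer inputs a
    correction of 2^-(wf+1) alone is not enough to guarantee |X_Q| <= |X|.
    """
    q = 2.0 ** -wf                          # Q, the weight of the last retained bit
    correction = q / 2 if x < 0 else 0.0    # sign bit added at weight 2^-(wf+1)
    return math.floor((x + correction) / q) * q
```

For example, with `wf = 2` the input 0.375 is quantized to 0.25 and the input -0.375 to -0.25, so the error is negative in the first case and positive in the second.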
The analysis of magnitude truncation becomes very complicated since the error and sign