some number bases. For example, in base 10, 1/3 is not exactly representable in a fixed
number of digits; unlike, say, 0.1. However, in a binary floating-point representation
0.1 is no longer exactly representable but is instead given by the repeating fraction
(0.0001100110011...)₂. When this number is normalized and rounded off to 24 bits
(including the one bit that is not stored) the mantissa bit pattern ends in ...11001101,
where the last (least significant) bit has been rounded up. The IEEE single-precision
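representation of 0.1 is therefore slightly larger than 0.1. As virtually all current CPUs
use binary floating-point systems, the following code prints “Greater than” whenever the
product is evaluated in more than single precision (as on x87-style FPUs). If the product
is instead rounded to single precision, it rounds to exactly 1.0 and the code prints “Equal”.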
    float tenth = 0.1f;
    if (tenth * 10.0f > 1.0f)
        printf("Greater than\n");
    else if (tenth * 10.0f < 1.0f)
        printf("Less than\n");
    else if (tenth * 10.0f == 1.0f)
        printf("Equal\n");
    else
        printf("Huh?\n");
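The rounding described above can be inspected directly. The following is a minimal sketch, assuming a platform where float is a 32-bit IEEE-754 value; the helper names float_bits, times_ten_single, and times_ten_double are hypothetical and introduced only for illustration. It also separates the two ways the product can be evaluated, since which branch the test takes depends on the precision of the intermediate result.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw bit pattern of a float (assumes a 32-bit IEEE-754 float). */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* well-defined way to reinterpret the bytes */
    return bits;
}

/* Product rounded to single precision. The volatile store forces the
   rounding even on compilers that keep intermediates in wider registers. */
float times_ten_single(float x)
{
    volatile float p = x * 10.0f;
    return p;
}

/* Same product carried out in double precision, where no rounding
   to single precision takes place. */
double times_ten_double(float x)
{
    return (double)x * 10.0;
}
```

Under the stated assumption, float_bits(0.1f) is 0x3DCCCCCD, whose low byte 0xCD is the 11001101 pattern mentioned in the text. Comparing times_ten_single(0.1f) and times_ten_double(0.1f) against 1.0 shows how the outcome depends on the evaluation precision.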
That some numbers are not exactly representable also means that whereas
replacing x/2.0f with x*0.5f is exact (as both 2.0 and 0.5 are exactly representable),
replacing x/10.0f with x*0.1f is not. As multiplications are generally faster than
divisions (by up to about an order of magnitude, although the gap is closing on current
architectures), the latter replacement is still frequently and deliberately performed
for reasons of efficiency.
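The difference between the two forms can be observed with a short experiment. This is a sketch only, assuming IEEE-754 single-precision floats with round-to-nearest; the function names div_by_ten, mul_by_tenth, and count_mismatches are hypothetical.

```c
/* Divide by ten, with the result rounded to single precision. */
float div_by_ten(float x)
{
    volatile float r = x / 10.0f;   /* volatile discards any excess precision */
    return r;
}

/* Multiply by the single-precision approximation of one tenth. */
float mul_by_tenth(float x)
{
    volatile float r = x * 0.1f;
    return r;
}

/* Count the integers in [1, n] for which the two forms disagree. */
int count_mismatches(int n)
{
    int count = 0;
    for (int i = 1; i <= n; i++) {
        float x = (float)i;
        if (div_by_ten(x) != mul_by_tenth(x))
            count++;
    }
    return count;
}
```

Under these assumptions the two forms agree for some inputs (x = 3.0f, for example) and differ in the last bit for others (x = 9.0f), whereas x/2.0f and x*0.5f always agree.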
It is important to realize that floating-point arithmetic does not obey ordinary
arithmetic rules. For example, round-off errors may cause a small but nonzero value
added to or subtracted from a large value to have no effect. Therefore, mathematically
equivalent expressions can produce completely different results when evaluated using
floating-point arithmetic.
Consider the expression 1.5e3 + 4.5e-6. In real arithmetic, this expression corresponds
to 1500.0 + 0.0000045, which equals 1500.0000045. Because single-precision
floats can hold only about seven decimal digits, the result is truncated to 1500.0 and
digits are lost. Thus, in floating-point arithmetic a+b can equal a even though b is
nonzero and both a and b can be expressed exactly!
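The lost addend can be checked by comparing it with the spacing between adjacent floats near 1500.0. The sketch below assumes 32-bit IEEE-754 floats with round-to-nearest; add_rounded and next_float_up are hypothetical helper names, and next_float_up is only valid for positive, finite inputs.

```c
#include <stdint.h>
#include <string.h>

/* Sum rounded to single precision (volatile discards excess precision). */
float add_rounded(float a, float b)
{
    volatile float s = a + b;
    return s;
}

/* Next representable float above x.
   Assumption: x is positive and finite, and float is 32-bit IEEE-754,
   so incrementing the bit pattern steps to the adjacent value. */
float next_float_up(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits += 1;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Near 1500.0 adjacent floats are 2^-13 (about 0.000122) apart, so an addend of 0.0000045 is less than half that spacing and add_rounded(1.5e3f, 4.5e-6f) returns 1500.0f unchanged.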
A consequence of the presence of truncation errors is that floating-point arithmetic
is not associative. In other words, (a+b)+c is not necessarily the same as
a+(b+c). Consider the following three values of a, b, and c.
float a = 9876543.0f;
float b = -9876547.0f;
float c = 3.45f;
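Evaluating both groupings with these three values can be sketched as follows. This is an illustration under the assumption of IEEE-754 single precision with round-to-nearest; add_f32, sum_left, and sum_right are hypothetical names, and every intermediate sum is forced to single precision.

```c
/* Single-precision addition; the volatile store rounds the result to float
   even when the compiler evaluates expressions in a wider format. */
float add_f32(float x, float y)
{
    volatile float s = x + y;
    return s;
}

/* (a + b) + c */
float sum_left(float a, float b, float c)
{
    return add_f32(add_f32(a, b), c);
}

/* a + (b + c) */
float sum_right(float a, float b, float c)
{
    return add_f32(a, add_f32(b, c));
}
```

With a, b, and c as given, a + b is exactly -4, so sum_left yields roughly -0.55. In sum_right, b + c must first be rounded to a float near 9.9 million, where adjacent values are 1 apart, so the fractional part of c is lost and the final result is -1.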
 