FLOATING-POINT NUMBERS - Structured Computer Organization

Example 1: Exponentiation to the base 2

Word layout: 1 sign bit, 7-bit excess 64 exponent, 16-bit fraction whose bits have weights 2^-1, 2^-2, ..., 2^-16.

Unnormalized: 0 1010100 0000000000011011
= 2^20 × (1 × 2^-12 + 1 × 2^-13 + 1 × 2^-15 + 1 × 2^-16) = 432
Sign bit is 0. Excess 64 exponent is 84 - 64 = 20. Fraction is 1 × 2^-12 + 1 × 2^-13 + 1 × 2^-15 + 1 × 2^-16.

To normalize, shift the fraction left 11 bits and subtract 11 from the exponent.

Normalized: 0 1001001 1101100000000000
= 2^9 × (1 × 2^-1 + 1 × 2^-2 + 1 × 2^-4 + 1 × 2^-5) = 432
Sign bit is 0. Excess 64 exponent is 73 - 64 = 9. Fraction is 1 × 2^-1 + 1 × 2^-2 + 1 × 2^-4 + 1 × 2^-5.

Example 2: Exponentiation to the base 16

Word layout: 1 sign bit, 7-bit excess 64 exponent, fraction of four hexadecimal digits with weights 16^-1, 16^-2, 16^-3, 16^-4.

Unnormalized: 0 1000101 0000 0000 0001 1011
= 16^5 × (1 × 16^-3 + B × 16^-4) = 432
Sign bit is 0. Excess 64 exponent is 69 - 64 = 5. Fraction is 1 × 16^-3 + B × 16^-4.

To normalize, shift the fraction left 2 hexadecimal digits, and subtract 2 from the exponent.

Normalized: 0 1000011 0001 1011 0000 0000
= 16^3 × (1 × 16^-1 + B × 16^-2) = 432
Sign bit is 0. Excess 64 exponent is 67 - 64 = 3. Fraction is 1 × 16^-1 + B × 16^-2.

Figure B-3. Examples of normalized floating-point numbers.
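The two examples of Fig. B-3 can be checked mechanically. The following is a minimal Python sketch, not part of the original text. It assumes a 24-bit word made of 1 sign bit, a 7-bit excess 64 exponent, and a 16-bit fraction, and the hexadecimal words used with it are reconstructed from the exponents and fractions stated in the figure. The names decode, normalize, and DIGIT_BITS are hypothetical, chosen only for illustration.

```python
# Assumed layout: bit 23 = sign, bits 22..16 = excess 64 exponent,
# bits 15..0 = fraction (binary point to the left of bit 15).
DIGIT_BITS = {2: 1, 16: 4}  # bits per fraction digit for each supported radix


def decode(word, radix):
    """Return the numeric value of a 24-bit word for the given exponent radix."""
    sign = (word >> 23) & 0x1
    exponent = ((word >> 16) & 0x7F) - 64      # remove the excess 64 bias
    fraction = (word & 0xFFFF) / 2**16         # fraction lies in [0, 1)
    value = fraction * radix**exponent
    return -value if sign else value


def normalize(word, radix):
    """Shift the fraction left one digit at a time until its leading digit is
    nonzero, decrementing the exponent once per shift.
    Exponent underflow is not handled in this sketch."""
    if radix not in DIGIT_BITS:
        raise ValueError("only radix 2 and radix 16 are handled")
    digit_bits = DIGIT_BITS[radix]
    sign = word & (1 << 23)
    exponent = (word >> 16) & 0x7F
    fraction = word & 0xFFFF
    if fraction == 0:
        return word                            # zero cannot be normalized
    top_digit_mask = ((1 << digit_bits) - 1) << (16 - digit_bits)
    while fraction & top_digit_mask == 0:
        fraction = (fraction << digit_bits) & 0xFFFF
        exponent -= 1
    return sign | (exponent << 16) | fraction
```

With these definitions, the unnormalized radix 2 word 0x54001B (exponent field 84) normalizes to 0x49D800 (exponent field 73), and the unnormalized radix 16 word 0x45001B (exponent field 69) normalizes to 0x431B00 (exponent field 67). All four words decode to 432. The radix 2 case takes 11 one-bit shifts and the radix 16 case takes 2 four-bit shifts, matching the figure.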

To rectify this situation, in the late 1970s IEEE set up a committee to standardize floating-point arithmetic. The goal was not only to permit floating-point data to be exchanged among different computers but also to provide hardware designers with a model known to be correct. The resulting work led to IEEE Standard 754 (IEEE, 1985). Most CPUs these days (including the Intel and JVM ones studied in this book) have floating-point instructions that conform to the IEEE floating-point standard. Unlike many standards, which tend to be wishy-washy compromises that please no one, this one is not bad, in large part because it was primarily the work of one person, Berkeley math professor William Kahan. The standard will be described in the remainder of this section.

The standard defines three formats: single precision (32 bits), double precision (64 bits), and extended precision (80 bits). The extended-precision format is intended to reduce roundoff errors. It is used primarily inside floating-point arithmetic units, so we will not discuss it further. Both the single- and double-precision formats use radix 2 for fractions and excess notation for exponents. The formats are shown in Fig. B-4.
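To see the single-precision layout on a real machine, the fields of a 32-bit value can be pulled apart with Python's standard struct module. This is an illustrative sketch only. It assumes the usual IEEE 754 single-precision field widths (1 sign bit, 8 exponent bits, 23 fraction bits) and an exponent bias of 127. The function name single_fields is hypothetical.

```python
import struct


def single_fields(x):
    """Split x, rounded to IEEE 754 single precision, into its three fields.

    Returns (sign, biased_exponent, fraction) as unsigned integers.
    Assumed widths: 1 sign bit, 8 exponent bits, 23 fraction bits.
    """
    # Pack as a big-endian 32-bit float, then reread the bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    biased_exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, biased_exponent, fraction


sign, biased_exponent, fraction = single_fields(432.0)
true_exponent = biased_exponent - 127   # remove the assumed excess 127 bias
```

For 432.0, the value used in Fig. B-3, this yields a sign of 0, a biased exponent of 135 (so a true exponent of 8), and a fraction field of 0x580000.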

Both formats start with a sign bit for the number as a whole, 0 being positive and 1 being negative. Next comes the exponent, using excess 127 for single
