FLOATING-POINT NUMBERS - Structured Computer Organization


(a) Single precision: Sign (1 bit) | Exponent (8 bits) | Fraction (23 bits)

(b) Double precision: Sign (1 bit) | Exponent (11 bits) | Fraction (52 bits)

Figure B-4. IEEE floating-point formats. (a) Single precision. (b) Double precision.

precision and excess 1023 for double precision. The minimum (0) and maximum (255 and 2047) exponents are not used for normalized numbers; they have special uses described below. Finally, we have the fractions, 23 and 52 bits, respectively.
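The excess notation and the reserved exponent values can be sketched in a few lines of Python. This is only an illustration: the function name `stored_exponent` and the `FORMATS` table are made up for this example, and the excess-127 value for single precision is taken from the IEEE 754 standard because the sentence above is cut off before stating it.

```python
# Field widths from Figure B-4; bias ("excess") values per IEEE 754.
# FORMATS and stored_exponent are hypothetical names used for illustration.
FORMATS = {
    "single": {"exp_bits": 8, "bias": 127},
    "double": {"exp_bits": 11, "bias": 1023},
}


def stored_exponent(true_exponent, fmt):
    """Return the exponent field stored for a normalized number.

    Raises ValueError if the result would be 0 or the all-ones value,
    since both are reserved for special uses.
    """
    spec = FORMATS[fmt]
    stored = true_exponent + spec["bias"]
    max_field = (1 << spec["exp_bits"]) - 1  # 255 or 2047
    if not 0 < stored < max_field:
        raise ValueError(
            f"exponent {true_exponent} is out of range for normalized {fmt}"
        )
    return stored


if __name__ == "__main__":
    for fmt, exps in (("single", (-126, 0, 127)), ("double", (-1022, 0, 1023))):
        for e in exps:
            print(fmt, e, "->", stored_exponent(e, fmt))
```

Under these assumptions the usable true exponents run from -126 to +127 in single precision and from -1022 to +1023 in double precision, which is exactly the range left over once the fields 0 and 255 (or 2047) are set aside.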

A normalized fraction begins with a binary point, followed by a 1 bit, and then the rest of the fraction. Following a practice started on the PDP-11, the authors of the standard realized that the leading 1 bit in the fraction does not have to be stored, since it can just be assumed to be present. Consequently, the standard defines the fraction in a slightly different way than usual. It consists of an implied 1 bit, an implied binary point, and then either 23 or 52 arbitrary bits. If all 23 or 52 fraction bits are 0s, the fraction has the numerical value 1.0; if all of them are 1s, the fraction is numerically slightly less than 2.0. To avoid confusion with a conventional fraction, the combination of the implied 1, the implied binary point, and the 23 or 52 explicit bits is called a significand instead of a fraction or mantissa. All normalized numbers have a significand, s, in the range 1 ≤ s < 2.
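The relationship between the stored fraction bits and the significand can be checked with a short Python sketch. The helper names `split_single` and `significand` are invented for this illustration and are not part of any standard library; only the bit layout comes from Figure B-4.

```python
def split_single(bits):
    """Split a 32-bit single-precision pattern into (sign, exponent, fraction)."""
    sign = (bits >> 31) & 0x1
    exponent = (bits >> 23) & 0xFF   # 8-bit exponent field
    fraction = bits & 0x7FFFFF       # 23 explicit fraction bits
    return sign, exponent, fraction


def significand(fraction, frac_bits=23):
    """Prepend the implied 1 bit and binary point to the explicit fraction bits."""
    return 1 + fraction / (1 << frac_bits)


if __name__ == "__main__":
    all_zeros = 0
    all_ones = (1 << 23) - 1
    print(significand(all_zeros))  # exactly 1.0
    print(significand(all_ones))   # slightly less than 2.0
```

With all fraction bits 0 the significand is exactly 1.0, and with all 23 bits set it is 2 - 2^-23, so every normalized value stays inside the range 1 ≤ s < 2. Passing `frac_bits=52` gives the corresponding double-precision result.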

The numerical characteristics of the IEEE floating-point numbers are given in Fig. B-5. As examples, consider the numbers 0.5, 1, and 1.5 in normalized single-precision format. These are represented in hexadecimal as 3F000000, 3F800000, and 3FC00000, respectively.
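These three encodings can be reproduced with Python's standard `struct` module, which packs a value into IEEE single-precision format. The sketch below is one way to check them; `single_hex` and `normalized_value` are hypothetical helper names, and `normalized_value` assumes the pattern holds a normalized number (exponent field neither 0 nor 255).

```python
import struct


def single_hex(x):
    """Return the big-endian single-precision encoding of x as 8 hex digits."""
    return struct.pack(">f", x).hex().upper()


def normalized_value(hex_digits):
    """Decode a normalized single-precision pattern given as hex digits.

    Assumes the exponent field is neither 0 nor 255, so the special
    cases are deliberately not handled here.
    """
    bits = int(hex_digits, 16)
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    significand = 1 + fraction / 2 ** 23      # implied leading 1 bit
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)


if __name__ == "__main__":
    for x in (0.5, 1.0, 1.5):
        h = single_hex(x)
        print(x, h, normalized_value(h))
```

Taking 3FC00000 as an example: the sign bit is 0, the exponent field is 127 (a true exponent of 0 after removing the excess), and the fraction field is 400000 hexadecimal, giving a significand of 1.5 and therefore the value 1.5.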

One of the traditional problems with floating-point numbers is how to deal with underflow, overflow, and uninitialized numbers. The IEEE standard deals with these problems explicitly, borrowing its approach in part from the CDC 6600. In addition to normalized numbers, the standard has four other numerical types, described below and shown in Fig. B-6.

A problem arises when the result of a calculation has a magnitude smaller than the smallest normalized floating-point number that can be represented in this system. Previously, most hardware took one of two approaches: just set the result to zero and continue, or cause a floating-point underflow trap. Neither of these is

