4.2.3. Floating-Point Types, Formats, and Values
The floating-point types are float and double, which are conceptually associated with the single-precision 32-bit and double-precision 64-bit format IEEE 754 values and operations as specified in IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Standard 754-1985 (IEEE, New York).
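As a small illustration of the two widths, the sketch below prints the sizes reported by the standard wrapper classes and the raw bit patterns of the value 1.0 in each format. The class name FloatingPointWidths and the method describe are hypothetical names chosen for this example.

```java
public class FloatingPointWidths {
    /** Summarizes the bit widths of the two floating-point types. */
    public static String describe() {
        return "float=" + Float.SIZE + " bits, double=" + Double.SIZE + " bits";
    }

    public static void main(String[] args) {
        System.out.println(describe()); // float=32 bits, double=64 bits

        // Raw IEEE 754 bit patterns of 1.0 in each format, shown in hexadecimal.
        System.out.println(Integer.toHexString(Float.floatToIntBits(1.0f))); // 3f800000
        System.out.println(Long.toHexString(Double.doubleToLongBits(1.0)));  // 3ff0000000000000
    }
}
```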
The IEEE 754 standard includes not only positive and negative numbers that consist of a sign and magnitude, but also positive and negative zeros, positive and negative infinities, and special Not-a-Number values (hereafter abbreviated NaN). A NaN value is used to represent the result of certain invalid operations such as dividing zero by zero. NaN constants of both float and double type are predefined as Float.NaN and Double.NaN.
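The special values described above can be observed directly. The following is a minimal sketch; the class name SpecialValues and the helper divide are hypothetical names introduced only for this example.

```java
public class SpecialValues {
    /** Divides two doubles; floating-point division does not throw on a zero divisor. */
    public static double divide(double a, double b) {
        return a / b;
    }

    public static void main(String[] args) {
        // Dividing zero by zero is an invalid operation, so the result is NaN.
        double nan = divide(0.0, 0.0);
        System.out.println(Double.isNaN(nan));      // true
        // NaN compares unequal to every value, including itself.
        System.out.println(nan == nan);             // false

        // Positive and negative zero compare equal...
        System.out.println(0.0 == -0.0);            // true
        // ...but they are distinguishable: dividing by them gives opposite infinities.
        System.out.println(divide(1.0, 0.0));       // Infinity
        System.out.println(divide(1.0, -0.0));      // -Infinity

        // The predefined NaN constants exist for both types.
        System.out.println(Float.isNaN(Float.NaN)); // true
    }
}
```

Because NaN is never equal to itself, Double.isNaN and Float.isNaN are the reliable way to test for it.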
Every implementation of the Java programming language is required to support two standard sets of floating-point values, called the float value set and the double value set. In addition, an implementation of the Java programming language may support either or both of two extended-exponent floating-point value sets, called the float-extended-exponent value set and the double-extended-exponent value set. These extended-exponent value sets may, under certain circumstances, be used instead of the standard value sets to represent the values of expressions of type float or double.
The finite nonzero values of any floating-point value set can all be expressed in the form $s \cdot m \cdot 2^{(e - N + 1)}$, where $s$ is $+1$ or $-1$, $m$ is a positive integer less than $2^{N}$, and $e$ is an integer between $E_{min} = -(2^{K-1} - 2)$ and $E_{max} = 2^{K-1} - 1$, inclusive, and where $N$ and $K$ are parameters that depend on the value set. Some values can be represented in this form in more than one way; for example, supposing that a value $v$ in a value set might be represented in this form using certain values for $s$, $m$, and $e$, then if it happened that $m$ were even and $e$ were less than $2^{K-1}$, one could halve $m$ and increase $e$ by 1 to produce a second representation for the same value $v$. A representation in this form is called normalized if $m \geq 2^{N-1}$; otherwise the representation is said to be denormalized. If a value in a value set cannot be represented in such a way that $m \geq 2^{N-1}$, then the value is said to be a denormalized value, because it has no normalized representation.
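To make the form $s \cdot m \cdot 2^{(e - N + 1)}$ concrete, the sketch below extracts $s$, $m$, and $e$ from the bits of a float. It assumes the float value set uses $N = 24$ and $K = 8$ (so $E_{min} = -126$ and $E_{max} = +127$). The class FloatForm and its methods are hypothetical names for illustration, not part of any standard API.

```java
import java.util.Arrays;

public class FloatForm {
    // Assumed parameters of the float value set.
    static final int N = 24;
    static final int K = 8;
    static final int E_MIN = -((1 << (K - 1)) - 2); // -126
    static final int E_MAX = (1 << (K - 1)) - 1;    // +127

    /** Returns {s, m, e} such that v == s * m * 2^(e - N + 1), for finite nonzero v. */
    public static int[] decompose(float v) {
        if (v == 0.0f || Float.isNaN(v) || Float.isInfinite(v)) {
            throw new IllegalArgumentException("finite nonzero value required");
        }
        int bits = Float.floatToIntBits(v);
        int s = (bits < 0) ? -1 : 1;                        // sign bit is the top bit
        int biased = (bits >>> (N - 1)) & ((1 << K) - 1);   // K-bit exponent field
        int fraction = bits & ((1 << (N - 1)) - 1);         // low N-1 bits
        if (biased == 0) {
            // Denormalized value: no implicit leading bit, exponent fixed at E_MIN.
            return new int[] { s, fraction, E_MIN };
        }
        // Normalized: restore the implicit leading bit and remove the exponent bias.
        return new int[] { s, fraction | (1 << (N - 1)), biased - E_MAX };
    }

    /** Evaluates s * m * 2^(e - N + 1); Math.scalb scales by a power of two. */
    public static double value(int s, int m, int e) {
        return s * Math.scalb((double) m, e - N + 1);
    }

    /** A representation is normalized when m >= 2^(N-1). */
    public static boolean isNormalized(int m) {
        return m >= (1 << (N - 1));
    }

    public static void main(String[] args) {
        int[] one = decompose(1.0f);
        System.out.println(Arrays.toString(one));               // [1, 8388608, 0]
        System.out.println(isNormalized(one[1]));               // true

        // m is even, so halving m and increasing e by 1 gives a second,
        // denormalized representation of the same value.
        System.out.println(value(one[0], one[1] / 2, one[2] + 1)); // 1.0
        System.out.println(isNormalized(one[1] / 2));           // false

        // The smallest positive float has no normalized representation.
        int[] tiny = decompose(Float.MIN_VALUE);
        System.out.println(Arrays.toString(tiny));              // [1, 1, -126]
        System.out.println(isNormalized(tiny[1]));              // false
    }
}
```

The decomposition always returns the representation with the largest possible $m$, so a value is a denormalized value exactly when isNormalized reports false for the returned $m$.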
The constraints on the parameters $N$ and $K$ (and on the derived parameters $E_{min}$ and $E_{max}$)