4.2.3. Floating-Point Types, Formats, and Values
The floating-point types are float and double, which are conceptually associated with the
single-precision 32-bit and double-precision 64-bit format IEEE 754 values and operations
as specified in IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Standard
754-1985 (IEEE, New York).
The IEEE 754 standard includes not only positive and negative numbers that consist of a
sign and magnitude, but also positive and negative zeros, positive and negative infinities,
and special Not-a-Number values (hereafter abbreviated NaN). A NaN value is used to
represent the result of certain invalid operations such as dividing zero by zero. NaN
constants of both float and double type are predefined as Float.NaN and Double.NaN.
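
The special values described above can be observed directly. The following is a minimal sketch, not part of the specification text; the class name SpecialValuesDemo and its helper methods are hypothetical and exist only for illustration.

```java
final class SpecialValuesDemo {
    // Dividing zero by zero is an invalid operation, so the result is a NaN.
    static double zeroOverZero() {
        double zero = 0.0;
        return zero / zero;
    }

    // Dividing 1.0 by a signed zero yields the infinity with the same sign as the zero.
    static double oneOver(double divisor) {
        return 1.0 / divisor;
    }

    public static void main(String[] args) {
        System.out.println(Double.isNaN(zeroOverZero())); // true
        System.out.println(Double.isNaN(Double.NaN));     // true
        System.out.println(Float.isNaN(Float.NaN));       // true
        System.out.println(oneOver(0.0));                 // Infinity
        System.out.println(oneOver(-0.0));                // -Infinity
        System.out.println(0.0 == -0.0);                  // true
    }
}
```

Positive and negative zero compare as equal under ==, so the division in oneOver is one simple way to tell them apart.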
Every implementation of the Java programming language is required to support two
standard sets of floating-point values, called the float value set and the double value set.
In addition, an implementation of the Java programming language may support either or both
of two extended-exponent floating-point value sets, called the float-extended-exponent value
set and the double-extended-exponent value set. These extended-exponent value sets may,
under certain circumstances, be used instead of the standard value sets to represent the
values of expressions of type float or double (§5.1.13, §15.4).
The finite nonzero values of any floating-point value set can all be expressed in the form
$s \cdot m \cdot 2^{(e - N + 1)}$, where $s$ is +1 or -1, $m$ is a positive integer less
than $2^N$, and $e$ is an integer between $E_{min} = -(2^{K-1} - 2)$ and
$E_{max} = 2^{K-1} - 1$, inclusive, and where $N$ and $K$ are parameters that depend on the
value set. Some values can be represented in this form in more than one way; for example,
supposing that a value $v$ in a value set might be represented in this form using certain
values for $s$, $m$, and $e$, then if it happened that $m$ were even and $e$ were less than
$2^{K-1}$, one could halve $m$ and increase $e$ by 1 to produce a second representation for
the same value $v$. A representation in this form is called normalized if $m \ge 2^{N-1}$;
otherwise the representation is said to be denormalized. If a value in a value set cannot
be represented in such a way that $m \ge 2^{N-1}$, then the value is said to be a
denormalized value, because it has no normalized representation.
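
To make the form above concrete, the following sketch splits a finite nonzero float into s, m, and e and then rebuilds the value as s · m · 2^(e - N + 1). It assumes the float value set parameters N = 24 and K = 8 (so E min = -126 and E max = 127) and the usual IEEE 754 single-precision bit layout exposed by Float.floatToIntBits. The class FloatDecomposition and its methods are hypothetical names used only for illustration.

```java
final class FloatDecomposition {
    // Assumed parameters of the float value set.
    static final int N = 24;
    static final int K = 8;
    static final int E_MIN = -((1 << (K - 1)) - 2); // -126
    static final int E_MAX = (1 << (K - 1)) - 1;    //  127

    // Returns {s, m, e} such that value == s * m * 2^(e - N + 1).
    static int[] decompose(float value) {
        if (value == 0.0f || Float.isNaN(value) || Float.isInfinite(value)) {
            throw new IllegalArgumentException("value must be finite and nonzero");
        }
        int bits = Float.floatToIntBits(value);
        int s = (bits >>> 31) == 0 ? 1 : -1;
        int biasedExponent = (bits >>> (N - 1)) & ((1 << K) - 1);
        int fraction = bits & ((1 << (N - 1)) - 1);
        int m;
        int e;
        if (biasedExponent == 0) {
            // Denormalized value: no implicit leading bit, exponent fixed at E_MIN.
            m = fraction;
            e = E_MIN;
        } else {
            // Normalized representation: restore the implicit leading bit, so m >= 2^(N-1).
            m = fraction | (1 << (N - 1));
            e = biasedExponent - E_MAX; // remove the exponent bias (127 for float)
        }
        return new int[] {s, m, e};
    }

    // A representation is normalized when m >= 2^(N-1).
    static boolean isNormalized(int m) {
        return m >= (1 << (N - 1));
    }

    // Rebuilds s * m * 2^(e - N + 1); Math.scalb scales by an exact power of two.
    static double reconstruct(int s, int m, int e) {
        return s * Math.scalb((double) m, e - N + 1);
    }

    public static void main(String[] args) {
        for (float v : new float[] {1.0f, -6.5f, Float.MIN_VALUE, Float.MAX_VALUE}) {
            int[] parts = decompose(v);
            System.out.println(v + ": s=" + parts[0] + " m=" + parts[1] + " e=" + parts[2]
                    + " normalized=" + isNormalized(parts[1])
                    + " roundTrip=" + ((float) reconstruct(parts[0], parts[1], parts[2]) == v));
        }
    }
}
```

For Float.MIN_VALUE the sketch yields m = 1 and e = -126, a denormalized value, since no choice of e within the allowed range lets m reach 2^23.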
The constraints on the parameters $N$ and $K$ (and on the derived parameters $E_{min}$ and
$E_{max}$) for the two required and two optional floating-point value sets are summarized
in Table 4.1.