Types, Values, and Variables - The Java Language Specification

4.2.3. Floating-Point Types, Formats, and Values

The floating-point types are float and double, which are conceptually associated with the single-precision 32-bit and double-precision 64-bit format IEEE 754 values and operations as specified in IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Standard 754-1985 (IEEE, New York).

The IEEE 754 standard includes not only positive and negative numbers that consist of a sign and magnitude, but also positive and negative zeros, positive and negative infinities, and special Not-a-Number values (hereafter abbreviated NaN). A NaN value is used to represent the result of certain invalid operations such as dividing zero by zero. NaN constants of both float and double type are predefined as Float.NaN and Double.NaN.
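The special values described above can be observed directly. The following is a minimal sketch, not part of the specification text; the class name SpecialValuesDemo and the helper divide are hypothetical names chosen for illustration.

```java
// Illustrative sketch only: SpecialValuesDemo and divide are hypothetical names.
final class SpecialValuesDemo {

    /** Floating-point division; it never throws, even when b is zero. */
    static double divide(double a, double b) {
        return a / b;
    }

    public static void main(String[] args) {
        double nan = divide(0.0, 0.0);       // zero divided by zero yields NaN
        double posInf = divide(1.0, 0.0);    // positive infinity
        double negInf = divide(1.0, -0.0);   // negative zero gives negative infinity

        System.out.println(Double.isNaN(nan));                    // true
        System.out.println(nan == Double.NaN);                    // false: NaN is unequal to everything
        System.out.println(0.0 == -0.0);                          // true: the two zeros compare equal
        System.out.println(posInf == Double.POSITIVE_INFINITY);   // true
        System.out.println(negInf == Double.NEGATIVE_INFINITY);   // true
        System.out.println(Float.isNaN(0.0f / 0.0f));             // true: the same holds for float
    }
}
```

Because NaN compares unequal to every value, including itself, the test for NaN uses Double.isNaN or Float.isNaN rather than the == operator.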

Every implementation of the Java programming language is required to support two standard sets of floating-point values, called the float value set and the double value set. In addition, an implementation of the Java programming language may support either or both of two extended-exponent floating-point value sets, called the float-extended-exponent value set and the double-extended-exponent value set. These extended-exponent value sets may, under certain circumstances, be used instead of the standard value sets to represent the values of expressions of type float or double (§5.1.13, §15.4).

The finite nonzero values of any floating-point value set can all be expressed in the form $s \cdot m \cdot 2^{(e - N + 1)}$, where $s$ is $+1$ or $-1$, $m$ is a positive integer less than $2^{N}$, and $e$ is an integer between $E_{min} = -(2^{K-1} - 2)$ and $E_{max} = 2^{K-1} - 1$, inclusive, and where $N$ and $K$ are parameters that depend on the value set. Some values can be represented in this form in more than one way; for example, supposing that a value $v$ in a value set might be represented in this form using certain values for $s$, $m$, and $e$, then if it happened that $m$ were even and $e$ were less than $2^{K-1}$, one could halve $m$ and increase $e$ by 1 to produce a second representation for the same value $v$. A representation in this form is called normalized if $m \ge 2^{N-1}$; otherwise the representation is said to be denormalized. If a value in a value set cannot be represented in such a way that $m \ge 2^{N-1}$, then the value is said to be a denormalized value, because it has no normalized representation.
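To make the form above concrete, the following sketch decomposes a double into s, m, and e and rebuilds the value from them. It assumes the parameters N = 53 and K = 11 for the double value set (the table of parameters is not reproduced in this section), and the class ValueSetForm with its methods decompose, recompose, and isNormalized are hypothetical names used only for illustration.

```java
// Illustrative sketch only: ValueSetForm and its methods are hypothetical names.
final class ValueSetForm {
    // Assumed parameters for the double value set.
    static final int N = 53;
    static final int K = 11;
    static final int E_MIN = -((1 << (K - 1)) - 2); // -(2^(K-1) - 2) = -1022
    static final int E_MAX = (1 << (K - 1)) - 1;    //   2^(K-1) - 1  =  1023

    /** Returns {s, m, e} such that v = s * m * 2^(e - N + 1), for finite nonzero v. */
    static long[] decompose(double v) {
        if (v == 0.0 || Double.isNaN(v) || Double.isInfinite(v)) {
            throw new IllegalArgumentException("finite nonzero value required");
        }
        long bits = Double.doubleToLongBits(v);
        long s = (bits < 0) ? -1 : 1;                          // sign bit is bit 63
        int biased = (int) ((bits >>> (N - 1)) & ((1L << K) - 1));
        long fraction = bits & ((1L << (N - 1)) - 1);
        if (biased == 0) {
            // Denormalized value: no implicit leading bit, exponent fixed at E_MIN.
            return new long[] { s, fraction, E_MIN };
        }
        // Normalized value: restore the implicit leading bit; the bias equals E_MAX.
        return new long[] { s, fraction | (1L << (N - 1)), biased - E_MAX };
    }

    /** Evaluates s * m * 2^(e - N + 1). */
    static double recompose(long s, long m, long e) {
        return Math.scalb((double) (s * m), (int) (e - N + 1));
    }

    /** A representation is normalized if m >= 2^(N-1). */
    static boolean isNormalized(long m) {
        return m >= (1L << (N - 1));
    }

    public static void main(String[] args) {
        long[] one = decompose(1.0);
        System.out.println(isNormalized(one[1]));                       // true
        // m is even here, so halving m and increasing e by 1 gives the same value.
        System.out.println(recompose(one[0], one[1] / 2, one[2] + 1));  // 1.0
        long[] tiny = decompose(Double.MIN_VALUE);
        System.out.println(isNormalized(tiny[1]));                      // false
    }
}
```

In this sketch decompose always returns the normalized representation when one exists, so a result with m below 2^(N-1) identifies a denormalized value such as Double.MIN_VALUE.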

The constraints on the parameters $N$ and $K$ (and on the derived parameters $E_{min}$ and $E_{max}$) for the two required and two optional floating-point value sets are summarized in Table 4.1.
