Java's built-in support for text processing is more than adequate for most of the
tasks that business applications normally require. More advanced tasks, such as
searching and processing very large data sets or complex parsing (including formal
grammars), are outside the scope of this topic, but Java has a large ecosystem of
helpful libraries and bindings to specialized technologies for text processing and
analysis.
Numbers and Math
In this section, we will discuss Java's support for numeric types in more detail.
In particular, we'll look at the two's complement representation of integral types
that Java uses. We'll then introduce floating-point representations and touch on
some of the problems they can cause. Finally, we'll work through examples that use
some of Java's library functions for standard mathematical operations.
How Java Represents Integer Types
Java's integer types are all signed, as we first mentioned in “Primitive Data Types” on
page 22. This means that all integer types can represent both positive and negative
numbers. As computers work with binary, the only really logical way to do this is
to split the possible bit patterns in two and use half of them to represent negative
numbers.
Let's work with Java's byte type to investigate how Java represents integers. A byte
has 8 bits, so it can represent 256 different numbers (i.e., 128 negative and 128
non-negative numbers). It's logical to use the pattern 0b0000_0000 to represent zero
(recall that Java has the syntax 0b<binary digits> to represent numbers as binary),
and then it's easy to figure out the bit patterns for the positive numbers:
byte b = 0b0000_0001;
System.out.println(b); // 1
b = 0b0000_0010;
System.out.println(b); // 2
b = 0b0000_0011;
System.out.println(b); // 3
b = 0b0111_1111;
System.out.println(b); // 127
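The counting argument above can be checked with a minimal sketch. The class name ByteRangeSketch and its helper methods are hypothetical names chosen for illustration; only Byte.MIN_VALUE, Byte.MAX_VALUE, Integer.toBinaryString, and String.format come from the standard library. It counts how many negative and non-negative values a byte can hold and prints the bit patterns of the non-negative examples:

```java
public class ByteRangeSketch {

    // Counts the negative values in the full range of a byte.
    static int countNegative() {
        int count = 0;
        for (int i = Byte.MIN_VALUE; i <= Byte.MAX_VALUE; i++) {
            if (i < 0) {
                count++;
            }
        }
        return count;
    }

    // Counts the non-negative values (zero and the positives) in the range.
    static int countNonNegative() {
        int count = 0;
        for (int i = Byte.MIN_VALUE; i <= Byte.MAX_VALUE; i++) {
            if (i >= 0) {
                count++;
            }
        }
        return count;
    }

    // Formats the 8 bits of a byte as two groups of four, e.g. 0111_1111.
    static String toBits(byte b) {
        // The byte is widened to int first, so keep only the low 8 bits.
        String bits = Integer.toBinaryString(b & 0xFF);
        // Left-pad with zeros to a fixed width of 8 characters.
        String padded = String.format("%8s", bits).replace(' ', '0');
        return padded.substring(0, 4) + "_" + padded.substring(4);
    }

    public static void main(String[] args) {
        System.out.println(countNegative());    // 128
        System.out.println(countNonNegative()); // 128

        byte[] examples = {0, 1, 2, 3, 127};
        for (byte b : examples) {
            System.out.println(toBits(b) + " -> " + b);
        }
        // 0000_0000 -> 0
        // 0000_0001 -> 1
        // 0000_0010 -> 2
        // 0000_0011 -> 3
        // 0111_1111 -> 127
    }
}
```

Every non-negative pattern printed here starts with a 0 bit, which accounts for exactly half of the 256 available patterns.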
When we set the first (leftmost) bit of the byte, the sign should change, as we have
now used up all of the bit patterns that we set aside for non-negative numbers. So the
pattern 0b1000_0000 should represent some negative number, but which one?