img
Recent Additions to Character for Unicode Code Point Support
Recently, major additions have been made to Character. Beginning with JDK 5, the Character
class has included support for 32-bit Unicode characters. In the past, all Unicode characters
could be held by 16 bits, which is the size of a char (and the size of the value encapsulated
within a Character), because those values ranged from 0 to FFFF. However, the Unicode
character set has been expanded, and more than 16 bits are required. Characters can now range
from 0 to 10FFFF.
Here are two important terms: code point and supplemental character. A code point is a
character in the range 0 to 10FFFF. Characters that have values greater than FFFF are
called supplemental characters.
The expansion of the Unicode character set caused a fundamental problem for Java. Because
a supplemental character has a value greater than a char can hold, some means of handling
the supplemental characters was needed. Java addressed this problem two ways. First,
Java uses two chars to represent a supplemental character. The first char is called the high
surrogate, and the second is called the low surrogate. New methods, such as codePointAt( ),
were provided to translate between code points and supplemental characters.
Secondly, Java overloaded several preexisting methods in the Character class. The
overloaded forms use int rather than char data. Because an int is large enough to hold any
character as a single value, it can be used to store any character. For example, all of the methods
in Table 16-7 have overloaded forms that operate on int. Here is a sampling:
static boolean isDigit(int cp)
static boolean isLetter(int cp)
static int toLowerCase(int cp)
In addition to the methods overloaded to accept code points, Character adds methods
that provide additional support for code points. A sampling is shown in Table 16-8.
Method
Description
static int charCount(int cp)
Returns 1 if cp can be represented by a single
char. It returns 2 if two chars are needed.
static int
Returns the code point at the location specified
codePointAt(CharSequence chars, int loc)
by loc.
static int codePointAt(char chars[ ], int loc)
Returns the code point at the location specified
by loc.
static int
Returns the code point at the location that
codePointBefore(CharSequence chars, int loc) precedes that specified by loc.
static int
Returns the code point at the location that
codePointBefore(char chars[ ], int loc)
precedes that specified by loc.
static boolean isHighSurrogate(char ch)
Returns true if ch contains a valid high surrogate
character.
TABLE 16-8
A Sampling of Methods That Provide Suppor t for 32-Bit Unicode Code Points
Search WWH :
Custom Search
Previous Page
Java SE 6 Topic Index
Next Page
Java SE 6 Bookmarks
Home