You can use escape sequences (see Advanced Topic 4.4) inside character
constants. For example, '\n' is the newline character, and '\u00E9' is the
character é. You can find the values of the character constants that are used in
Western European languages in Appendix B.
Characters have numeric values. For example, if you look at Appendix B, you can
see that the character 'H' is actually encoded as the number 72.
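The following short program is a sketch of how these numeric values can be inspected. The class name CharValues and the helper codeOf are made up for this illustration; the only language feature it relies on is that a char widens to an int.

```java
public class CharValues {
    // Hypothetical helper: returns the numeric value of a char.
    static int codeOf(char ch) {
        return ch; // a char widens to int without a cast
    }

    public static void main(String[] args) {
        char letter = 'H';
        char newline = '\n';    // escape sequence for the newline character
        char eAcute = '\u00E9'; // Unicode escape for the character é

        System.out.println(codeOf(letter));  // prints 72
        System.out.println(codeOf(newline)); // prints 10
        System.out.println(codeOf(eAcute));  // prints 233
        System.out.println((char) 72);       // prints H
    }
}
```

Casting in the other direction, from int to char, requires an explicit cast, as in the last statement.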
When Java was first designed, each Unicode character was encoded as a two-byte
quantity. The char type was intended to hold the code of a Unicode character.
However, as of 2003, Unicode had grown so large that some characters needed to
be encoded as pairs of char values. Thus, you can no longer think of a char value
as a character. Technically speaking, a char value is a code unit in the UTF-16
encoding of Unicode. That encoding represents the most common characters as a
single char value, and less common or supplementary characters as a pair of char
values.
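The difference between code units and characters can be seen by comparing two counts for the same string. The sketch below assumes U+1F600 (an emoji) as an arbitrary example of a supplementary character; the class and method names are invented for this illustration, while length, codePointCount, and Character.toChars are standard library methods.

```java
public class CodeUnits {
    // Number of char values (UTF-16 code units) in the string.
    static int codeUnitCount(String s) {
        return s.length();
    }

    // Number of Unicode characters (code points) in the string.
    static int characterCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String common = "Hello";
        // U+1F600 is an assumed example of a supplementary character.
        String supplementary = new String(Character.toChars(0x1F600));

        System.out.println(codeUnitCount(common));         // prints 5
        System.out.println(characterCount(common));        // prints 5
        System.out.println(codeUnitCount(supplementary));  // prints 2
        System.out.println(characterCount(supplementary)); // prints 1
    }
}
```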
The charAt method of the String class returns a code unit from a string. As
with the substring method, the positions in the string are counted starting at
0. For example, the statements
String greeting = "Hello";
char ch = greeting.charAt(0);
set ch to the value 'H'.
However, if you use char variables, your programs may fail with some strings
that contain international or symbolic characters. For example, the single character
𝕫 (a double-struck z, the letter style used in mathematics for the set of
integers) is encoded by the two code units '\uD835' and '\uDD6B'.
If you call charAt(0) on the string containing the single character 𝕫 (that is,
the string "\uD835\uDD6B"), you only get the first half of a supplementary
character.
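A small experiment makes the problem visible. The sketch below uses the string from the text; the class name and the helper firstCodePoint are hypothetical, and codePointAt is shown only as one way to retrieve the whole character.

```java
public class SupplementaryDemo {
    // Hypothetical helper: the full Unicode character at the start of s.
    static int firstCodePoint(String s) {
        return s.codePointAt(0);
    }

    public static void main(String[] args) {
        String z = "\uD835\uDD6B"; // one character, two code units

        char half = z.charAt(0);   // only the first code unit
        System.out.println(z.length());                     // prints 2
        System.out.println(Integer.toHexString(half));      // prints d835
        System.out.println(Character.isHighSurrogate(half)); // prints true

        // codePointAt combines both code units into one value.
        System.out.println(Integer.toHexString(firstCodePoint(z))); // prints 1d56b
    }
}
```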
Therefore, you should only use char values if you are absolutely sure that you
won't need to encode supplementary characters.
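When supplementary characters may occur, one possible approach is to walk through the string by code points rather than by char values. The following is a minimal sketch of that idea, not the only way to do it; the class name CodePointWalk and the method codePointsOf are invented for this example, while codePointAt and Character.charCount are standard library methods.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointWalk {
    // Collects the code points of s, treating each surrogate pair as one character.
    static List<Integer> codePointsOf(String s) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            result.add(cp);
            // Advance by 1 for common characters, by 2 for supplementary ones.
            i += Character.charCount(cp);
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "a\uD835\uDD6Bb"; // three characters, four code units
        for (int cp : codePointsOf(text)) {
            System.out.println(Integer.toHexString(cp)); // prints 61, 1d56b, 62
        }
    }
}
```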