You can use escape sequences (see Advanced Topic 4.4) inside character
constants. For example, '\n' is the newline character, and '\u00E9' is the
character é. You can find the values of the character constants that are used in
Western European languages in Appendix B.
Characters have numeric values. For example, if you look at Appendix B, you can
see that the character 'H' is actually encoded as the number 72.
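As a small illustration of the point that characters have numeric values, the following sketch converts between a char and its code. The class name CharValues and the helper codeOf are made up for this example; only the conversions between char and int are standard Java.

```java
class CharValues {
    // Hypothetical helper: widening a char to int yields its numeric code.
    static int codeOf(char ch) {
        return ch;
    }

    public static void main(String[] args) {
        System.out.println(codeOf('H'));       // prints 72
        System.out.println((char) 72);         // prints H
        System.out.println(codeOf('\u00E9'));  // prints 233 (hexadecimal E9)
    }
}
```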
When Java was first designed, each Unicode character was encoded as a two-byte
quantity. The char type was intended to hold the code of a Unicode character.
However, as of 2003, Unicode had grown so large that some characters needed to
be encoded as pairs of char values. Thus, you can no longer think of a char value
as a character. Technically speaking, a char value is a code unit in the UTF-16
encoding of Unicode. That encoding represents the most common characters as a
single char value, and less common or supplementary characters as a pair of char
values.
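The difference between code units and characters can be observed by comparing two counts for the same string. This is only a sketch: the class CodeUnits and its two helper methods are names invented here, and the sample string holds the supplementary character with code point U+1D56B, chosen as an assumption for illustration. The methods length and codePointCount belong to the standard String class.

```java
class CodeUnits {
    // Number of char values (UTF-16 code units) in the string.
    static int codeUnitCount(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String common = "Hello";
        String supplementary = "\uD835\uDD6B"; // one character, code point U+1D56B

        System.out.println(codeUnitCount(common));          // prints 5
        System.out.println(codePointCount(common));         // prints 5
        System.out.println(codeUnitCount(supplementary));   // prints 2
        System.out.println(codePointCount(supplementary));  // prints 1
    }
}
```

For strings made up only of common characters the two counts agree, which is why the distinction is easy to overlook.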
The charAt method of the String class returns a code unit from a string. As
with the substring method, the positions in the string are counted starting at
0. For example, the statements
String greeting = "Hello";
char ch = greeting.charAt(0);
sets ch to the value 'H'.
However, if you use char variables, your programs may fail with some strings
that contain international or symbolic characters. For example, the single character
𝕫 (the mathematical double-struck small z, code point U+1D56B) is encoded by
the two code units '\uD835' and '\uDD6B'.
If you call charAt(0) on the string containing the single character 𝕫 (that is, the
string "\uD835\uDD6B"), you only get the first half of a supplementary
character.
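The following sketch shows what charAt(0) actually returns for such a string, and contrasts it with the codePointAt method of the String class, which combines both code units. The class SurrogateDemo and its helper methods are hypothetical names used only for this illustration.

```java
class SurrogateDemo {
    // Returns the first code unit, which may be only half of a character.
    static char firstCodeUnit(String s) {
        return s.charAt(0);
    }

    // Returns the first full code point, joining a surrogate pair if present.
    static int firstCodePoint(String s) {
        return s.codePointAt(0);
    }

    public static void main(String[] args) {
        String s = "\uD835\uDD6B"; // one supplementary character

        char first = firstCodeUnit(s);
        System.out.println(Integer.toHexString(first));         // prints d835
        System.out.println(Character.isHighSurrogate(first));   // prints true

        int codePoint = firstCodePoint(s);
        System.out.println(Integer.toHexString(codePoint));     // prints 1d56b
        System.out.println(Character.charCount(codePoint));     // prints 2
    }
}
```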
Therefore, you should only use char values if you are absolutely sure that you
won't need to encode supplementary characters.
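If supplementary characters cannot be ruled out, one possible approach is to walk through a string by code point rather than by char. The sketch below is an assumption about how that might look, not a method prescribed by this text: the class CodePointLoop and both of its methods are invented names, while codePointAt, codePointCount, and Character.charCount are standard library methods.

```java
class CodePointLoop {
    // Returns true if the string contains at least one surrogate pair,
    // that is, a character that occupies two char values.
    static boolean hasSurrogatePairs(String s) {
        return s.codePointCount(0, s.length()) < s.length();
    }

    // Lists each code point in U+XXXX notation, advancing the index by
    // one or two code units depending on the character just read.
    static String describe(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int codePoint = s.codePointAt(i);
            out.append(String.format("U+%04X ", codePoint));
            i += Character.charCount(codePoint);
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(hasSurrogatePairs("Hello"));          // prints false
        System.out.println(hasSurrogatePairs("H\uD835\uDD6B"));  // prints true
        System.out.println(describe("H\uD835\uDD6B"));           // prints U+0048 U+1D56B
    }
}
```

A check such as hasSurrogatePairs could serve as a guard before code that processes a string one char at a time.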