[1] The Java programming language tracks the Unicode standard. See "Further Reading" on page 755 for reference information. The currently supported Unicode version is listed in the documentation of the Character class.
Few existing text editors support Unicode characters, so you can use the escape sequence \uxxxx to encode Unicode characters, where each x is a hexadecimal digit (0-9, and a-f or A-F to represent decimal values 10-15). This sequence can appear anywhere in code, not only in character and string constants but also in identifiers. More than one u may appear
at the beginning; thus, the character இ can be written as \u0b87 or \uu0b87. If your editor does support Unicode characters (or a subset), you may need to tell your compiler if your source code contains any character that is not part of the default character encoding for your system, such as through a command-line option that names the source character set.
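The following is a minimal sketch of the escape rules described above. The class name UnicodeEscapeDemo and its field names are illustrative only and do not come from the text. It shows an escape inside an identifier, the same character written with one u and with several, and escapes inside a string constant.

```java
final class UnicodeEscapeDemo {
    // The field name is spelled with an escape for the letter 'a' (U+0061),
    // so the compiler sees the ordinary identifier "value".
    static final int v\u0061lue = 42;

    // The same character (U+0B87) written with one 'u' and with three.
    static final char ONE_U = '\u0b87';
    static final char THREE_U = '\uuu0b87';

    public static void main(String[] args) {
        System.out.println(ONE_U == THREE_U);            // true
        System.out.println((int) ONE_U == 0x0b87);       // true
        System.out.println(value);                       // 42
        System.out.println("\u0048\u0069".equals("Hi")); // true
    }
}
```

Because the file above contains only ASCII characters, it compiles the same way under any ASCII-compatible source encoding. If you type non-ASCII characters directly instead, the javac compiler accepts an -encoding option that names the source character set.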
[2] There is a good reason to allow multiple u's. When translating a Unicode file into an ASCII file, you must translate Unicode characters that are outside the ASCII range into an escape sequence. Thus, you would translate இ into \u0b87. When translating back, you make the reverse substitution. But what if the original Unicode source had not contained இ but had used \u0b87 instead? Then the reverse translation would not result in the original source (to the parser, it would be equivalent, but possibly not to the reader of the code). The solution is to have the translator add an extra u when it encounters an existing \uxxxx, and have the reverse translator remove a u and, if there aren't any left, replace the escape sequence with its equivalent Unicode character.
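The round trip described in this footnote can be sketched as follows. This is an illustration under stated assumptions, not the implementation of any particular tool: the class EscapeTranslator and its methods toAscii and fromAscii are hypothetical names, the code works on UTF-16 code units, and it treats a doubled backslash as plain text so that the second backslash never starts an escape.

```java
final class EscapeTranslator {
    private EscapeTranslator() {}

    /** Unicode to ASCII: escape non-ASCII chars; add one 'u' to escapes already present. */
    static String toAscii(String src) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            int end = escapeEnd(src, i);
            if (isBackslashPair(src, i)) {
                out.append(src, i, i + 2);                 // copy the pair untouched
                i += 2;
            } else if (end > 0) {
                out.append("\\u").append(src, i + 1, end); // existing escape gains a 'u'
                i = end;
            } else if (c > 0x7F) {
                out.append(String.format("\\u%04x", (int) c));
                i++;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    /** ASCII to Unicode: drop one 'u'; if it was the only one, emit the character itself. */
    static String fromAscii(String src) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < src.length()) {
            int end = escapeEnd(src, i);
            if (isBackslashPair(src, i)) {
                out.append(src, i, i + 2);
                i += 2;
            } else if (end < 0) {
                out.append(src.charAt(i));
                i++;
            } else if (end - i > 6) {
                out.append('\\').append(src, i + 2, end);  // several u's: remove one
                i = end;
            } else {
                out.append((char) Integer.parseInt(src.substring(i + 2, end), 16));
                i = end;
            }
        }
        return out.toString();
    }

    /** Index just past an escape starting at i (backslash, one or more 'u', four hex digits), or -1. */
    private static int escapeEnd(String s, int i) {
        if (s.charAt(i) != '\\') return -1;
        int j = i + 1;
        while (j < s.length() && s.charAt(j) == 'u') j++;
        if (j == i + 1 || j + 4 > s.length()) return -1;
        for (int k = j; k < j + 4; k++) {
            if (!isHex(s.charAt(k))) return -1;
        }
        return j + 4;
    }

    private static boolean isBackslashPair(String s, int i) {
        return s.charAt(i) == '\\' && i + 1 < s.length() && s.charAt(i + 1) == '\\';
    }

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    public static void main(String[] args) {
        // The first literal part holds the raw character, the second the six-character escape text.
        String original = "raw=\u0b87 escaped=\\u0b87";
        String ascii = toAscii(original);
        System.out.println(ascii);
        System.out.println(fromAscii(ascii).equals(original));
    }
}
```

In this sketch the raw character becomes a single-u escape while the escape that was already present gains a second u, so the reverse pass can tell the two apart and restore the original text exactly.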
Exercise 7.1: Just for fun, write a "Hello, World" program entirely using Unicode escape sequences.
7.1.2. Comments
Comments within source code exist for the convenience of human programmers. They play no part in the generation of code and so are ignored during scanning. There are three kinds of comments:
// comment. Characters from // to the end of the line are ignored