3.3. Unicode Escapes
A compiler for the Java programming language (“Java compiler”) first recognizes Unicode escapes in its input, translating the ASCII characters \u followed by four hexadecimal digits to the UTF-16 code unit for the indicated hexadecimal value, and passing all other characters unchanged. Representing supplementary characters requires two consecutive Unicode escapes. This translation step results in a sequence of Unicode input characters.
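To see why a supplementary character needs two escapes, consider U+1F600, which lies above U+FFFF and so cannot be written with a single four-digit escape. A minimal sketch (the class name SurrogateEscapeDemo is hypothetical) writes it as its UTF-16 surrogate pair, one escape per code unit, and compares the result with the string built from the code point:

```java
// Hypothetical demo class: a supplementary character written as two escapes.
class SurrogateEscapeDemo {
    // The compiler translates each escape to one UTF-16 code unit before lexing,
    // so this literal holds the high and low surrogates of U+1F600.
    static final String GRINNING = "\uD83D\uDE00";

    public static void main(String[] args) {
        // Build the same character from its code point for comparison.
        String fromCodePoint = new String(Character.toChars(0x1F600));
        System.out.println(GRINNING.equals(fromCodePoint));                 // true
        System.out.println(GRINNING.length());                              // 2 code units
        System.out.println(GRINNING.codePointCount(0, GRINNING.length()));  // 1 code point
    }
}
```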
UnicodeInputCharacter:
  UnicodeEscape
  RawInputCharacter

UnicodeEscape:
  \ UnicodeMarker HexDigit HexDigit HexDigit HexDigit

UnicodeMarker:
  u
  UnicodeMarker u

RawInputCharacter:
  any Unicode character

HexDigit: one of
  0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F
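The UnicodeEscape production can be checked with a regular expression: a backslash, one or more u characters (the recursive UnicodeMarker rule), and exactly four ASCII hexadecimal digits. The sketch below tests whether a whole string matches that production and deliberately ignores the separate backslash-eligibility rule; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper: does a string match the UnicodeEscape production exactly?
class UnicodeEscapeGrammar {
    // A literal backslash, UnicodeMarker (one or more 'u'), then four HexDigits.
    // [0-9a-fA-F] is exactly the ASCII HexDigit set from the grammar.
    private static final Pattern UNICODE_ESCAPE =
            Pattern.compile("\\\\u+[0-9a-fA-F]{4}");

    static boolean isUnicodeEscape(String s) {
        return UNICODE_ESCAPE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isUnicodeEscape("\\u2126"));   // true
        System.out.println(isUnicodeEscape("\\uuu2126")); // true: several u's allowed
        System.out.println(isUnicodeEscape("\\u21G6"));   // false: G is not a hex digit
        System.out.println(isUnicodeEscape("\\U2126"));   // false: marker must be lowercase u
    }
}
```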
The \, u, and hexadecimal digits here are all ASCII characters.
In addition to the processing implied by the grammar, for each raw input character that is a backslash \, input processing must consider how many other \ characters contiguously precede it, separating it from a non-\ character or the start of the input stream. If this number is even, then the \ is eligible to begin a Unicode escape; if the number is odd, then the \ is not eligible to begin a Unicode escape.
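The even/odd rule amounts to counting the run of backslashes immediately before a given backslash. A minimal sketch, assuming the raw input is held in a Java String and that only raw backslash characters are counted (the class and method names are hypothetical):

```java
// Hypothetical helper for the even/odd backslash-eligibility rule.
class BackslashEligibility {

    // True if the backslash at raw.charAt(index) may begin a Unicode escape.
    static boolean isEligible(String raw, int index) {
        if (raw.charAt(index) != '\\') {
            throw new IllegalArgumentException("no backslash at index " + index);
        }
        // Count contiguous backslashes before this one, stopping at a
        // non-backslash character or at the start of the input.
        int count = 0;
        for (int i = index - 1; i >= 0 && raw.charAt(i) == '\\'; i--) {
            count++;
        }
        return count % 2 == 0; // even: eligible; odd: not eligible
    }

    public static void main(String[] args) {
        // The example input from the text, without its surrounding quotes.
        // Backslashes are doubled only because this is a Java string literal.
        String raw = "\\\\u2126=\\u2126";
        System.out.println(isEligible(raw, 0)); // true: nothing precedes it
        System.out.println(isEligible(raw, 1)); // false: one backslash precedes it
        System.out.println(isEligible(raw, 8)); // true: preceded by '='
    }
}
```

Eligibility only makes a backslash a candidate: the backslash at index 0 is eligible but is followed by another backslash rather than u, so it is passed through as a raw character.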
For example, the raw input "\\u2126=\u2126" results in the eleven characters " \ \ u 2 1 2 6 = Ω " (\u2126 is the Unicode encoding of the character Ω).
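Because Unicode-escape translation happens before the input is divided into tokens, the example can be reproduced inside a Java program. In this sketch (the class name is hypothetical) the example raw input is used verbatim as a string literal: translation leaves the first two backslashes alone and turns the eligible escape into U+2126, and string-literal processing then collapses the two backslashes into one, leaving 8 characters.

```java
// Hypothetical demo: the example raw input used verbatim as a string literal.
class EscapeExampleDemo {
    // Unicode-escape translation yields the eleven characters from the text;
    // string-literal processing then turns the two backslashes into one.
    static final String VALUE = "\\u2126=\u2126";

    public static void main(String[] args) {
        System.out.println(VALUE.length());               // 8
        System.out.println(VALUE.charAt(0) == '\\');      // true: one literal backslash
        System.out.println(VALUE.charAt(7) == '\u2126');  // true: the Ohm sign, U+2126
    }
}
```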
If an eligible \ is not followed by u, then it is treated as a RawInputCharacter and remains part of the escaped Unicode stream.
If an eligible \ is followed by u, or more than one u, and the last u is not followed by four hexadecimal digits, then a compile-time error occurs.
The character produced by a Unicode escape does not participate in further Unicode escapes.
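Taken together, the rules in this section describe a single left-to-right pass over the raw input. The following is a minimal sketch of such a pass, not javac's actual implementation. Its assumptions: the class and method names are hypothetical, an IllegalArgumentException stands in for the compile-time error, only raw backslashes are counted when deciding eligibility, and the code unit produced by an escape is appended to the output without being rescanned.

```java
// Hypothetical sketch of the Unicode-escape translation pass; not javac's code.
class UnicodeEscapeTranslator {

    // Value of an ASCII HexDigit, or -1 if c is not one.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    // Translates escapes in raw input; IllegalArgumentException stands in
    // for the compile-time error.
    static String translate(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        int precedingBackslashes = 0; // contiguous raw backslashes before index i
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            boolean eligible = c == '\\' && precedingBackslashes % 2 == 0;
            if (eligible && i + 1 < raw.length() && raw.charAt(i + 1) == 'u') {
                // Skip the backslash and every 'u' of the UnicodeMarker.
                int j = i + 1;
                while (j < raw.length() && raw.charAt(j) == 'u') {
                    j++;
                }
                // The last 'u' must be followed by four ASCII hex digits.
                if (j + 4 > raw.length()) {
                    throw new IllegalArgumentException("truncated Unicode escape at index " + i);
                }
                int value = 0;
                for (int k = j; k < j + 4; k++) {
                    int digit = hexValue(raw.charAt(k));
                    if (digit < 0) {
                        throw new IllegalArgumentException("bad hex digit in Unicode escape at index " + i);
                    }
                    value = value * 16 + digit;
                }
                // Append the UTF-16 code unit; it is never rescanned as input.
                out.append((char) value);
                precedingBackslashes = 0; // the last raw character was a hex digit
                i = j + 4;
            } else {
                // Everything else, including a backslash that is ineligible or
                // not followed by 'u', passes through as a raw input character.
                out.append(c);
                precedingBackslashes = (c == '\\') ? precedingBackslashes + 1 : 0;
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Backslashes are doubled below only because these are Java literals.
        System.out.println(translate("\"\\\\u2126=\\u2126\"").length()); // 11
        System.out.println(translate("\\u005cu0041").length());          // 6
        try {
            translate("\\u12");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The second call shows the last rule at work: the raw input \u005cu0041 translates to the six characters \ u 0 0 4 1, because the backslash produced by the first escape does not begin a second one.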