3.3. Unicode Escapes
A compiler for the Java programming language (“Java compiler”) first recognizes Unicode
escapes in its input, translating the ASCII characters \u followed by four hexadecimal digits
to the UTF-16 code unit (§3.1) of the indicated hexadecimal value, and passing all other
characters unchanged. Representing supplementary characters requires two consecutive
Unicode escapes. This translation step results in a sequence of Unicode input characters.
UnicodeInputCharacter:
    UnicodeEscape
    RawInputCharacter
UnicodeEscape:
    \ UnicodeMarker HexDigit HexDigit HexDigit HexDigit
UnicodeMarker:
    u
    UnicodeMarker u
RawInputCharacter:
    any Unicode character
HexDigit: one of
    0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F
The \, u, and hexadecimal digits here are all ASCII characters.
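As an illustration of the grammar, the following small sketch shows three escapes written directly in Java source. The class and constant names are hypothetical and chosen only for this example. It shows a single u marker, a repeated u marker (permitted by the UnicodeMarker production), and a supplementary character written as two consecutive escapes, here the surrogate pair for U+1F600.

```java
public final class UnicodeEscapeSourceDemo {
    private UnicodeEscapeSourceDemo() {}

    // One escape with a single u marker: the compiler sees the letter A.
    public static final String SINGLE_MARKER = "\u0041";

    // The u marker may repeat; this is still the letter A.
    public static final String REPEATED_MARKER = "\uuu0041";

    // Two consecutive escapes give the surrogate pair for U+1F600.
    public static final String SUPPLEMENTARY = "\uD83D\uDE00";

    public static void main(String[] args) {
        System.out.println(SINGLE_MARKER.equals("A"));      // true
        System.out.println(REPEATED_MARKER.equals("A"));    // true
        // Two UTF-16 code units, but only one code point.
        System.out.println(SUPPLEMENTARY.length());                                   // 2
        System.out.println(SUPPLEMENTARY.codePointCount(0, SUPPLEMENTARY.length()));  // 1
        System.out.println(SUPPLEMENTARY.codePointAt(0) == 0x1F600);                  // true
    }
}
```

Because the translation happens before any other lexical processing, the comments in such a file are subject to it as well, so a backslash followed by u inside a comment must also be a well-formed escape.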
In addition to the processing implied by the grammar, for each raw input character that is a
backslash \, input processing must consider how many other \ characters contiguously precede
it, separating it from a non-\ character or the start of the input stream. If this number
is even, then the \ is eligible to begin a Unicode escape; if the number is odd, then the \ is
not eligible to begin a Unicode escape.
For example, the raw input "\\u2126=\u2126" results in the eleven characters " \ \ u 2 1 2 6
= Ω " (\u2126 is the Unicode encoding of the character Ω).
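The same input can be observed from inside a program. In the sketch below (class and constant names are hypothetical), the string literal contains exactly the characters of the example. After Unicode escape translation the literal still holds two backslashes before u2126, and ordinary string-literal escape processing then turns that pair into a single backslash, so the runtime value has eight characters rather than eleven (the two quotes are not part of the value).

```java
public final class BackslashEligibilityDemo {
    private BackslashEligibilityDemo() {}

    // Same characters between the quotes as in the example above.
    public static final String EXAMPLE = "\\u2126=\u2126";

    public static void main(String[] args) {
        // Value: backslash, u, 2, 1, 2, 6, =, then the ohm sign.
        System.out.println(EXAMPLE.length());                  // 8
        System.out.println(EXAMPLE.charAt(0) == '\\');         // true
        System.out.println(EXAMPLE.charAt(1) == 'u');          // true
        System.out.println(EXAMPLE.charAt(7) == (char) 0x2126); // true
    }
}
```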
If an eligible \ is not followed by u, then it is treated as a RawInputCharacter and remains
part of the escaped Unicode stream.
If an eligible \ is followed by u, or more than one u, and the last u is not followed by four
hexadecimal digits, then a compile-time error occurs.
The character produced by a Unicode escape does not participate in further Unicode escapes.
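The rules above can be sketched as a single left-to-right pass over the raw input. The following is a minimal illustration, not how any particular compiler implements the step. The class UnicodeEscapeTranslator and its methods are hypothetical names, an IllegalArgumentException stands in for the compile-time error, and the sketch assumes that only raw backslashes in the input are counted when deciding eligibility.

```java
public final class UnicodeEscapeTranslator {
    private UnicodeEscapeTranslator() {}

    /** Returns the value of an ASCII hexadecimal digit, or -1 if ch is not one. */
    private static int hexValue(char ch) {
        if (ch >= '0' && ch <= '9') return ch - '0';
        if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
        if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
        return -1;
    }

    /** Translates Unicode escapes in raw input (hypothetical helper). */
    public static String translate(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        int backslashRun = 0; // contiguous raw backslashes just before index i
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            if (c != '\\') {
                out.append(c);
                backslashRun = 0;
                i++;
                continue;
            }
            boolean eligible = backslashRun % 2 == 0;
            boolean startsEscape = eligible
                    && i + 1 < raw.length()
                    && raw.charAt(i + 1) == 'u';
            if (!startsEscape) {
                // Not eligible, or not followed by u: keep it as a raw character.
                out.append(c);
                backslashRun++;
                i++;
                continue;
            }
            int j = i + 1;
            while (j < raw.length() && raw.charAt(j) == 'u') {
                j++; // skip every u marker
            }
            if (j + 4 > raw.length()) {
                throw new IllegalArgumentException("Malformed Unicode escape at index " + i);
            }
            int value = 0;
            for (int k = j; k < j + 4; k++) {
                int digit = hexValue(raw.charAt(k));
                if (digit < 0) {
                    throw new IllegalArgumentException("Malformed Unicode escape at index " + i);
                }
                value = value * 16 + digit;
            }
            // The produced code unit is appended and never re-scanned.
            out.append((char) value);
            backslashRun = 0;
            i = j + 4;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Raw input is the 16-character example, including its quotes.
        String raw = "\"\\\\u2126=\\u2126\"";
        System.out.println(translate(raw).length()); // 11
    }
}
```

Under these assumptions, an input consisting of a backslash followed by u005cu005a translates to six characters (a backslash, then u005a) rather than to the letter Z, because the backslash produced by the first escape is not scanned again.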