4.3 Character Data Type and Operations
Key Point: A character data type represents a single character.
In addition to processing numeric values, you can process characters in Java. The character
data type, char, is used to represent a single character. A character literal is enclosed in single
quotation marks. Consider the following code:
char letter = 'A';
char numChar = '4';
The first statement assigns character A to the char variable letter. The second statement
assigns digit character 4 to the char variable numChar.
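The short program below is a minimal sketch (the class name CharLiteralDemo is hypothetical) that declares the two variables above and prints them. It also shows that numChar holds the digit character '4', which is a different value from the integer 4.

```java
public class CharLiteralDemo {
    public static void main(String[] args) {
        char letter = 'A';   // the character A
        char numChar = '4';  // the digit character 4, not the number 4

        System.out.println("letter = " + letter);   // letter = A
        System.out.println("numChar = " + numChar); // numChar = 4

        // A char stores a character code, so the digit character '4'
        // is not equal to the int value 4.
        System.out.println(numChar == '4'); // true
        System.out.println(numChar == 4);   // false
    }
}
```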
Caution
A string literal must be enclosed in double quotation marks (" "). A character literal is a single
character enclosed in single quotation marks (' '). Therefore, "A" is a string, but 'A'
is a character.
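To see the difference in code, here is a minimal sketch (the class name CharVsString is hypothetical). A String that happens to contain one character still has type String; its single char must be extracted with charAt.

```java
public class CharVsString {
    public static void main(String[] args) {
        char c = 'A';    // single quotes: a char literal
        String s = "A";  // double quotes: a String literal

        // A String can hold any number of characters, including just one.
        System.out.println(s.length());       // 1
        // charAt(0) extracts the single char stored in the String.
        System.out.println(s.charAt(0) == c); // true

        // char c2 = "A";  // would not compile: a String is not a char
    }
}
```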
4.3.1 Unicode and ASCII code
Computers use binary numbers internally. A character is stored in a computer as a sequence
of 0s and 1s. Mapping a character to its binary representation is called encoding. There are
different ways to encode a character. How characters are encoded is defined by an encoding
scheme.
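As an illustration, the sketch below prints the bit pattern that Java stores for a char. It assumes Java 11 or later (for String.repeat), and the class and method names (CharEncodingDemo, toBits) are hypothetical. A Java char occupies 16 bits, so the pattern is padded to 16 digits.

```java
public class CharEncodingDemo {
    // Returns the 16-bit binary pattern used to store ch in a char variable.
    static String toBits(char ch) {
        String bits = Integer.toBinaryString(ch); // char widens to int here
        // Pad on the left with zeros to a full 16 bits.
        return "0".repeat(16 - bits.length()) + bits;
    }

    public static void main(String[] args) {
        System.out.println(toBits('A')); // 0000000001000001
        System.out.println(toBits('4')); // 0000000000110100
    }
}
```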
Java supports Unicode, an encoding scheme established by the Unicode Consortium to
support the interchange, processing, and display of written texts in the world's diverse languages.
Unicode was originally designed as a 16-bit character encoding. The primitive data
type char was intended to take advantage of this design by providing a simple data type
that could hold any character. However, it turned out that the 65,536 characters possible in
a 16-bit encoding are not sufficient to represent all the characters in the world. The Unicode
standard has therefore been extended to allow up to 1,112,064 characters. Characters
that go beyond the original 16-bit limit are called supplementary characters. Java supports
supplementary characters, but processing and representing them is beyond the scope of this topic.
For simplicity, this topic considers only the original 16-bit Unicode characters, which can be
stored in a char type variable.
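The following sketch (class name CharRangeDemo is hypothetical) checks the 16-bit range of char using the standard constants Character.MIN_VALUE and Character.MAX_VALUE. It also uses Character.charCount to confirm that a supplementary character, here code point 0x1F600 chosen only as an example, needs two char values.

```java
public class CharRangeDemo {
    public static void main(String[] args) {
        // A char is an unsigned 16-bit value: 2^16 = 65,536 possible codes.
        int min = Character.MIN_VALUE; // 0
        int max = Character.MAX_VALUE; // 65535
        System.out.println(min + " to " + max); // 0 to 65535
        System.out.println(max - min + 1);      // 65536

        // A supplementary character (code point above 0xFFFF) does not fit
        // in one char; Java represents it with a pair of chars.
        int codePoint = 0x1F600;
        System.out.println(Character.isSupplementaryCodePoint(codePoint)); // true
        System.out.println(Character.charCount(codePoint));                // 2
    }
}
```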
A 16-bit Unicode character takes two bytes and is written as \u followed by four hexadecimal
digits, ranging from \u0000 to \uFFFF. Hexadecimal numbers are introduced in Appendix F, Number
Systems. For example, the English word welcome is translated into Chinese using two characters,
欢迎. The Unicodes of these two characters are \u6B22\u8FCE. The Unicodes for the
Greek letters α β γ are \u03b1 \u03b2 \u03b3.
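Unicode escapes can be written directly in char and String literals. The sketch below (class name UnicodeEscapeDemo is hypothetical) uses the codes for the two Chinese characters and for the Greek letters α (03B1), β (03B2), and γ (03B3). Whether the letters display correctly depends on the console's font and encoding.

```java
public class UnicodeEscapeDemo {
    public static void main(String[] args) {
        // Each escape is a backslash, the letter u, and four hex digits.
        String welcome = "\u6B22\u8FCE"; // the Chinese word for welcome
        char alpha = '\u03b1';
        char beta  = '\u03b2';
        char gamma = '\u03b3';

        System.out.println(welcome.length());           // 2 (two chars)
        System.out.println("" + alpha + beta + gamma);  // the three Greek letters
        System.out.println((int) gamma);                // 947, the decimal value of hex 03B3
    }
}
```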
Most computers use ASCII (American Standard Code for Information Interchange), a 7-bit
encoding scheme for representing all uppercase and lowercase letters, digits, punctuation
marks, and control characters. Unicode includes ASCII code, with \u0000 to \u007F corresponding
to the 128 ASCII characters. Table 4.4 shows the ASCII code for some commonly
used characters. Appendix B, 'The ASCII Character Set,' gives a complete list of ASCII
characters and their decimal and hexadecimal codes.
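Because Unicode assigns the 128 ASCII characters the codes 0 through 127 (hex 7F), a char is an ASCII character exactly when its value is at most 0x7F. The sketch below relies on that fact; the class and method names (AsciiSubsetDemo, isAscii, countAscii) are hypothetical.

```java
public class AsciiSubsetDemo {
    // ASCII occupies the first 128 Unicode codes: 0 through 127 (hex 7F).
    static boolean isAscii(char ch) {
        return ch <= 0x7F;
    }

    // Counts how many chars in text fall in the ASCII range.
    static int countAscii(String text) {
        int count = 0;
        for (char ch : text.toCharArray()) {
            if (isAscii(ch)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Java \u03b1\u03b2\u03b3"; // "Java " plus three Greek letters
        System.out.println(countAscii(text) + " of " + text.length()
                + " chars are ASCII"); // 5 of 8 chars are ASCII
    }
}
```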
TABLE 4.4 ASCII Code for Commonly Used Characters

Characters    Code Value in Decimal    Unicode Value
'0' to '9'    48 to 57                 \u0030 to \u0039
'A' to 'Z'    65 to 90                 \u0041 to \u005A
'a' to 'z'    97 to 122                \u0061 to \u007A
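The rows of Table 4.4 can be regenerated from the characters themselves: casting a char to int gives its code, and formatting that code as four uppercase hexadecimal digits gives its Unicode value. This is a minimal sketch; the class and method names (AsciiTableDemo, describe) are hypothetical.

```java
public class AsciiTableDemo {
    // Builds one table row: the characters, their decimal codes,
    // and their codes in Unicode escape notation (four hex digits).
    static String describe(char first, char last) {
        return String.format("'%c' to '%c': %d to %d, \\u%04X to \\u%04X",
                first, last, (int) first, (int) last, (int) first, (int) last);
    }

    public static void main(String[] args) {
        System.out.println(describe('0', '9'));
        System.out.println(describe('A', 'Z'));
        System.out.println(describe('a', 'z'));
    }
}
```

Running it prints one line per row, for example '0' to '9': 48 to 57, \u0030 to \u0039.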
 
 
 