MATHEMATICAL FUNCTIONS, CHARACTERS, AND STRINGS - Introduction to Java Programming: Comprehensive Version - page 125

4.3 Character Data Type and Operations

Key Point: A character data type represents a single character.

In addition to processing numeric values, you can process characters in Java. The character data type, char, is used to represent a single character. A character literal is enclosed in single quotation marks. Consider the following code:

char letter = 'A';
char numChar = '4';

The first statement assigns character A to the char variable letter. The second statement assigns digit character 4 to the char variable numChar.

Caution

A string literal must be enclosed in double quotation marks (" "). A character literal is a single character enclosed in single quotation marks (' '). Therefore, "A" is a string, but 'A' is a character.
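The distinction between the two kinds of literal can be checked with a short program. The following is a minimal sketch, not code from the text: the class name CharLiteralDemo and the helper typeOf are hypothetical names introduced only for illustration.

```java
public class CharLiteralDemo {
    // Hypothetical helper: returns the simple type name of a (boxed) value.
    static String typeOf(Object value) {
        return value.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        char letter = 'A';   // character literal: single quotation marks
        char numChar = '4';  // the digit character 4, not the number 4
        String text = "A";   // string literal: double quotation marks

        System.out.println(typeOf(letter));  // Character (char is boxed)
        System.out.println(typeOf(text));    // String
        System.out.println(numChar == 4);    // false
        System.out.println(numChar == '4');  // true
    }
}
```

The comparison numChar == 4 is false because the character '4' and the integer 4 are different values, even though they look alike in print.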

4.3.1 Unicode and ASCII code

Computers use binary numbers internally. A character is stored in a computer as a sequence of 0s and 1s. Mapping a character to its binary representation is called encoding. There are different ways to encode a character. How characters are encoded is defined by an encoding scheme.
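To make the idea of a binary representation concrete, the sketch below prints the 0s and 1s behind a character. It assumes Java's char, which is 16 bits wide; the class name EncodingDemo and the helper toBits are hypothetical names used only for this illustration.

```java
public class EncodingDemo {
    // Hypothetical helper: returns the 16-bit binary form of a char's code.
    static String toBits(char c) {
        String bits = Integer.toBinaryString(c);
        // Left-pad with zeros to the full 16-bit width of a Java char.
        return String.format("%16s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(toBits('A'));  // 0000000001000001
        System.out.println(toBits('4'));  // 0000000000110100
    }
}
```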

Java supports Unicode, an encoding scheme established by the Unicode Consortium to support the interchange, processing, and display of written texts in the world's diverse languages. Unicode was originally designed as a 16-bit character encoding. The primitive data type char was intended to take advantage of this design by providing a simple data type that could hold any character. However, it turned out that the 65,536 characters possible in a 16-bit encoding are not sufficient to represent all the characters in the world. The Unicode standard therefore has been extended to allow up to 1,112,064 characters. Those characters that go beyond the original 16-bit limit are called supplementary characters. Java supports the supplementary characters, but processing and representing them is beyond the scope of this section. For simplicity, this section considers only the original 16-bit Unicode characters. These characters can be stored in a char type variable.

A 16-bit Unicode character takes two bytes and is written as \u followed by four hexadecimal digits, which run from \u0000 to \uFFFF. Hexadecimal numbers are introduced in Appendix F, Number Systems. For example, the English word welcome is translated into Chinese using two characters, 欢迎. The Unicodes of these two characters are \u6B22\u8FCE. The Unicodes for the Greek letters α β γ are \u03b1 \u03b2 \u03b3.
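Unicode escapes can be used directly in Java character and string literals. The following is a small sketch of that usage; the class name UnicodeEscapeDemo and the helper toEscape are hypothetical names introduced for illustration, and whether the characters display correctly depends on the console's font and encoding.

```java
public class UnicodeEscapeDemo {
    // Hypothetical helper: formats a char as a backslash-u escape
    // with four uppercase hexadecimal digits.
    static String toEscape(char c) {
        return String.format("\\u%04X", (int) c);
    }

    public static void main(String[] args) {
        char alpha = '\u03b1';            // Greek small letter alpha
        String welcome = "\u6B22\u8FCE";  // the two Chinese characters for welcome

        System.out.println(welcome.length());  // 2 (one char per escape)
        System.out.println(toEscape(alpha));   // escape form with digits 03B1
        System.out.println(toEscape('A'));     // escape form with digits 0041
    }
}
```

Each escape in the string literal produces exactly one char, which is why the two-character word has length 2.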

Most computers use ASCII (American Standard Code for Information Interchange), a 7-bit encoding scheme for representing all uppercase and lowercase letters, digits, punctuation marks, and control characters. Unicode includes ASCII code, with \u0000 to \u007F corresponding to the 128 ASCII characters. Table 4.4 shows the ASCII code for some commonly used characters. Appendix B, "The ASCII Character Set," gives a complete list of ASCII characters and their decimal and hexadecimal codes.

Table 4.4  ASCII Code for Commonly Used Characters

Characters    Code Value in Decimal    Unicode Value
'0' to '9'    48 to 57                 \u0030 to \u0039
'A' to 'Z'    65 to 90                 \u0041 to \u005A
'a' to 'z'    97 to 122                \u0061 to \u007A
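The decimal codes in the table can be confirmed by casting between char and int. This is a minimal sketch under that assumption; the class name AsciiCodeDemo and the helpers codeOf and isAsciiUppercase are hypothetical names, not part of the text.

```java
public class AsciiCodeDemo {
    // Hypothetical helper: returns the numeric code of a character.
    static int codeOf(char c) {
        return (int) c;
    }

    // Hypothetical helper: true if c falls in the range 'A' to 'Z' (65 to 90).
    static boolean isAsciiUppercase(char c) {
        return c >= 'A' && c <= 'Z';
    }

    public static void main(String[] args) {
        System.out.println(codeOf('0'));            // 48
        System.out.println(codeOf('A'));            // 65
        System.out.println(codeOf('a'));            // 97
        System.out.println((char) 122);             // z
        System.out.println(isAsciiUppercase('Q'));  // true
        System.out.println('a' - 'A');              // 32
    }
}
```

Because the letters in each range have consecutive codes, a range check such as c >= 'A' && c <= 'Z' is enough to test for an uppercase ASCII letter, and the fixed offset of 32 separates each lowercase letter from its uppercase form.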
