The Unicode Character Set - Java Software Solutions: Foundations of Program Design


Appendix C: The Unicode Character Set

The Java programming language uses the Unicode character set for managing text. A character set is simply an ordered list of characters, each corresponding to a particular numeric value. Unicode is an international character set that contains letters, symbols, and ideograms for languages all over the world. In Java, a char is a 16-bit unsigned numeric value, so a single char can take one of 65,536 distinct values. Unicode itself has grown beyond that range: it defines numeric values (code points) up to U+10FFFF, and Java represents a character above U+FFFF as a pair of char values called a surrogate pair. The Unicode character set continues to be refined as characters from various languages are included.
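
As a rough illustration of these numeric values, the following sketch prints the value behind a char and counts how many 16-bit char units a character occupies. The class name CharValues and its two helper methods are hypothetical, chosen only for this example, and the code point 0x1F600 is used simply as one character whose value lies above the 16-bit range.

```java
class CharValues {
    // Returns the numeric value stored in a char (a widening char-to-int conversion).
    static int valueOf(char c) {
        return c;
    }

    // Returns how many 16-bit char units Java needs to store one code point.
    static int unitsFor(int codePoint) {
        return Character.toChars(codePoint).length;
    }

    public static void main(String[] args) {
        System.out.println(valueOf('A'));              // prints 65
        System.out.println((int) Character.MAX_VALUE); // prints 65535
        System.out.println(unitsFor('A'));             // prints 1
        System.out.println(unitsFor(0x1F600));         // prints 2
    }
}
```

The last line shows why a single char is not always a whole character: values above 65,535 need two char units.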

Many programming languages still use the ASCII character set. ASCII stands for the American Standard Code for Information Interchange. The ASCII set is quite small (128 characters in its standard 7-bit form, 256 in the 8-bit extended variants), so the developers of Java opted to use Unicode in order to support international users. However, ASCII is essentially a subset of Unicode, including corresponding numeric values, so programmers used to ASCII should have no problems with Unicode.
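
The subset relationship can be checked with a short sketch. Assuming the standard 7-bit ASCII range (values 0 through 127), the code below encodes each value as a Java char and confirms that the US-ASCII encoding produces a single byte with the same numeric value. The class name AsciiSubset and its methods are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;

class AsciiSubset {
    // True if every value 0..127, treated as a char, encodes to one US-ASCII byte of the same value.
    static boolean asciiMatchesUnicode() {
        for (int i = 0; i < 128; i++) {
            byte[] bytes = String.valueOf((char) i).getBytes(StandardCharsets.US_ASCII);
            if (bytes.length != 1 || bytes[0] != i) {
                return false;
            }
        }
        return true;
    }

    // True if the char falls inside the 7-bit ASCII range.
    static boolean isAscii(char c) {
        return c < 128;
    }

    public static void main(String[] args) {
        System.out.println(asciiMatchesUnicode()); // prints true
        System.out.println(isAscii('A'));          // prints true
        System.out.println(isAscii('\u00E9'));     // prints false (e with acute accent, value 233)
    }
}
```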

Figure C.1 shows a list of commonly used characters and their Unicode numeric values. These characters also happen to be ASCII characters. All of the characters in Figure C.1 are called printable characters because they have a symbolic representation that can be displayed on a monitor or printed by a printer. Other characters are called nonprintable characters because they have no such symbolic representation. Note that the space character (numeric value 32) is considered a printable character, even though no symbol is printed when it is displayed. Nonprintable characters are sometimes called control characters because many of them can be generated by holding down the control key on a keyboard and pressing another key.
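
One way to explore the printable/nonprintable distinction in code is the standard method Character.isISOControl, which reports whether a value is a control character. The sketch below is only an illustration under that assumption; the class name ControlChars and its helper methods are hypothetical.

```java
class ControlChars {
    // True if the value is classified as a control (nonprintable) character.
    static boolean isControl(int value) {
        return Character.isISOControl(value);
    }

    // Counts the control characters among the values 0 through 127.
    static int countControlsBelow128() {
        int count = 0;
        for (int i = 0; i < 128; i++) {
            if (isControl(i)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(isControl(9));            // prints true  (tab)
        System.out.println(isControl(32));           // prints false (space is printable)
        System.out.println(isControl('A'));          // prints false
        System.out.println(isControl(127));          // prints true  (delete)
        System.out.println(countControlsBelow128()); // prints 33
    }
}
```

The count of 33 corresponds to the values 0 through 31 plus the delete character at 127.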

The Unicode characters with numeric values 0 through 31 are nonprintable characters. Also, the delete character, with numeric value 127, is a nonprintable character.
