Java Reference
In-Depth Information
Latin-1. Each character in this set is encoded as a single byte with a
value in the range 0 to 255. The first 128 values are the same as those
of the ASCII set. The remaining ones, in the range 128 to 255, include
the characters needed to represent several non-English languages written
in Roman script, including French, Spanish, Italian, and German, as well
as typesetting symbols, some Greek letters often used in mathematics,
mathematical symbols, copyright and registered trademark glyphs, common
fractions, and others.
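
The relationship between the two sets can be checked with a short program. The following is a minimal sketch, not code from the book: the class name Latin1Demo and its helper methods are hypothetical, and it relies only on the standard StandardCharsets constants. It decodes single byte values under both encodings, confirms that the first 128 agree, and prints a few characters from the upper half of the set.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name and helpers are illustrative only.
class Latin1Demo {
    // Decode a single byte value (0 to 255) as an ISO Latin-1 character.
    static char decodeLatin1(int value) {
        byte[] one = { (byte) value };
        return new String(one, StandardCharsets.ISO_8859_1).charAt(0);
    }

    // Decode a single byte value (0 to 127) as an ASCII character.
    static char decodeAscii(int value) {
        byte[] one = { (byte) value };
        return new String(one, StandardCharsets.US_ASCII).charAt(0);
    }

    public static void main(String[] args) {
        // The first 128 values are identical in both sets.
        for (int v = 0; v < 128; v++) {
            if (decodeLatin1(v) != decodeAscii(v)) {
                throw new IllegalStateException("Mismatch at value " + v);
            }
        }
        // A few values from the upper half: copyright sign, one-half
        // fraction, and three accented letters.
        int[] samples = { 0xA9, 0xBD, 0xE9, 0xF1, 0xFC };
        for (int v : samples) {
            System.out.println(v + " -> " + decodeLatin1(v));
        }
    }
}
```

Whether the upper-half characters print correctly depends on the encoding of the console in use.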
On the Web
The program named LatinSet.java, in the Chapter 19 folder at
www.crcpress.com, displays the ISO Latin-1 character set. Because
DOS-based consoles do not support the first 32 characters in the upper
half of ISO Latin-1 (values 128 to 159), the first one displayed
corresponds to the value 160.
The third and most comprehensive character set supported by Java is
Unicode. Java encodes Unicode characters in 16 bits, which allow values
in the range 0 to 65,535. This is the same range as the Java char primitive
data type. (Later versions of Unicode also define supplementary characters
beyond this range, which Java represents as a pair of char values.)
Unicode allows representing the characters of most modern languages and
scripts, including Cyrillic, Greek, Arabic, Hebrew, Persian, Chinese,
and Japanese. The first 256 characters of the Unicode character set
coincide with the ISO Latin-1 set.
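
The 16-bit range of the char type can be inspected directly. The sketch below is illustrative only: the class name UnicodeRangeDemo and the helper inLatin1Range are hypothetical names introduced here. It prints the limits of char and shows that a Latin-1 letter keeps its Latin-1 value as a char, while letters from other scripts fall above 255.

```java
// Hypothetical demo class; the name and helper are illustrative only.
class UnicodeRangeDemo {
    // True if the char value is also a valid ISO Latin-1 code (0 to 255).
    static boolean inLatin1Range(char c) {
        return c <= 0xFF;
    }

    public static void main(String[] args) {
        System.out.println((int) Character.MIN_VALUE); // prints 0
        System.out.println((int) Character.MAX_VALUE); // prints 65535

        char eAcute = '\u00E9'; // e with acute accent, Latin-1 value 233
        char zhe = '\u0416';    // Cyrillic capital letter Zhe
        char alef = '\u05D0';   // Hebrew letter Alef

        System.out.println((int) eAcute + " " + inLatin1Range(eAcute)); // prints 233 true
        System.out.println((int) zhe + " " + inLatin1Range(zhe));       // prints 1046 false
        System.out.println((int) alef + " " + inLatin1Range(alef));     // prints 1488 false
    }
}
```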
The fact that Unicode characters are encoded in two bytes may create
problems when using stream-based read and write operations. Streams
have traditionally assumed that alphanumeric data consists of single
bytes. In order to read a Unicode character from a stream, the code
reads a first byte, shifts its bits 8 positions to the left, reads the
second byte, then ORs the low 8 bits of the second byte with the shifted
bits of the first one. Alternatively, the same result is obtained by
multiplying the first byte by 256 and adding the second one, with both
bytes treated as unsigned values. One risk of reading 16-bit data 8 bits
at a time is that the code may lose step and combine the second byte of
one character with the first byte of the next one.
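
The byte-combining step can be sketched as follows. This is an illustration under stated assumptions, not code from the book: the class name TwoByteDemo and its methods are hypothetical, and the data is assumed to arrive high byte first. Because Java bytes are signed, each one is masked with 0xFF to keep only its low 8 bits, and the two halves are merged with a bitwise OR. The last lines of main show what happens when decoding starts one byte late.

```java
// Hypothetical demo class; the name and methods are illustrative only.
class TwoByteDemo {
    // Combine two bytes (high byte first) into one 16-bit char by
    // shifting the first byte left 8 positions and ORing in the second.
    static char combineShift(byte high, byte low) {
        return (char) (((high & 0xFF) << 8) | (low & 0xFF));
    }

    // The same result by arithmetic: high * 256 + low, both unsigned.
    static char combineMultiply(byte high, byte low) {
        return (char) ((high & 0xFF) * 256 + (low & 0xFF));
    }

    // Decode byte pairs (high byte first) starting at the given offset.
    // A trailing unpaired byte is ignored.
    static String decode(byte[] data, int offset) {
        StringBuilder sb = new StringBuilder();
        for (int i = offset; i + 1 < data.length; i += 2) {
            sb.append(combineShift(data[i], data[i + 1]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Hi" as two 16-bit values: 0x0048 and 0x0069.
        byte[] data = { 0x00, 0x48, 0x00, 0x69 };
        System.out.println(decode(data, 0)); // prints Hi

        // Starting one byte late pairs 0x48 with the following 0x00,
        // so the first decoded value is 0x4800 instead of 0x0048.
        String outOfStep = decode(data, 1);
        System.out.println(Integer.toHexString(outOfStep.charAt(0))); // prints 4800
    }
}
```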
Java readers and writers are designed for handling any of the supported
character sets. If the host system is set for ASCII or ISO Latin-1,
readers and writers operate one byte at a time. If the system is set for
Unicode, then data is read from the stream two bytes at a time. Byte
streams, by contrast, are not intended for character-based data and do
not support string operations. In this chapter we use readers and
writers for performing file-based input and output.
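
The effect of the character set on readers and writers can be observed by naming the encoding explicitly instead of relying on the host system setting. The sketch below is an assumption-laden illustration, not the chapter's own code: the class name ReaderWriterDemo and its methods are hypothetical, and it uses the standard Files.newBufferedWriter and Files.newBufferedReader factory methods with a temporary file. The same four-character string occupies 4 bytes when written as ISO Latin-1 and 8 bytes when written as 16-bit Unicode (UTF-16BE).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; the name and methods are illustrative only.
class ReaderWriterDemo {
    // Write text to a temporary file with the given encoding and
    // return the resulting file size in bytes.
    static long writeAndMeasure(String text, Charset charset) {
        try {
            Path file = Files.createTempFile("charset-demo", ".txt");
            try {
                try (Writer out = Files.newBufferedWriter(file, charset)) {
                    out.write(text);
                }
                return Files.size(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Write one line of text, then read it back with a reader that
    // uses the same encoding.
    static String roundTrip(String text, Charset charset) {
        try {
            Path file = Files.createTempFile("charset-demo", ".txt");
            try {
                try (Writer out = Files.newBufferedWriter(file, charset)) {
                    out.write(text);
                }
                try (BufferedReader in = Files.newBufferedReader(file, charset)) {
                    return in.readLine();
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00E9"; // four characters, the last one accented
        System.out.println(writeAndMeasure(text, StandardCharsets.ISO_8859_1)); // prints 4
        System.out.println(writeAndMeasure(text, StandardCharsets.UTF_16BE));   // prints 8
        System.out.println(text.equals(roundTrip(text, StandardCharsets.UTF_16BE))); // prints true
    }
}
```

Passing the charset explicitly keeps the result independent of the host configuration, which is a design choice of this sketch and not a requirement of the reader and writer classes.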