Security and Cryptography - Java Examples in a Nutshell

English locale (en_US) is distinct from the British English locale (en_GB), and the

French spoken in Canada (fr_CA) is different from the French spoken in France

(fr_FR). Occasionally, the scope of a locale is further narrowed with the addition

of a system-dependent variant string.
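
The language, country, and variant parts of a locale can be inspected directly. The following is a minimal sketch; the class name LocaleBasics and the helper parts() are hypothetical, and the "POSIX" variant is only an illustrative value. It uses the Locale constructor from the era of this text, which recent JDKs mark as deprecated in favor of Locale.of().

```java
import java.util.Locale;

final class LocaleBasics {
    // Summarizes the three parts of a locale: language, country, and variant.
    static String parts(Locale locale) {
        return "language=" + locale.getLanguage()
             + " country=" + locale.getCountry()
             + " variant=" + locale.getVariant();
    }

    public static void main(String[] args) {
        // Same language, different country: the two locales are not equal.
        System.out.println(Locale.US + " equals " + Locale.UK + "? "
                           + Locale.US.equals(Locale.UK));

        // Canadian French and French French share only the language code.
        System.out.println(parts(Locale.CANADA_FRENCH));
        System.out.println(parts(Locale.FRANCE));

        // A variant string narrows the locale further (illustrative value).
        Locale withVariant = new Locale("en", "US", "POSIX");
        System.out.println(withVariant + " -> " + parts(withVariant));
    }
}
```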

The Locale class maintains a static default locale, which can be set and queried

with Locale.setDefault() and Locale.getDefault(). Locale-sensitive methods in

Java typically come in two forms. One uses the default locale, and the other uses a

Locale object that is explicitly specified as an argument. A program can create and

use any number of nondefault Locale objects, although it is more common simply

to rely on the default locale, which is inherited from the underlying default locale

on the native platform. Locale-sensitive classes in Java often provide a method to

query the list of locales that they support.
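
The two forms of a locale-sensitive method can be compared with java.text.NumberFormat, chosen here only as a convenient example of a locale-sensitive class. This is a sketch under that assumption; the class name DefaultLocaleDemo and its helper methods are hypothetical.

```java
import java.text.NumberFormat;
import java.util.Locale;

final class DefaultLocaleDemo {
    // Form 1: relies on whatever the current default locale is.
    static String formatInDefault(double value) {
        return NumberFormat.getNumberInstance().format(value);
    }

    // Form 2: the locale is passed explicitly as an argument.
    static String formatIn(Locale locale, double value) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        Locale saved = Locale.getDefault();
        try {
            // Change the default locale, then format without naming a locale.
            Locale.setDefault(Locale.GERMANY);
            System.out.println("default (de_DE): " + formatInDefault(1234.5));
            // An explicit locale overrides the default for this one call.
            System.out.println("explicit en_US:  " + formatIn(Locale.US, 1234.5));
        } finally {
            Locale.setDefault(saved); // restore the platform default
        }

        // Query the locales that this class supports.
        Locale[] supported = NumberFormat.getAvailableLocales();
        System.out.println(supported.length + " locales supported by NumberFormat");
    }
}
```

Changing the default locale affects the whole virtual machine, which is why the sketch restores the saved value in a finally block.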

Finally, note that AWT and Swing GUI components (see Chapter 10, Graphical

User Interfaces) have a locale property, so it is possible for different components

to use different locales. (Most components, however, are not locale-sensitive; they

behave the same in any locale.)
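
As a brief illustration of the component locale property, the sketch below gives two Swing labels different locales. The class name ComponentLocaleDemo and the helper labelFor() are hypothetical; JLabel is used only because it is a simple component to construct.

```java
import java.util.Locale;
import javax.swing.JLabel;

final class ComponentLocaleDemo {
    // Creates a label and sets its locale property explicitly.
    static JLabel labelFor(String text, Locale locale) {
        JLabel label = new JLabel(text);
        label.setLocale(locale);
        return label;
    }

    public static void main(String[] args) {
        JLabel english = labelFor("Hello", Locale.US);
        JLabel french = labelFor("Bonjour", Locale.FRANCE);
        // Each component reports the locale it was given.
        System.out.println(english.getText() + ": " + english.getLocale());
        System.out.println(french.getText() + ": " + french.getLocale());
    }
}
```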

Unicode

Java uses the Unicode character encoding. (Java 1.3 uses Unicode Version 2.1.

Support for Unicode 3.0 will be included in Java 1.4 or another future release.)

Unicode is a 16-bit character encoding established by the Unicode Consortium,

which describes the standard as follows (see http://unicode.org):

The Unicode Standard defines codes for characters used in the major

languages written today. Scripts include the European alphabetic scripts,

Middle Eastern right-to-left scripts, and scripts of Asia. The Unicode

Standard also includes punctuation marks, diacritics, mathematical symbols,

technical symbols, arrows, dingbats, etc. ... In all, the Unicode Standard

provides codes for 49,194 characters from the world's alphabets,

ideograph sets, and symbol collections.

In the canonical form of the Unicode encoding, which is what Java char and

String types use, every character occupies two bytes. The Unicode characters

\u0020 to \u007E are equivalent to the ASCII and ISO8859-1 (Latin-1) characters

0x20 through 0x7E. The Unicode characters \u00A0 to \u00FF are identical to the

ISO8859-1 characters 0xA0 to 0xFF. Thus, there is a trivial mapping between

Latin-1 and Unicode characters. A number of other portions of the Unicode

encoding are based on preexisting standards, such as ISO8859-5 (Cyrillic) and ISO8859-8

(Hebrew), though the mappings between these standards and Unicode may not be

as trivial as the Latin-1 mapping.
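
The trivial Latin-1 mapping can be sketched as a widening of each byte to a char with the same numeric value. The class name Latin1Mapping and the helper decodeLatin1() are hypothetical, and the cross-check against StandardCharsets.ISO_8859_1 assumes Java 7 or later, which is newer than the release this text describes.

```java
import java.nio.charset.StandardCharsets;

final class Latin1Mapping {
    // Widens each Latin-1 byte to the Unicode character with the same value.
    static String decodeLatin1(byte[] bytes) {
        char[] chars = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            // Java bytes are signed, so mask to avoid sign extension.
            chars[i] = (char) (bytes[i] & 0xFF);
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        // The word "cafe" with an accented final letter, encoded as ISO8859-1.
        byte[] latin1 = { 0x63, 0x61, 0x66, (byte) 0xE9 };

        String viaCast = decodeLatin1(latin1);
        String viaCharset = new String(latin1, StandardCharsets.ISO_8859_1);

        // Both routes should produce the same string.
        System.out.println("same result? " + viaCast.equals(viaCharset));
        // The last character keeps its Latin-1 value, now as \u00E9.
        System.out.println("last char: \\u00"
                           + Integer.toHexString(viaCast.charAt(3)).toUpperCase());
    }
}
```

No such shortcut exists for encodings like ISO8859-5 or ISO8859-8, where the character values differ from the Unicode values and a charset decoder is the safer route.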

Note that Unicode support may be limited on many platforms. One of the

difficulties with the use of Unicode is the poor availability of fonts to display all the

Unicode characters. Figure 7-1 shows some of the characters that are available in the

standard fonts that ship with Sun's Java 1.3 SDK for Linux. (Note that these fonts

do not ship with the Java JRE, so even if they are available on your development

platform, they may not be available on your target platform.) Note the special box

glyph that indicates undefined characters.
