English locale (en_US) is distinct from the British English locale (en_GB), and the
French spoken in Canada (fr_CA) is different from the French spoken in France
(fr_FR). Occasionally, the scope of a locale is further narrowed with the addition
of a system-dependent variant string.
The Locale class maintains a static default locale, which can be set and queried
with Locale.setDefault() and Locale.getDefault(). Locale-sensitive methods in
Java typically come in two forms. One uses the default locale, and the other uses a
Locale object that is explicitly specified as an argument. A program can create and
use any number of nondefault Locale objects, although it is more common simply
to rely on the default locale, which is inherited from the underlying default locale
on the native platform. Locale-sensitive classes in Java often provide a method to
query the list of locales that they support.
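The two forms of a locale-sensitive method can be sketched as follows. This is a minimal illustration, not code from this chapter: it assumes java.text.NumberFormat as the locale-sensitive class, and the class name LocaleForms and the helper methods formatDefault() and formatIn() are hypothetical names chosen for the example.

```java
import java.text.NumberFormat;
import java.util.Locale;

public class LocaleForms {
    // Form 1: no Locale argument, so the current default locale is used.
    static String formatDefault(double value) {
        return NumberFormat.getNumberInstance().format(value);
    }

    // Form 2: the locale is specified explicitly as an argument.
    static String formatIn(Locale locale, double value) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        double value = 1234567.891;

        // Query the default locale inherited from the native platform.
        Locale original = Locale.getDefault();
        System.out.println("Default locale: " + original);
        System.out.println("Default form:   " + formatDefault(value));

        // Nondefault Locale objects can be used side by side.
        System.out.println("en_US: " + formatIn(Locale.US, value));
        System.out.println("de_DE: " + formatIn(Locale.GERMANY, value));

        // Changing the static default affects every call that relies on it,
        // so restore the original value when done.
        Locale.setDefault(Locale.GERMANY);
        try {
            System.out.println("New default: " + formatDefault(value));
        } finally {
            Locale.setDefault(original);
        }

        // Many locale-sensitive classes can list the locales they support.
        Locale[] supported = NumberFormat.getAvailableLocales();
        System.out.println(supported.length + " locales supported by NumberFormat");
    }
}
```

Because Locale.setDefault() changes state shared by the whole virtual machine, passing an explicit Locale is usually the safer choice when only one operation needs a different locale.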
Finally, note that AWT and Swing GUI components (see Chapter 10, Graphical
User Interfaces) have a locale property, so it is possible for different components
to use different locales. (Most components, however, are not locale-sensitive; they
behave the same in any locale.)
Unicode
Java uses the Unicode character encoding. (Java 1.3 uses Unicode Version 2.1.
Support for Unicode 3.0 will be included in Java 1.4 or another future release.)
Unicode is a 16-bit character encoding established by the Unicode Consortium,
which describes the standard as follows (see http://unicode.org):
The Unicode Standard defines codes for characters used in the major
languages written today. Scripts include the European alphabetic scripts,
Middle Eastern right-to-left scripts, and scripts of Asia. The Unicode
Standard also includes punctuation marks, diacritics, mathematical symbols,
technical symbols, arrows, dingbats, etc. ... In all, the Unicode Standard
provides codes for 49,194 characters from the world's alphabets,
ideograph sets, and symbol collections.
In the canonical form of the Unicode encoding, which is what Java char and
String types use, every character occupies two bytes. The Unicode characters
\u0020 to \u007E are equivalent to the ASCII and ISO8859-1 (Latin-1) characters
0x20 through 0x7E. The Unicode characters \u00A0 to \u00FF are identical to the
ISO8859-1 characters 0xA0 to 0xFF. Thus, there is a trivial mapping between
Latin-1 and Unicode characters. A number of other portions of the Unicode
encoding are based on preexisting standards, such as ISO8859-5 (Cyrillic) and ISO8859-8
(Hebrew), though the mappings between these standards and Unicode may not be
as trivial as the Latin-1 mapping.
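The trivial Latin-1 mapping described above can be sketched in a few lines. This is an illustrative example rather than code from this chapter: the class name Latin1Mapping and the method latin1ToString() are hypothetical, and the comparison against the standard ISO_8859_1 charset is included only as a cross-check.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Mapping {
    // Widen each Latin-1 byte to a char. Masking with 0xFF treats the byte
    // as an unsigned value, so 0xE9 becomes \u00E9 rather than a negative number.
    static String latin1ToString(byte[] bytes) {
        char[] chars = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            chars[i] = (char) (bytes[i] & 0xFF);
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        // The word "cafe" with an acute accent on the e, encoded in ISO8859-1.
        byte[] latin1 = { 0x63, 0x61, 0x66, (byte) 0xE9 };

        String manual = latin1ToString(latin1);
        String viaCharset = new String(latin1, StandardCharsets.ISO_8859_1);

        // The hand-written widening agrees with the built-in decoder.
        System.out.println("Same result: " + manual.equals(viaCharset));

        // The last character has the same numeric value as the Latin-1 byte.
        System.out.println("Code of last char: 0x"
                + Integer.toHexString(manual.charAt(3)));
    }
}
```

Mappings for other preexisting standards, such as ISO8859-5, generally require a lookup or an offset rather than a simple widening, which is why decoding through a named charset is the more general approach.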
Note that Unicode support may be limited on many platforms. One of the
difficulties with the use of Unicode is the poor availability of fonts to display all the
Unicode characters. Figure 7-1 shows some of the characters that are available in the
standard fonts that ship with Sun's Java 1.3 SDK for Linux. (Note that these fonts
do not ship with the Java JRE, so even if they are available on your development
platform, they may not be available on your target platform.) Note the special box
glyph that indicates undefined characters.