• The source file name must match the class name exactly. The slightest difference results in an
error. The file must have the extension .java.
After you have compiled the program successfully, you can execute it with the following command:
java -ea OurFirstProgram
The -ea option is not strictly necessary because this program does not use assertions, but if you get used
to putting it in, you won't forget it when it is necessary. If you need the -classpath option specified, use
the following:
java -ea -classpath . OurFirstProgram
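To see what the -ea option actually changes, here is a minimal sketch of a class that reports whether assertions are switched on. The class name AssertionCheck and its methods are hypothetical and are not part of OurFirstProgram; the sketch relies only on the standard assert statement.

```java
class AssertionCheck {
    // Returns true only when the JVM was started with assertions enabled (-ea).
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment inside the assert is executed only if assertions are on.
        assert enabled = true;
        return enabled;
    }

    // Turns the flag into a message for display.
    static String describe(boolean enabled) {
        return enabled ? "assertions enabled" : "assertions disabled";
    }

    public static void main(String[] args) {
        System.out.println(describe(assertionsEnabled()));
    }
}
```

Running this with java -ea AssertionCheck and then with java AssertionCheck should show the two different messages.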
Assuming the source file compiled correctly, and the jdk1.7.0_n\bin directory is defined in your
path, the most common reason for the program failing to execute is a typographical error in the class
name, OurFirstProgram. The second most common reason is writing the file name with its extension,
OurFirstProgram.class, in the command. It should be just the class name, OurFirstProgram.
When you run the program, it displays the text
Krakatoa, EAST of Java??
JAVA AND UNICODE
Programming to support languages that use anything other than the Latin character set has been a major
problem historically. There are a variety of 8-bit character sets defined for many national languages, but if
you want to combine the Latin character set and Cyrillic in the same context, for example, things can get
difficult. If you want to handle Japanese as well, it becomes impossible with an 8-bit character set because
with 8 bits you have only 256 different codes, so there just aren't enough character codes to go round.
Unicode is a standard character set that was developed to allow the characters necessary for almost all
languages to be encoded. It uses a 16-bit code to represent a character (so each character occupies 2 bytes),
and with 16 bits up to 65,535 non-zero character codes can be distinguished. With so many character codes
available, there are enough to allocate each major national character set its own set of codes, including
character sets such as Kanji, which is used for Japanese and requires thousands of character codes. It doesn't
end there though. Unicode supports three encoding forms that allow up to a million additional characters to be
represented.
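The following minimal sketch illustrates the 16-bit limit and one of the additional characters that lie beyond it. The class name CodePointDemo and the choice of the code U+1D11E (a musical symbol) are assumptions made purely for illustration; the methods used are standard members of Character and String.

```java
class CodePointDemo {
    // Builds a string holding a single character from outside the 16-bit range.
    static String supplementary() {
        // U+1D11E is picked only as an example of a code above 65,535.
        return new String(Character.toChars(0x1D11E));
    }

    public static void main(String[] args) {
        String s = supplementary();
        // Largest code that fits in 16 bits.
        System.out.println((int) Character.MAX_VALUE);        // 65535
        // Number of 16-bit values the string occupies.
        System.out.println(s.length());                       // 2
        // Number of Unicode characters the string represents.
        System.out.println(s.codePointCount(0, s.length()));  // 1
    }
}
```

The string contains one character as far as Unicode is concerned, but it takes two 16-bit values to store it.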
As you will see in Chapter 2, Java source code is in Unicode characters. Comments, identifiers (in other
words, names; see Chapter 2), and character and string literals can all use any characters in the Unicode set
that represent letters. Java also supports Unicode internally to represent characters and strings, so the
framework is there for a comprehensive international language capability in a program. The normal ASCII set
that you are probably familiar with corresponds to the first 128 characters of the Unicode set. Apart from
being aware that each character usually occupies 2 bytes, you can for the most part ignore the fact that you
are handling Unicode characters, unless of course you are building an application that supports multiple
languages from the outset.
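As a small sketch of these points, the class below checks the size of a Java character, the code of an ASCII letter, and whether a Cyrillic letter is acceptable at the start of an identifier. The class name AsciiInUnicode and the helper codeOf are hypothetical; the Character methods are standard.

```java
class AsciiInUnicode {
    // Returns the numeric Unicode code of a character.
    static int codeOf(char c) {
        return c;
    }

    public static void main(String[] args) {
        // A char occupies 16 bits, that is, 2 bytes.
        System.out.println(Character.SIZE);                            // 16
        // The ASCII code for A is also its Unicode code.
        System.out.println(codeOf('A'));                               // 65
        // The same character written as a Unicode escape.
        System.out.println('A' == '\u0041');                           // true
        // Cyrillic capital letter Zhe counts as a letter...
        System.out.println(Character.isLetter('\u0416'));              // true
        // ...so it can begin a Java identifier.
        System.out.println(Character.isJavaIdentifierStart('\u0416')); // true
    }
}
```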
I say each Unicode character usually occupies 2 bytes because Java supports Unicode 4.0, which allows
characters beyond the 16-bit range to be represented as 32-bit values made up of pairs of 16-bit codes called
surrogates. You might think the set of 64K characters that you can represent with 16 bits would be sufficient,
but it isn't. Eastern languages such as Japanese, Korean, and Chinese alone involve