Introducing Java - Beginning Java


• The source file name must match the class name exactly. The slightest difference results in an error. The file must have the extension .java.

After you have compiled the program successfully, you can execute it with the following command:

java -ea OurFirstProgram

The -ea option is not strictly necessary because this program does not use assertions, but if you get used to putting it in, you won't forget it when it is necessary. If you need to specify the -classpath option, use the following:

java -ea -classpath . OurFirstProgram
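To see what the -ea option actually changes, here is a minimal sketch. The class name AssertionCheck and its method are hypothetical and not part of the book's example. It relies on the fact that an assert statement is evaluated only when assertions are enabled.

```java
public class AssertionCheck {
    // Hypothetical helper: returns true only when the JVM was started
    // with assertions enabled (for example, with the -ea option).
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment inside the assert is executed only if assertions are on.
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("Assertions enabled: " + assertionsEnabled());
    }
}
```

Running it once with java -ea AssertionCheck and once with java AssertionCheck should show the difference.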

Assuming the source file compiled correctly, and the jdk1.7.0_n\bin directory is defined in your path, the most common reason for the program failing to execute is a typographical error in the class name, OurFirstProgram. The second most common reason is writing the file name with its extension, OurFirstProgram.class, in the command. It should be just the class name, OurFirstProgram.

When you run the program, it displays the text

Krakatoa, EAST of Java??
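The program listing itself is not reproduced in this section, so the following is only a sketch of a class that would display that line, assuming the program does nothing more than print a fixed string. The message() helper is hypothetical and is there only to keep the text in one place. The file would have to be saved as OurFirstProgram.java to match the class name.

```java
public class OurFirstProgram {
    // Hypothetical helper, not taken from the original listing.
    static String message() {
        return "Krakatoa, EAST of Java??";
    }

    public static void main(String[] args) {
        // Write the message to the command line, followed by a newline.
        System.out.println(message());
    }
}
```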

JAVA AND UNICODE

Programming to support languages that use anything other than the Latin character set has been a major problem historically. There are a variety of 8-bit character sets defined for many national languages, but if you want to combine the Latin character set and Cyrillic in the same context, for example, things can get difficult. If you want to handle Japanese as well, it becomes impossible with an 8-bit character set because with 8 bits you have only 256 different codes, so there just aren't enough character codes to go round.

Unicode is a standard character set that was developed to allow the characters necessary for almost all languages to be encoded. It was originally designed around a 16-bit code for each character (so each character occupies 2 bytes), and with 16 bits up to 65,535 non-zero character codes can be distinguished. With so many character codes available, there are enough to allocate each major national character set its own set of codes, including character sets such as Kanji, which is used for Japanese and requires thousands of character codes. It doesn't end there though. Unicode defines three encoding forms (UTF-8, UTF-16, and UTF-32), and its full code space allows roughly a million additional characters beyond the 16-bit range to be represented.
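The figures above can be checked from within Java using constants of the standard Character class. This is a small sketch; the class name CharSizeDemo and its methods are hypothetical.

```java
public class CharSizeDemo {
    // Number of bits used to store one char value.
    static int bitsPerChar() {
        return Character.SIZE;
    }

    // Largest code that fits in a single 16-bit char.
    static int maxCharCode() {
        return Character.MAX_VALUE;
    }

    // Largest code in the full Unicode range, beyond the 16-bit limit.
    static int maxCodePoint() {
        return Character.MAX_CODE_POINT;
    }

    public static void main(String[] args) {
        System.out.println("Bits per char:   " + bitsPerChar());
        System.out.println("Max char code:   " + maxCharCode());
        System.out.println("Max code point:  " + maxCodePoint());
        // Codes available above the 16-bit range.
        System.out.println("Additional codes: " + (maxCodePoint() - maxCharCode()));
    }
}
```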

As you will see in Chapter 2, Java source code is in Unicode characters. Comments, identifiers (that is, names; see Chapter 2), and character and string literals can all use any characters in the Unicode set that represent letters. Java also supports Unicode internally to represent characters and strings, so the framework is there for a comprehensive international language capability in a program. The normal ASCII set that you are probably familiar with corresponds to the first 128 characters of the Unicode set. Apart from being aware that each character usually occupies 2 bytes, for the most part you can ignore the fact that you are handling Unicode characters, unless of course you are building an application that supports multiple languages from the outset.
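As a brief illustration of Unicode in source code, the sketch below uses Unicode escapes so that the file itself stays plain ASCII. The class name UnicodeSourceDemo and everything in it are hypothetical: one identifier is the Greek letter pi, and one string literal mixes Latin and Cyrillic letters.

```java
public class UnicodeSourceDemo {
    // The field name is the single Greek letter pi, written as a Unicode escape.
    static final double \u03c0 = 3.14159;

    // "Java" in Latin letters followed by three Cyrillic letters.
    static String mixed() {
        return "Java \u042f\u0432\u0430";
    }

    // Returns the numeric Unicode code of a character.
    static int codeOf(char c) {
        return c;
    }

    public static void main(String[] args) {
        System.out.println(mixed() + " has " + mixed().length() + " characters");
        // For ASCII characters the Unicode code equals the ASCII code.
        System.out.println("Code of 'A': " + codeOf('A'));
    }
}
```

Whether the Cyrillic letters display correctly depends on the character encoding your command line uses.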

I say each Unicode character usually occupies 2 bytes because Java supports Unicode 4.0, which defines characters that need 32 bits; Java represents each of these as a pair of 16-bit values called surrogates. You might think the set of 64K characters that you can represent with 16 bits would be sufficient, but it isn't. Eastern languages such as Japanese, Korean, and Chinese alone involve
