Did You Know?
ASCII and Unicode
We store data on a computer as binary numbers (sequences of 0s and 1s). To store
textual data, we need an encoding scheme that will tell us what sequence of 0s and
1s to use for any given character. Think of it as a giant secret decoder ring that says
things like, “If you want to store a lowercase 'a,' use the sequence 01100001.”
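In Java you can peek at this encoding directly: casting a char to an int yields the numeric code that is actually stored. The short sketch below is only an illustration (the class name CharCode is made up for this example):

public class CharCode {
    public static void main(String[] args) {
        char letter = 'a';
        int code = (int) letter;                            // the numeric code stored for 'a'
        System.out.println(code);                           // prints 97
        System.out.println(Integer.toBinaryString(code));   // prints 1100001 (the leading 0 is dropped)
    }
}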
In the early 1960s IBM developed an encoding scheme called EBCDIC that
worked well with the company's punched cards, which had been in use for
decades before computers were even invented. But it soon became clear that
EBCDIC wasn't a convenient encoding scheme for computer programmers.
There were gaps in the sequence that placed characters like 'i' and 'j' far apart
in the encoding even though one follows directly after the other in the alphabet.
In 1967 the American Standards Association published a scheme known as ASCII
(pronounced “AS-kee”) that has been in common use ever since. The acronym is
short for “American Standard Code for Information Interchange.” In its original form,
ASCII defined 128 characters that each could be stored with 7 bits of data.
The biggest problem with ASCII is that it is an American code. There are many
characters in common use in other countries that were not included in ASCII. For
example, the British pound (£) and the Spanish variant of the letter n (ñ) are not
included in the standard 128 ASCII characters. Various attempts have been made
to extend ASCII, doubling it to 256 characters so that it can include many of these
special characters. However, it turns out that even 256 characters is simply not
enough to capture the incredible diversity of human communication.
Around the time that Java was created, a consortium of software professionals
introduced a new standard for encoding characters known as Unicode. They decided
that the 7 bits of standard ASCII and the 8 bits of extended ASCII were simply not
big enough and chose not to set a limit on how many bits they might use for encoding
characters. At the time of this writing, the consortium has identified over 100,000
characters, which require a little over 16 bits to store. Unicode includes the characters
used in most modern languages and even some ancient languages. Egyptian hieroglyphs
were added in 2007, although it still does not include Mayan hieroglyphs, and
the consortium has rejected a proposal to include Klingon characters.
The designers of Java used Unicode as the standard for the type char, which
means that Java programs are capable of manipulating a full range of characters.
Fortunately, the Unicode Consortium decided to incorporate the ASCII encod-
ings, so ASCII can be seen as a subset of Unicode. If you are curious about the
actual ordering of characters in ASCII, type “ASCII table” into your favorite
search engine and you will find millions of hits to explore.
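To see this subset relationship in action, here is a small sketch (the class name UnicodeSubset is made up for this example) that stores both an ASCII character and a non-ASCII Unicode character in variables of type char and prints their numeric codes:

public class UnicodeSubset {
    public static void main(String[] args) {
        char ascii = 'a';            // within the original 128 ASCII characters
        char spanish = '\u00F1';     // ñ, outside ASCII but included in Unicode
        System.out.println((int) ascii);     // prints 97, the same code ASCII assigns
        System.out.println((int) spanish);   // prints 241
    }
}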
 