Coded Character Set-I: 1 -> A, 2 -> B, 3 -> C
Coded Character Set-II: 3 -> A, 2 -> B, 1 -> C
Figure A-1. Two coded character sets having the same character repertoire and code points
To define a coded character set, you need to specify three things:
A set of code points
A set of characters
The mapping between the set of code points and the set of characters
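The three parts listed above can be sketched in Java as a map from code points to characters. This is only an illustration: the class name CodedCharacterSetDemo and the two sample mappings are assumptions chosen to show that two coded character sets can share the same code points and the same characters and still differ in their mapping.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; both mappings are illustrative assumptions.
class CodedCharacterSetDemo {

    // First coded character set: 1 -> A, 2 -> B, 3 -> C.
    static Map<Integer, Character> setOne() {
        Map<Integer, Character> mapping = new TreeMap<>();
        mapping.put(1, 'A');
        mapping.put(2, 'B');
        mapping.put(3, 'C');
        return mapping;
    }

    // Second coded character set: same code points and characters, different mapping.
    static Map<Integer, Character> setTwo() {
        Map<Integer, Character> mapping = new TreeMap<>();
        mapping.put(1, 'C');
        mapping.put(2, 'B');
        mapping.put(3, 'A');
        return mapping;
    }

    public static void main(String[] args) {
        Map<Integer, Character> one = setOne();
        Map<Integer, Character> two = setTwo();

        // The keys are the set of code points; the values are the character repertoire.
        boolean sameCodePoints = one.keySet().equals(two.keySet());
        boolean sameRepertoire =
                new HashSet<>(one.values()).equals(new HashSet<>(two.values()));
        boolean sameMapping = one.equals(two);

        System.out.println("Same code points: " + sameCodePoints);
        System.out.println("Same repertoire:  " + sameRepertoire);
        System.out.println("Same mapping:     " + sameMapping);
    }
}
```

Only the third comparison is false, which is why the mapping has to be specified in addition to the two sets.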
The number of bits used to represent a character determines how many distinct characters can be represented in
a coded character set. Some widely used coded character sets are as follows.
ASCII
ASCII, the American Standard Code for Information Interchange, is a 7-bit coded character set. ASCII has 2^7 (= 128)
code points and so it represents 128 distinct characters whose numeric values range from 0 (binary 0000000) to
127 (binary 1111111). The characters NUL and DELETE are represented by code points 0000000 and 1111111,
respectively. There are historical reasons for assigning these code points to the NUL and DELETE characters. At the time
ASCII was developed, it was common to store data for processing on punched paper tape. A 1 bit was used to
represent a hole in the paper tape, whereas a 0 bit represented the absence of a hole. Since a row of seven 0 bits
would be indistinguishable from blank tape, the coding 0000000 had to represent the NUL character, that is, the
absence of any effect. Holes, once punched, could not be erased, but an erroneous character could always be
converted into 1111111 by punching the remaining holes, so this bit pattern was adopted as the DELETE character.
ASCII uses the first 32 code points (0 to 31) to represent control characters. This range includes the
NUL character, but not the DELETE character. Therefore, it leaves 95 code points for printing characters.
128 - 32 (control characters) - 1 (DELETE) = 95
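The arithmetic above can be checked with a short Java sketch. The class name AsciiCounts and its method names are hypothetical. The sketch relies on Character.isISOControl, which treats DELETE (127) as a control character in addition to code points 0 to 31, so it counts 32 + 1 = 33 non-printing code points in the 7-bit range.

```java
// Hypothetical class name, used only for this illustration.
class AsciiCounts {

    // Number of code points available with the given number of bits: 2^bits.
    static int codePoints(int bits) {
        return 1 << bits;
    }

    // Counts the control characters among the 7-bit code points 0..127.
    // Character.isISOControl also reports DELETE (127) as a control character.
    static int controlCount() {
        int count = 0;
        for (int cp = 0; cp < codePoints(7); cp++) {
            if (Character.isISOControl(cp)) {
                count++;
            }
        }
        return count;
    }

    // Printing characters are whatever remains after removing the control characters.
    static int printingCount() {
        return codePoints(7) - controlCount();
    }

    public static void main(String[] args) {
        System.out.println("Code points:         " + codePoints(7));
        System.out.println("Control incl. DELETE: " + controlCount());
        System.out.println("Printing characters: " + printingCount());
    }
}
```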
All printing characters are arranged in an order that can be used for sorting purposes. The SPACE character
is normally sorted before any other printing character. Therefore, the SPACE character is allocated the first position
among the printing characters. The code point for the SPACE character in ASCII is 32, or 0100000. The code point
range of 48 to 57 represents the digits 0 to 9, 65 to 90 represents the 26 uppercase letters A to Z, and 97 to 122 represents the 26
lowercase letters a to z. Modern computers use an 8-bit combination, also known as a byte, as the smallest unit of
storage. Therefore, on modern computers, a 7-bit ASCII character uses 8 bits (or 1 byte) of memory, of which the most
significant bit is always set to 0; for example, SPACE is stored as 00100000 and DELETE is stored as 01111111. Table A-1
contains the list of characters in the ASCII character set.
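As a rough illustration of the code points and the 8-bit storage form described above, the following Java sketch prints a few ASCII characters with their numeric values and zero-padded bit patterns. The class name AsciiBits and the helper toByteString are hypothetical names introduced for this example.

```java
// Hypothetical class name, used only for this illustration.
class AsciiBits {

    // Returns the 8-bit pattern used to store a 7-bit ASCII code point.
    // The most significant bit is always 0 because the value is at most 127.
    static String toByteString(int codePoint) {
        if (codePoint < 0 || codePoint > 127) {
            throw new IllegalArgumentException("Not a 7-bit ASCII code point: " + codePoint);
        }
        String bits = Integer.toBinaryString(codePoint);
        // Left-pad with zeros to a width of 8.
        return String.format("%8s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        int[] samples = {' ', '0', '9', 'A', 'Z', 'a', 'z', 127};
        for (int cp : samples) {
            // DELETE has no printable form, so show its name instead.
            String label = (cp == 127) ? "DELETE" : (cp == ' ') ? "SPACE" : String.valueOf((char) cp);
            System.out.println(label + "\t" + cp + "\t" + toByteString(cp));
        }
    }
}
```

For example, toByteString(32) returns "00100000" for SPACE and toByteString(127) returns "01111111" for DELETE.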
 
 