Web Consortium requires META data, although in practice this is
omitted from many Web pages.
The second approach that is commonly used to represent the
characters of a wide variety of languages involves using many
more bits per character than the eight used in the ISO character
set standards just described. For example, the Java programming
language uses the Unicode character encoding system, storing
each character as a 16-bit (2-byte) value. Because 16 bits allow
65,536 different combinations of bits, Unicode offers sufficient
options for encoding characters from many alphabets in the same
coding system. (Unicode has since grown beyond 65,536 characters;
Java represents the additional ones as pairs of 16-bit values.)
For example, Table 2-2 shows several characters from non-Roman
alphabets and their Unicode equivalents. Each letter now requires
2 bytes, twice the space required for an ASCII character, but
this extra space provides many more alternatives.
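The arithmetic behind these figures can be checked directly. The
following Java sketch is illustrative only: the class name
UnicodeWidthSketch and its helper methods are assumptions
introduced here, not part of the original text. It compares the
number of bit combinations available with 8 and 16 bits, and the
storage a short string needs at one byte versus two bytes per
character.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named here only for illustration.
class UnicodeWidthSketch {
    // Number of distinct bit patterns available with the given number of bits.
    static long combinations(int bits) {
        return 1L << bits;
    }

    // Bytes needed to store the text at one byte per character (ASCII).
    static int asciiBytes(String text) {
        return text.getBytes(StandardCharsets.US_ASCII).length;
    }

    // Bytes needed to store the text as 16-bit units (UTF-16, no byte-order mark).
    static int utf16Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        System.out.println(combinations(8));     // 256
        System.out.println(combinations(16));    // 65536
        System.out.println(Character.SIZE);      // bits in a Java char: 16
        System.out.println(asciiBytes("Java"));  // 4
        System.out.println(utf16Bytes("Java"));  // 8
    }
}
```

The four-letter string doubles from 4 bytes to 8, which is the
trade-off the paragraph describes: twice the space in exchange for
65,536 possible codes instead of 256.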
Table 2-2 Selected Non-Roman Characters and Their Unicode
Equivalents

Alphabetic Script                Character   Unicode Equivalent
English                          A           0041
Russian                          Я           042F
Thai                             ฉ           0E09
Cherokee                         Ꮺ           13EA
Letterlike Symbols               ℞           211E
Arrows                           ⇌           21CC
Braille                          ⠯           282F
Chinese/Japanese/Korean Common   㑟          345F
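The codes in Table 2-2 are hexadecimal numbers, so they can be
turned back into characters programmatically. The Java sketch below
is one way to do this. The class UnicodeTableSketch and its methods
fromHex and toHex are hypothetical names chosen for this example,
and the Thai code is written with a leading zero (0E09). Whether
each character displays correctly depends on the fonts and console
encoding available.

```java
// Hypothetical helper class, named here only for illustration.
class UnicodeTableSketch {
    // Converts a hexadecimal code such as "042F" into the character it identifies.
    static String fromHex(String hex) {
        int codePoint = Integer.parseInt(hex, 16);
        return new String(Character.toChars(codePoint));
    }

    // Formats the first character of the text as a four-digit hexadecimal code.
    static String toHex(String text) {
        return String.format("%04X", text.codePointAt(0));
    }

    public static void main(String[] args) {
        // Codes taken from the Unicode Equivalent column of Table 2-2.
        String[] codes = {"0041", "042F", "0E09", "13EA", "211E", "21CC", "282F", "345F"};
        for (String code : codes) {
            System.out.println(code + " -> " + fromHex(code));
        }
        System.out.println(toHex("A"));  // 0041
    }
}
```

Each of the eight codes fits in a single 16-bit Java char, so
fromHex returns a one-character string for every row of the table.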