How are data represented in a computer (and who cares)? - The Tao of Computing

Web Consortium requires META data, although in practice this is
omitted from many Web pages.

The second approach commonly used to represent the characters of a
wide variety of languages involves using many more bits per character
than the eight used in the ISO character set standards just described.
For example, the Java programming language utilizes the Unicode
character encoding system, which uses 16 bits (2 bytes) for each
character. Because 16 bits allow 65,536 different combinations of
bits, Unicode offers sufficient options for encoding characters from
many alphabets in the same coding system. For example, Table 2-2 shows
several characters from non-Roman alphabets and their Unicode
equivalents. Each letter now requires 2 bytes, twice the space
required for ASCII characters, but this extra space provides many
more alternatives.
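The arithmetic in this paragraph can be checked with a short Java sketch. The class name CharWidthDemo and its helper methods are hypothetical, introduced only for illustration, and the sketch assumes that the UTF-16 charset (big-endian, without a byte-order mark) stands in for the 16-bit form described above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names here are illustrative, not from the text.
class CharWidthDemo {

    // Number of distinct bit patterns that fit in the given number of bits.
    static long combinations(int bits) {
        return 1L << bits;
    }

    // Bytes used when each character is stored as one ASCII byte.
    static int bytesAsAscii(String text) {
        return text.getBytes(StandardCharsets.US_ASCII).length;
    }

    // Bytes used when each character is stored in 16 bits
    // (assumption: UTF-16, big-endian, no byte-order mark).
    static int bytesAs16Bit(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        System.out.println("8 bits:  " + combinations(8) + " combinations");
        System.out.println("16 bits: " + combinations(16) + " combinations");
        System.out.println("Bits in a Java char: " + Character.SIZE);

        String word = "Tao";
        System.out.println("ASCII bytes:  " + bytesAsAscii(word));
        System.out.println("16-bit bytes: " + bytesAs16Bit(word));
    }
}
```

For a three-letter word such as "Tao", the 16-bit form should take 6 bytes against 3 for ASCII, which is the doubling of space the paragraph describes.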

Table 2-2 Selected Non-Roman Characters and Their Unicode Equivalents

Alphabetic Script                 Character   Unicode Equivalent
English                           A           0041
Russian                           Я           042F
Thai                              ฉ           0E09
Cherokee                          Ꮺ           13EA
Letterlike Symbols                ℞           211E
Arrows                            ⇌           21CC
Braille                           ⠯           282F
Chinese/Japanese/Korean Common    㑟           345F
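As a rough illustration of how the hexadecimal codes in Table 2-2 relate to characters, the following Java sketch converts each code to its character and back again. The class CodePointDemo and the methods fromHex and toHex are hypothetical names chosen for this example. Whether the characters display correctly depends on the console's encoding and fonts, which the sketch does not control.

```java
// Hypothetical demo class; the names here are illustrative, not from the text.
class CodePointDemo {

    // Convert a hexadecimal code such as "042F" to the character it denotes.
    static String fromHex(String hex) {
        int codePoint = Integer.parseInt(hex, 16);
        return new String(Character.toChars(codePoint));
    }

    // Format the first character of a string as a four-digit hexadecimal code.
    static String toHex(String s) {
        return String.format("%04X", s.codePointAt(0));
    }

    public static void main(String[] args) {
        // The Unicode equivalents listed in Table 2-2.
        String[] codes = {"0041", "042F", "0E09", "13EA",
                          "211E", "21CC", "282F", "345F"};
        for (String code : codes) {
            String ch = fromHex(code);
            System.out.println(code + " -> " + ch + " -> " + toHex(ch));
        }
    }
}
```

Each code in the table fits in four hexadecimal digits, that is, in 16 bits, so every one of these characters occupies a single Java char.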
