First, the octets are arranged in big-endian format, in which the most significant
octet is octet 0, the one read first on big-endian systems. Bit 0 of octet 0 represents
the decimal integer value $2^{31} = 2,147,483,648$ and is the most significant bit.
Bit 7 of octet 3 represents the decimal integer value $2^{0} = 1$ and is the least
significant bit (in terms of its contribution to the decimal integer value). With
little-endian, the least significant octet is read first and the most significant octet
is read last.
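As a minimal Python sketch of this numbering, the same four octets can be read both ways with the built-in `int.from_bytes`. The sample octets and the helper `bit_weight` are hypothetical; the helper assumes the convention above (octet 0 first, bit 0 being the most significant bit of its octet).

```python
# Hypothetical sample: four octets, octet 0 first.
octets = bytes([0x80, 0x00, 0x00, 0x01])

# Big-endian: octet 0 is the most significant octet.
big = int.from_bytes(octets, byteorder="big")
# Little-endian: octet 0 is the least significant octet.
little = int.from_bytes(octets, byteorder="little")

def bit_weight(octet_index: int, bit_index: int) -> int:
    """Weight of a bit in a big-endian 32-bit word (bit 0 = MSB of its octet)."""
    position = 31 - (octet_index * 8 + bit_index)
    return 2 ** position

print(big)               # 2147483649, i.e. 2**31 + 2**0
print(little)            # 16777344
print(bit_weight(0, 0))  # 2147483648
print(bit_weight(3, 7))  # 1
```

Only the interpretation changes between the two reads; the stored octets are identical.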
Every computer hardware system manipulates PDTs in one or more of the endian
formats. Reading little-endian data on a big-endian system without swapping the
octets will give incorrect DVs, which is why endianness is a fundamental property
of the PDTs. Swapping the octets is a simple procedure of reordering them; in this
case, converting from big-endian to little-endian involves moving octet 3 to appear
first (reading left to right), followed by octet 2, octet 1 and finally octet 0. Note
that this is not simply reversing the order of the bits!
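The difference between swapping octets and reversing bits can be checked with a short Python sketch. The helpers `swap_octets` and `reverse_bits` are hypothetical names for illustration; the standard `struct` module's `<I` format (little-endian unsigned 32-bit) gives the same result as the explicit swap.

```python
import struct

def swap_octets(word: bytes) -> bytes:
    """Reorder octets so that octet 3 comes first, then 2, 1 and 0."""
    return word[::-1]

def reverse_bits(word: bytes) -> bytes:
    """Reverse every bit in the word. This is NOT an endian conversion."""
    width = len(word) * 8
    bits = format(int.from_bytes(word, "big"), f"0{width}b")
    return int(bits[::-1], 2).to_bytes(len(word), "big")

# Hypothetical sample: the value 1 stored in little-endian order.
little_endian_data = (1).to_bytes(4, "little")  # b'\x01\x00\x00\x00'

# Read as big-endian without swapping: wrong value.
naive = int.from_bytes(little_endian_data, "big")                       # 16777216
# Swap the octets first, then read as big-endian: correct value.
swapped = int.from_bytes(swap_octets(little_endian_data), "big")        # 1
# Reversing the bits instead of the octets: also wrong.
bit_reversed = int.from_bytes(reverse_bits(little_endian_data), "big")  # 128
# Letting struct interpret the octets as little-endian: correct value.
via_struct = struct.unpack("<I", little_endian_data)[0]                 # 1
```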
7.3.1.2 Characters
Characters are digital representations of the basic symbols of human written
language. Typically they do not correspond to the glyph of a written character (such
as an alphabetic character) but are instead a code (a code point) that can be
associated with the corresponding glyph (character encoding) or with some other
representation.
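In Python, for example, a string stores code points rather than glyphs, and the built-in `ord` and `chr` convert between a character and its code point. Python uses Unicode code points, which agree with ASCII for the first 128 values. A minimal illustration:

```python
code_point = ord("A")   # the stored code: an integer, not a shape
print(code_point)       # 65
print(chr(code_point))  # A (a font supplies the glyph only when it is displayed)
```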
One of the most common character encodings is ASCII [28]. ASCII uses seven bits,
giving 128 possible character codes. Not all ASCII characters are printable; some
represent control symbols, such as Tab or Carriage Return, that are used for
formatting text. ASCII was extended to use full octets with the development of
ISO/IEC 8859, giving a wider set of 256 possible character codes.
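A short Python sketch of these properties, using only the standard codecs `ascii` and `iso-8859-1`; counting printable codes with `str.isprintable` is an illustrative choice, not part of either standard.

```python
# ASCII is a 7-bit code, so its code values are 0..127.
ascii_codes = range(2 ** 7)
print(len(ascii_codes))        # 128

# Control characters such as Tab and Carriage Return are not printable.
print(ord("\t"), ord("\r"))    # 9 13
printable = [c for c in ascii_codes if chr(c).isprintable()]
print(len(printable))          # 95 (codes 32..126)

# An octet-based encoding such as ISO/IEC 8859-1 also uses values 128..255.
print("é".encode("iso-8859-1"))  # b'\xe9' (233)
try:
    "é".encode("ascii")
except UnicodeEncodeError:
    print("é has no ASCII code")
```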
ISO/IEC 8859 [29] is split over 15 parts, the first of which, ISO/IEC 8859-1, is
Latin alphabet No. 1. Each part encodes a different set of characters, so a given
encoding value (230, say) can correspond to different characters depending on which
part is used. A file containing text encoded with, say, ISO/IEC 8859-1 would
typically not be interpreted correctly if decoded with ISO/IEC 8859-2, even though
both are text files with eight-bit characters. The encoding standard used for a text
file is thus very important representation information.
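The effect of choosing the wrong part can be sketched in Python with its built-in `iso-8859-1` and `iso-8859-2` codecs. The one-octet file and the sample word are hypothetical data.

```python
# One octet (hypothetical file contents), decoded under two ISO/IEC 8859 parts.
data = bytes([230])
print(data.decode("iso-8859-1"))  # æ (Latin alphabet No. 1)
print(data.decode("iso-8859-2"))  # ć (Latin alphabet No. 2)

# Text written as 8859-1 but read as 8859-2 decodes without error, yet wrongly.
original = "Ærø"
misread = original.encode("iso-8859-1").decode("iso-8859-2")
print(misread == original)        # False
```

No exception is raised in the second case, which is why the encoding has to be recorded rather than detected from decoding errors.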
More recently, the Unicode standard [30] has been developed to represent characters.
Unicode comes with several character encodings, for example UTF-8, UTF-16 and
UTF-32. UTF-8 is designed to be backwards compatible with ASCII, in that it uses a
single octet, with the same value, to encode each of the first 128 (ASCII) characters.
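A minimal Python sketch of this compatibility and of how the encodings differ in size. The big-endian codec variants `utf-16-be` and `utf-32-be` are used so that no byte-order mark is added to the output.

```python
text = "Hello"
# ASCII-only text: UTF-8 yields exactly the same octets as ASCII.
print(text.encode("utf-8") == text.encode("ascii"))  # True

# Octets needed for "A" and "é" in each encoding.
sizes = {enc: (len("A".encode(enc)), len("é".encode(enc)))
         for enc in ("utf-8", "utf-16-be", "utf-32-be")}
print(sizes)
# {'utf-8': (1, 2), 'utf-16-be': (2, 2), 'utf-32-be': (4, 4)}
```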
Unicode supports far more characters than ASCII; in fact, it tries to encode the
characters of all languages in common use (the Basic Multilingual Plane) and even
historical scripts such as Egyptian Hieroglyphs. This means that it requires more