than one octet to encode one character. UTF-8 allows a sequence of up to
four octets to represent one character, which makes it a fairly complex encoding
mechanism (described in the Unicode standard). UTF-16 encodes each character as
one or two 16-bit units (two or four octets), so the byte order is significant.
The byte order of text encoded in UTF-16 is usually indicated by a Byte Order
Mark (BOM) at the start of the text. This BOM is the byte sequence FE FF
(hexadecimal notation) when the text is encoded in big-endian byte order, or
FF FE when the text is encoded in little-endian byte order. The code FEFF also
represents the “zero-width no-break space” character, i.e. a character that does
not display anything or have any other effect, whereas FFFE is guaranteed not to
represent any character.
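The variable octet count of UTF-8 and the BOM check for UTF-16 can be sketched in a few lines of Python. This is only an illustration: the sample characters and the helper name `detect_utf16_byte_order` are chosen here and do not come from the text, and the check covers only the two UTF-16 marks described above.

```python
import codecs

# Characters from four different code ranges (illustrative choices):
# LATIN CAPITAL A, e-acute, EURO SIGN, and an emoji outside the 16-bit range.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# UTF-8 needs between one and four octets depending on the character.
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]


def detect_utf16_byte_order(data: bytes) -> str:
    """Return 'big', 'little' or 'unknown' from a leading UTF-16 BOM.

    Hypothetical helper for illustration only.
    """
    if data.startswith(codecs.BOM_UTF16_BE):  # octets FE FF
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):  # octets FF FE
        return "little"
    return "unknown"


# The same two-character text encoded in both byte orders, each with its BOM.
big_endian_text = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")
little_endian_text = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")

print(utf8_lengths)
print(detect_utf16_byte_order(big_endian_text))
print(detect_utf16_byte_order(little_endian_text))
```

Text without a BOM is reported as "unknown", in which case the byte order has to be known from some other source, such as the format description.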
One can conclude that a character is a sequence of bits (a bit pattern) that can,
when encountered in data, be represented in a more meaningful form such as a
glyph, or in some other form such as a decimal value. This implies that a
character type could in fact be described more formally by representing the whole
character set as an enumeration. The exact nature of the decoding from code to its
representation is data or even domain specific.
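As a small illustration of this point, the same bit pattern can be read as a decimal value or as different glyphs depending on the character set applied. The bit pattern and the two character sets below (Latin-1 and the EBCDIC code page 037) are examples picked for this sketch and are not taken from the text.

```python
# One octet, written out as a bit pattern.
pattern = 0b11000001
octet = bytes([pattern])

# Read as a plain decimal value.
as_decimal = pattern

# Read as a glyph under two different character sets.
as_latin1 = octet.decode("latin-1")  # LATIN CAPITAL LETTER A WITH ACUTE
as_ebcdic = octet.decode("cp037")    # LATIN CAPITAL LETTER A

print(as_decimal, repr(as_latin1), repr(as_ebcdic))
```

Nothing in the octet itself says which reading is intended, so the character set has to be recorded alongside the data.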
7.3.1.3 Integers
Integers come in a variety of flavours, in which the number of bits composing the
integer, or the range of numbers the integer can represent, varies. Typically
integer types have 8, 16, 32, 64, 128 or more bits. In Fig. 7.5, the
big-endian 4 octet integer (32 bits) can be read as an unsigned integer with
values ranging from 0 to 4,294,967,295. The exact value of the big-endian integer
in Fig. 7.5 is 2,736,100,710. If the same octets were read as little-endian
without first swapping them, the value would read 1,721,046,435; if the octets
are swapped first, one still gets the correct value of 2,736,100,710.
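The effect of byte order on the value can be checked with a short Python sketch. The four octets used here (A3 15 95 66) are not copied from Fig. 7.5; they are derived from the stated big-endian value 2,736,100,710, so treat them as an assumption.

```python
# Octets assumed from the stated value 2,736,100,710 (hex A3159566).
octets = bytes([0xA3, 0x15, 0x95, 0x66])

# Read in the order the data was written: big-endian.
big_endian_value = int.from_bytes(octets, byteorder="big")

# Read the same octets as little-endian without swapping them first.
misread_value = int.from_bytes(octets, byteorder="little")

# Swap (reverse) the octets, then read as little-endian.
swapped_value = int.from_bytes(octets[::-1], byteorder="little")

# Largest value an unsigned 32-bit integer can hold.
max_unsigned_32 = 2**32 - 1

print(big_endian_value, misread_value, swapped_value, max_unsigned_32)
```

Reversing the octets before a little-endian read is equivalent to a big-endian read of the original octets, which is why the swapped value matches the correct one.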
Integers can also be signed. Usually the most significant bit is the sign bit
(although it can be located elsewhere in the octets), zero for positive and one
for negative. The rest of the bits are used to represent the value of the number.
In Fig. 7.5 the big-endian value as a signed integer is -1,558,866,586. We must
of course state how we calculated the decimal value of the integer. In the above
signed integer example we have used the two's complement interpretation of the
bits. In two's complement the most significant bit is the sign bit. When it is
one, the number is negative, and its magnitude is found by inverting all the bits
(zero goes to one, one goes to zero) and then adding one; the result is a binary
representation that can be read in the normal way. There are other ways of
interpreting integers, such as sign-and-magnitude, one's complement etc. The
method of interpretation is a fundamental property of digital integers.
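The three interpretations named above can be compared on one bit pattern. The following Python sketch assumes a 32-bit width with the sign bit in the most significant position, and uses the bit pattern A3159566 (hexadecimal) derived from the unsigned value 2,736,100,710 quoted earlier; the function names are made up for this illustration.

```python
BITS = 32
MASK = (1 << BITS) - 1        # 32 one-bits
SIGN_BIT = 1 << (BITS - 1)    # most significant bit


def twos_complement(raw: int) -> int:
    """Negative values: invert all the bits, add one, negate the result."""
    if raw & SIGN_BIT:
        return -(((~raw) & MASK) + 1)
    return raw


def ones_complement(raw: int) -> int:
    """Negative values: invert all the bits and negate the result."""
    if raw & SIGN_BIT:
        return -((~raw) & MASK)
    return raw


def sign_and_magnitude(raw: int) -> int:
    """The sign bit gives the sign; the remaining bits give the magnitude."""
    magnitude = raw & (SIGN_BIT - 1)
    return -magnitude if raw & SIGN_BIT else magnitude


# Bit pattern assumed from the unsigned value 2,736,100,710.
raw = 0xA3159566

print(twos_complement(raw))
print(ones_complement(raw))
print(sign_and_magnitude(raw))

# Cross-check the two's complement result against Python's built-in reader.
builtin = int.from_bytes(raw.to_bytes(4, "big"), byteorder="big", signed=True)
print(builtin)
```

The same 32 bits give three different negative numbers, so the interpretation has to be stated together with the data.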
Integers then have three properties: the octet (byte) order, the location of the
sign bit and finally the way in which the bits should be interpreted (two's
complement etc.). Integers can also be restricted in data value, i.e., they can
have a minimum, a maximum (or both) or a fixed value. For example, the EISCAT Matlab 4 format [31]