Shannon's entropy is at the center of a theoretical framework in which codes and digital information can be rigorously analyzed and applied to an enormous spectrum of scientific fields [198, 200].
The notion of a code is easily defined as a function from codewords to data. Codewords are strings over a fixed finite alphabet. A code $C$ is proper if there is only one codeword for every datum; otherwise the code is redundant. Here we consider only proper codes. In the case of a proper code, the digital size of the information of a datum, with respect to the code, is given by the length of the codeword associated with the datum.
Given a code $C$, we define $\mathrm{digit}(C)$ by the following equation, where $|\alpha|$ denotes the length of the string $\alpha$:

$$\mathrm{digit}(C) = \sum_{\alpha \in C} |\alpha|.$$
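As a quick illustration, here is a minimal Python sketch of this definition (the function name digit and the example code C are ours, chosen to mirror the notation above):

```python
def digit(C):
    """Sum of the lengths of all codewords in the code C."""
    return sum(len(alpha) for alpha in C)

# A proper binary code with four codewords.
C = ["0", "10", "110", "111"]
print(digit(C))  # 1 + 2 + 3 + 3 = 9
```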
Let us define $\mathrm{digit}(n)$ as the minimum value of $\mathrm{digit}(C)$ for $C$ varying over the proper codes of $n$ codewords. The following result can be shown, where $k$ is the cardinality of the alphabet; it again provides a strict link between logarithms and information:

$$n \log_k n \;\ge\; \mathrm{digit}(n) \;\ge\; n\,(\log_k n - 2).$$
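This bound can be checked numerically: over a $k$-symbol alphabet, the cheapest proper code with $n$ codewords simply takes the $n$ shortest nonempty strings. The following sketch (our own construction, shown for the binary case) verifies that the resulting total length falls between the two bounds:

```python
import math

def digit_n(n, k=2):
    """Minimum total codeword length of a proper code with n codewords:
    greedily take the n shortest nonempty strings over a k-symbol alphabet."""
    total, count, length = 0, 0, 1
    while count < n:
        taken = min(k ** length, n - count)  # strings available at this length
        total += taken * length
        count += taken
        length += 1
    return total

for n in (2, 10, 100, 1000):
    d = digit_n(n)
    assert n * (math.log2(n) - 2) <= d <= n * math.log2(n)
    print(n, d)  # e.g. 100 -> 486, between the bounds 464.4 and 664.4
```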
6.8.2 Optimal Codes and Compression
Codes are univocal when no two different sequences of codewords concatenate into the same string. These codes have important properties which are essential for efficient decoding when data are transmitted along a transmission channel. The Kraft norm $\|C\|$ of a code $C$ is defined by:
$$\|C\| = \sum_{\alpha \in C} |A|^{-|\alpha|}$$

where $A$ is the alphabet of the code and $|A|$ denotes the cardinality of $A$.
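A minimal Python sketch of this definition (the names are ours):

```python
def kraft_norm(C, alphabet_size=2):
    """Kraft norm of a code: sum over codewords of |A| raised to -|alpha|."""
    return sum(alphabet_size ** -len(alpha) for alpha in C)

print(kraft_norm(["0", "10", "110", "111"]))  # 0.5 + 0.25 + 0.125 + 0.125 = 1.0
```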
The Kraft norm of a univocal code is always equal to or less than 1. A special kind of univocal codes are the instantaneous codes, having the property that no pair of their codewords $\alpha, \beta$ can exist such that $\alpha$ is a prefix of $\beta$. Any univocal code $C$ can always be transformed into an instantaneous code $C'$ such that $\|C'\| = \|C\|$.
Instantaneous codes can be defined by means of an encoding tree (an example is given in Fig. 6.7), where codewords correspond to the leaves of the tree (analogous trees can be constructed for alphabets with more than two symbols).
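The prefix property itself is straightforward to test directly; here is a small sketch (our own helper, not from the text):

```python
def is_instantaneous(C):
    """True when no codeword of C is a proper prefix of another codeword."""
    return not any(a != b and b.startswith(a) for a in C for b in C)

print(is_instantaneous(["0", "10", "110", "111"]))  # True
print(is_instantaneous(["0", "01", "11"]))          # False: "0" prefixes "01"
```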
An information source is any device emitting data $d \in D$ with probability $p(d)$. The mean length $L_C$ of a proper code $C$ is given by:

$$L_C = \sum_{\alpha \in C} p_\alpha\,|\alpha|$$

where $p_\alpha$ is the probability of the datum encoded by $\alpha$.
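As a small worked example (the probability values are ours, matching the code used earlier):

```python
def mean_length(code):
    """Mean length L_C: sum of p_alpha * |alpha| over the codewords."""
    return sum(p * len(alpha) for alpha, p in code.items())

# Each codeword is mapped to the probability of the datum it encodes.
code = {"0": 0.5, "10": 0.25, "110": 0.125, "111": 0.125}
print(mean_length(code))  # 0.5*1 + 0.25*2 + 0.125*3 + 0.125*3 = 1.75
```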
A code $C$ is optimal, with respect to an information source, if no other code exists having a shorter mean length with respect to that information source. A very simple algorithm due to Huffman exists for defining an optimal code with respect to an information source.
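Huffman's algorithm builds the encoding tree bottom-up, repeatedly merging the two least probable subtrees. A compact Python sketch of this idea (our own rendering, assuming at least two data):

```python
import heapq
from itertools import count

def huffman(probabilities):
    """Build a binary prefix code by repeatedly merging the two least
    probable subtrees (Huffman's algorithm)."""
    tie = count()  # tie-breaker so the heap never compares trees directly
    heap = [(p, next(tie), d) for d, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tie), (left, right)))

    codewords = {}
    def assign(tree, prefix):
        if isinstance(tree, tuple):        # internal node: branch on 0 / 1
            assign(tree[0], prefix + "0")
            assign(tree[1], prefix + "1")
        else:                              # leaf: a datum
            codewords[tree] = prefix
    assign(heap[0][2], "")
    return codewords

print(huffman({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}))
# {'a': '0', 'b': '10', 'c': '110', 'd': '111'} -- mean length 1.75
```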