Shannon's entropy is the center of a theoretical framework where codes and digital
information can be rigorously analyzed and applied to an enormous spectrum of
scientific fields [198, 200].
The notion of code is easily defined by a function from codewords to data. Codewords are strings over a prefixed finite alphabet. A code C is proper if there is only one codeword for every datum, otherwise the code is redundant. Here, we consider only proper codes. In the case of a proper code, the digital size of the information of a datum, with respect to the code, is given by the length of the codeword associated to the datum.
Given a code C we define digit(C) by the following equation, where |α| denotes the length of the string α:

$$\mathrm{digit}(C) = \sum_{\alpha \in C} |\alpha|$$
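As a small illustration (not from the text), digit(C) can be computed directly; the code C below is a made-up set of binary codewords:

```python
# digit(C): the total length of all codewords of a proper code C.
# The code C below is a hypothetical example over the binary alphabet.
C = {"0", "10", "110", "111"}

digit_C = sum(len(alpha) for alpha in C)  # sum of |alpha| over alpha in C
print(digit_C)  # lengths 1 + 2 + 3 + 3 -> 9
```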
Let us define digit(n) as the minimum value of digit(C) for C varying in the proper codes of n codewords. The following result can be shown, which again provides a strict link between logarithms and information:

$$n \log_k n \;\geq\; \mathrm{digit}(n) \;\geq\; n(\log_k n - 2).$$
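These bounds can be checked numerically; the sketch below assumes, as is not stated explicitly here, that k is the cardinality of the alphabet, that codewords are non-empty, and that n ≥ k. Under those assumptions digit(n) is realized by taking the n shortest strings over a k-symbol alphabet:

```python
import math
from itertools import count

def digit(n, k):
    """Minimum total codeword length of a proper code with n codewords
    over a k-symbol alphabet: take the n shortest non-empty strings."""
    total, taken = 0, 0
    for length in count(1):          # lengths 1, 2, 3, ...
        avail = k ** length          # distinct strings of this length
        use = min(avail, n - taken)
        total += use * length
        taken += use
        if taken == n:
            return total

# Check n*log_k(n) >= digit(n) >= n*(log_k(n) - 2) for small cases.
for k in (2, 3):
    for n in range(k, 20):
        log_k_n = math.log(n, k)
        assert n * log_k_n >= digit(n, k) >= n * (log_k_n - 2)
```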
6.8.2
Optimal Codes and Compression
Codes are univocal when no two different sequences of codewords yield, by concatenation, the same string. These codes have important properties which are essential for providing efficient decoding processes when data are transmitted along a transmission channel. The Kraft norm ||C|| of a code C is defined by:

$$\|C\| = \sum_{\alpha \in C} |A|^{-|\alpha|}$$
where A is the alphabet of the code, and |A| denotes the cardinality of A. The Kraft norm of a univocal code is always equal to or less than 1. A special kind of univocal codes are the instantaneous codes, having the property that no pair of their codewords α, β can exist such that α is a prefix of β. Any univocal code C can always be transformed into an instantaneous code C′ such that ||C|| = ||C′||. Instantaneous codes can be defined by means of an encoding tree (an example is given in Fig. 6.7), where codewords correspond to the leaves of the tree (analogous trees can be constructed for alphabets with more than two symbols).
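Both definitions are easy to check mechanically. The following sketch (the codes are chosen purely for illustration) computes the Kraft norm of a code and tests whether it is instantaneous, i.e. prefix-free:

```python
def kraft_norm(C, alphabet_size):
    """||C|| = sum over codewords alpha of |A|^(-|alpha|)."""
    return sum(alphabet_size ** -len(alpha) for alpha in C)

def is_instantaneous(C):
    """True if no codeword is a proper prefix of another codeword."""
    return not any(a != b and b.startswith(a) for a in C for b in C)

# A binary instantaneous code: leaves 0, 10, 110, 111 of an encoding tree.
C = {"0", "10", "110", "111"}
print(kraft_norm(C, 2))       # 1/2 + 1/4 + 1/8 + 1/8 = 1.0
print(is_instantaneous(C))    # True

# Here "0" is a prefix of "01", so this code is not instantaneous.
print(is_instantaneous({"0", "01"}))  # False
```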
An information source is any device emitting data d ∈ D with probability p(d); the mean length L_C of a proper code C is given by:

$$L_C = \sum_{\alpha \in C} p(\alpha)\,|\alpha|.$$
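The mean length can be computed in the same style; the probabilities below are a made-up source, with p(α) read as the probability of the datum encoded by α:

```python
# Hypothetical source over four data, each paired with its codeword.
p = {"0": 0.5, "10": 0.25, "110": 0.125, "111": 0.125}

# L_C = sum over codewords alpha of p(alpha) * |alpha|
mean_length = sum(p[alpha] * len(alpha) for alpha in p)
print(mean_length)  # 0.5*1 + 0.25*2 + 0.125*3 + 0.125*3 = 1.75
```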
A code C is optimal, with respect to an information source, if no other code exists having a shorter mean length with respect to that information source. A very simple algorithm due to Huffman exists for defining an optimal code with respect to an