Shannon's entropy is the center of a theoretical framework where codes and digital
information can be rigorously analyzed and applied to an enormous spectrum of
scientific fields [198, 200].
The notion of code is easily defined by a function from codewords to data. Codewords are strings over a prefixed finite alphabet. A code C is proper if there is only one codeword for every datum, otherwise the code is redundant. Here, we consider only proper codes. In the case of a proper code, the digital size of the information of a datum, with respect to the code, is given by the length of the codeword associated to the datum.
Given a code C we define digit(C) by the following equation, where |α| denotes the length of the string α:

$$\mathrm{digit}(C) = \sum_{\alpha \in C} |\alpha|$$
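As a small illustration (not from the text), digit(C) can be computed directly; the code C below is a made-up set of binary codewords:

```python
# digit(C): the total length of all codewords of a proper code C.
# The code C below is a hypothetical example over the binary alphabet.
C = {"0", "10", "110", "111"}

digit_C = sum(len(alpha) for alpha in C)  # sum of |alpha| over alpha in C
print(digit_C)  # lengths 1 + 2 + 3 + 3 -> 9
```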
Let us define digit(n) as the minimum value of digit(C) for C varying in the proper codes of n codewords. The following result can be shown, which again provides a strict link between logarithms and information:

$$n \log_k n \;\geq\; \mathrm{digit}(n) \;\geq\; n(\log_k n - 2).$$
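These bounds can be checked numerically; the sketch below assumes, as is not stated explicitly here, that k is the cardinality of the alphabet, that codewords are non-empty, and that n ≥ k. Under those assumptions digit(n) is realized by taking the n shortest strings over a k-symbol alphabet:

```python
import math
from itertools import count

def digit(n, k):
    """Minimum total codeword length of a proper code with n codewords
    over a k-symbol alphabet: take the n shortest non-empty strings."""
    total, taken = 0, 0
    for length in count(1):          # lengths 1, 2, 3, ...
        avail = k ** length          # distinct strings of this length
        use = min(avail, n - taken)
        total += use * length
        taken += use
        if taken == n:
            return total

# Check n*log_k(n) >= digit(n) >= n*(log_k(n) - 2) for small cases.
for k in (2, 3):
    for n in range(k, 20):
        log_k_n = math.log(n, k)
        assert n * log_k_n >= digit(n, k) >= n * (log_k_n - 2)
```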
6.8.2
Optimal Codes and Compression
Codes are univocal when no two different sequences of codewords yield, by concatenation, the same string. These codes have important properties which are essential for providing efficient decoding processes when data are transmitted along a transmission channel. The Kraft norm ||C|| of a code C is defined by:

$$\|C\| = \sum_{\alpha \in C} |A|^{-|\alpha|}$$
where A is the alphabet of the code, and |A| denotes the cardinality of A. The Kraft norm of a univocal code is always equal to or less than 1. A special kind of univocal codes are the instantaneous codes, having the property that no pair of their codewords α, β can exist such that α is a prefix of β. Any univocal code C can always be transformed into an instantaneous code C′ such that ||C|| = ||C′||. Instantaneous codes can be defined by means of an encoding tree (an example is given in Fig. 6.7), where codewords correspond to the leaves of the tree (analogous trees can be constructed for alphabets with more than two symbols).
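Both definitions are easy to check mechanically. The following sketch (the codes are chosen purely for illustration) computes the Kraft norm of a code and tests whether it is instantaneous, i.e. prefix-free:

```python
def kraft_norm(C, alphabet_size):
    """||C|| = sum over codewords alpha of |A|^(-|alpha|)."""
    return sum(alphabet_size ** -len(alpha) for alpha in C)

def is_instantaneous(C):
    """True if no codeword is a proper prefix of another codeword."""
    return not any(a != b and b.startswith(a) for a in C for b in C)

# A binary instantaneous code: leaves 0, 10, 110, 111 of an encoding tree.
C = {"0", "10", "110", "111"}
print(kraft_norm(C, 2))       # 1/2 + 1/4 + 1/8 + 1/8 = 1.0
print(is_instantaneous(C))    # True

# Here "0" is a prefix of "01", so this code is not instantaneous.
print(is_instantaneous({"0", "01"}))  # False
```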
An information source is any device emitting data d ∈ D with probability p(d); the mean length L_C of a proper code C is given by:

$$L_C = \sum_{\alpha \in C} p(\alpha)\,|\alpha|.$$
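The mean length can be computed in the same style; the probabilities below are a made-up source, with p(α) read as the probability of the datum encoded by α:

```python
# Hypothetical source over four data, each paired with its codeword.
p = {"0": 0.5, "10": 0.25, "110": 0.125, "111": 0.125}

# L_C = sum over codewords alpha of p(alpha) * |alpha|
mean_length = sum(p[alpha] * len(alpha) for alpha in p)
print(mean_length)  # 0.5*1 + 0.25*2 + 0.125*3 + 0.125*3 = 1.75
```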
A code C is optimal, with respect to an information source, if no other code exists having a shorter mean length with respect to that information source. A very simple algorithm due to Huffman exists for defining an optimal code with respect to an