impossibility of universal compression methods implies that we can only try to discover compression methods which work for specified classes of texts.
In compression theory there are three main principles: i) encoding more probable
codewords in a shorter way, ii) trying to determine a dictionary of the codewords
occurring in a given text, and iii) changing the order of codewords in such a way
that similar codewords become contiguous. In fact, contiguity of similar codewords allows replacing a subtext α^n by the pair (α, n), which can be encoded more compactly than α^n.
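As a minimal illustration of this last principle, the following Python sketch (function names are only illustrative, not from the text) replaces each maximal run of a repeated symbol by such a pair, and inverts the replacement:

```python
from itertools import groupby

def run_length_encode(text):
    # Replace each maximal run a...a (n occurrences) by the pair (a, n)
    return [(symbol, len(list(run))) for symbol, run in groupby(text)]

def run_length_decode(pairs):
    # Expand each pair (a, n) back into the subtext a^n
    return "".join(symbol * count for symbol, count in pairs)

encoded = run_length_encode("aaaabbbcccccd")
print(encoded)                     # [('a', 4), ('b', 3), ('c', 5), ('d', 1)]
print(run_length_decode(encoded))  # aaaabbbcccccd
```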
6.8.3
Typicality and Transmission
Shannon's second theorem concerns the transmission error. It shows that, even if transmission is affected by an error probability, when data are encoded in a suitable way, this probability can be made arbitrarily small.
After the theoretical result of the second theorem, transmission codes were introduced where the reliability of transmission is obtained by adding extra symbols which can recover the lost information when some noise corrupts the transmitted messages. These extra symbols are called control digits. For this reason, an encoding with m digits plus k control digits provides, in the binary case, 2^(m+k) codewords which convey only 2^m different messages. The ratio m/(m+k) is the transmission rate. The capacity of a channel transmitting between a sender information source and a receiver information source is the maximum amount of mutual information (see later) which can pass between the two sources (the transmitter and the receiver). In more precise terms, the second theorem claims that when the transmission rate is less than the capacity of the channel, then the more we reduce the transmission rate, the more the error probability decreases.
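To make the role of control digits concrete, the following sketch (an illustrative example, not taken from the text) implements the simplest self-correcting scheme, a threefold repetition code: each message digit (m = 1) is sent together with k = 2 control copies, so the transmission rate is m/(m+k) = 1/3, and a single corrupted digit per block is recovered by majority voting.

```python
def encode_repetition(bits, k=2):
    # Each message digit (m = 1) is followed by k control copies
    return [b for bit in bits for b in [bit] * (1 + k)]

def decode_repetition(received, k=2):
    # Majority vote over each block of 1 + k digits recovers the message digit
    block = 1 + k
    return [1 if sum(received[i:i + block]) * 2 > block else 0
            for i in range(0, len(received), block)]

message = [1, 0, 1, 1]
sent = encode_repetition(message)          # transmission rate m/(m+k) = 1/3
sent[4] ^= 1                               # noise corrupts one digit
print(decode_repetition(sent) == message)  # True: the error is recovered
```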
The second theorem is a masterpiece of Shannon's approach and was the beginning of the theory of self-correcting transmission codes. Its proof relies on the notion of typical sequence generated by an information source. When an information source emits a sequence of data, the longer the sequence, the more closely it approximates a typical sequence of the source. In fact, a sequence is typical when the frequency of any datum occurring in it is the same as the probability with which the source emits that datum. This means that sufficiently long sequences can be assumed to be typical, with a small error probability. The deep connection between typicality and entropy is given by the following asymptotic results: i) the probability of any typical sequence of length n approaches, for increasing n, 2^(−Hn), where H is the entropy of the information source; ii) the number of typical sequences of length n approaches, for increasing n, 2^(Hn).
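These asymptotic results can be checked numerically. The sketch below (illustrative, assuming a biased binary source) draws a long, hence almost surely typical, sequence and compares the base-2 logarithm of its probability, divided by n, with −H:

```python
import math
import random

def entropy(p):
    # Entropy H of a binary source emitting 1 with probability p, 0 with 1 - p
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def log2_probability(seq, p):
    # Base-2 logarithm of the probability that the source emits exactly seq
    ones = sum(seq)
    return ones * math.log2(p) + (len(seq) - ones) * math.log2(1 - p)

p, n = 0.8, 100_000
seq = [1 if random.random() < p else 0 for _ in range(n)]
# For a long (hence almost surely typical) sequence, log2 P(seq) / n ≈ -H
print(log2_probability(seq, p) / n, -entropy(p))
```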
The theoretical framework necessary to the proof of the second theorem includes many important notions related to entropy. In this context, it is useful to consider an information source as a random variable X, that is, a variable with an associated probability of assuming each of its values. The entropy of X is then:

H(X) = −∑_x p(x) log p(x)
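A direct transcription of this definition, as a small Python sketch where the distribution of X is assumed to be given as a dictionary of values and probabilities:

```python
import math

def entropy(distribution):
    # H(X) = -sum over x of p(x) * log2 p(x); zero-probability values contribute nothing
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)

# A source emitting four messages with the given probabilities
X = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
print(entropy(X))  # 1.75 (bits)
```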