impossibility of universal compression methods implies that we can only try to discover compression methods which work for specified classes of texts.
In compression theory there are three main principles: i) encoding more probable
codewords in a shorter way, ii) trying to determine a dictionary of the codewords
occurring in a given text, and iii) changing the order of codewords in such a way
that similar codewords become contiguous. In fact, contiguity of similar codewords allows replacing a subtext α^n by the pair (α, n), which can be encoded more compactly than α^n.
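As a minimal illustration of this last principle, the following Python sketch (function names are only illustrative, not from the text) replaces each maximal run of a repeated symbol by such a pair, and inverts the replacement:

```python
from itertools import groupby

def run_length_encode(text):
    # Replace each maximal run a...a (n occurrences) by the pair (a, n)
    return [(symbol, len(list(run))) for symbol, run in groupby(text)]

def run_length_decode(pairs):
    # Expand each pair (a, n) back into the subtext a^n
    return "".join(symbol * count for symbol, count in pairs)

encoded = run_length_encode("aaaabbbcccccd")
print(encoded)                     # [('a', 4), ('b', 3), ('c', 5), ('d', 1)]
print(run_length_decode(encoded))  # aaaabbbcccccd
```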
6.8.3
Typicality and Transmission
Shannon's second theorem concerns the transmission error. It shows that, even if transmission is affected by an error probability, when data are encoded in a suitable way, this probability can be made arbitrarily small.
After the theoretical result of the second theorem, transmission codes were introduced where the reliability of transmission is obtained by adding extra symbols which can recover the lost information when some noise corrupts the transmitted messages. These extra symbols are called control digits. For this reason, an encoding with m digits plus k control digits provides, in the binary case, 2^(m+k) codewords which convey only 2^m different messages. The ratio m/(m+k) is the transmission rate. The capacity of a channel transmitting between a sender information source and a receiver information source is the maximum amount of mutual information (see later) which can pass between the two sources (the transmitter and the receiver). In more precise terms, the second theorem claims that when the transmission rate is less than the capacity of the channel, then the more we reduce the transmission rate, the more the error probability decreases.
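To make the role of control digits concrete, the following sketch (an illustrative example, not taken from the text) implements the simplest self-correcting scheme, a threefold repetition code: each message digit (m = 1) is sent together with k = 2 control copies, so the transmission rate is m/(m+k) = 1/3, and a single corrupted digit per block is recovered by majority voting.

```python
def encode_repetition(bits, k=2):
    # Each message digit (m = 1) is followed by k control copies
    return [b for bit in bits for b in [bit] * (1 + k)]

def decode_repetition(received, k=2):
    # Majority vote over each block of 1 + k digits recovers the message digit
    block = 1 + k
    return [1 if sum(received[i:i + block]) * 2 > block else 0
            for i in range(0, len(received), block)]

message = [1, 0, 1, 1]
sent = encode_repetition(message)          # transmission rate m/(m+k) = 1/3
sent[4] ^= 1                               # noise corrupts one digit
print(decode_repetition(sent) == message)  # True: the error is recovered
```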
The second theorem is a masterpiece of Shannon's approach and was the beginning of the theory of self-correcting transmission codes. Its proof relies on the notion of typical sequence generated by an information source. When an information source emits a sequence of data, the longer the sequence, the more closely it approximates a typical sequence of the source. In fact, a sequence is typical when the frequency of any datum occurring in it is the same as the probability with which the source emits that datum. This means that sufficiently long sequences can be assumed to be typical, with a small error probability. The deep connection between typicality and entropy is given by the following asymptotic results: i) the probability of any typical sequence of length n approaches, for increasing n, 2^(−Hn), where H is the entropy of the information source; ii) the number of typical sequences of length n approaches, for increasing n, 2^(Hn).
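These asymptotic results can be checked numerically. The sketch below (illustrative, assuming a biased binary source) draws a long, hence almost surely typical, sequence and compares the base-2 logarithm of its probability, divided by n, with −H:

```python
import math
import random

def entropy(p):
    # Entropy H of a binary source emitting 1 with probability p, 0 with 1 - p
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def log2_probability(seq, p):
    # Base-2 logarithm of the probability that the source emits exactly seq
    ones = sum(seq)
    return ones * math.log2(p) + (len(seq) - ones) * math.log2(1 - p)

p, n = 0.8, 100_000
seq = [1 if random.random() < p else 0 for _ in range(n)]
# For a long (hence almost surely typical) sequence, log2 P(seq) / n ≈ -H
print(log2_probability(seq, p) / n, -entropy(p))
```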
The theoretical framework necessary to the proof of the second theorem includes many important notions related to entropy. In this context, it is useful to consider an information source as a random variable X, that is, a variable with an associated probability of assuming each of its values. The entropy of X is then:

H(X) = −∑_x p(x) log p(x)
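A direct transcription of this definition, as a small Python sketch where the distribution of X is assumed to be given as a dictionary of values and probabilities:

```python
import math

def entropy(distribution):
    # H(X) = -sum over x of p(x) * log2 p(x); zero-probability values contribute nothing
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)

# A source emitting four messages with the given probabilities
X = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
print(entropy(X))  # 1.75 (bits)
```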