always used interchangeably or synonymously in the information theory literature,
leading to the following general statement:
The Shannon entropy of a message source and the information content of a message selected
from it are numerically identical if and only if the channel is noiseless and the number
of messages selected is 1.
(4.5)
Statement 4.5 may be referred to as the “non-identity of information and
Shannon entropy (NISE) thesis.” There are two types of information: algorithmic
and uncertainty-based (Fig. 4.1). Algorithmic (also called descriptive) information
is measured by the shortest possible program in some language (e.g., the binary
digital language using 0's and 1's) that is needed to describe the object in the sense
that the object can be computed from it. Thus, algorithmic information is intrinsic to the object
carrying the information. It is quantitated by the number of bits necessary to
characterize the message. Uncertainty-based information is extrinsic to the object
carrying the information, since it is a property of the set
to which the message belongs rather than of the message itself. Uncertainty-based
(or uncertainty-reducing) information is measured by the amount of uncertainty
reduced by the reception of a message (see Eq. 4.4).
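The uncertainty-reduction view can be sketched numerically. The following is a minimal illustration (not taken from the text; the function name is my own), assuming the standard Shannon formula H = −Σ p_i log2 p_i:

```python
from math import log2

def shannon_entropy(probs):
    """Shannon entropy H = -sum(p * log2(p)) in bits, skipping zero terms."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Uncertainty before reception: four equiprobable messages (2 bits of uncertainty).
h_before = shannon_entropy([0.25, 0.25, 0.25, 0.25])

# After the message is received, the receiver knows which message was sent,
# so the remaining uncertainty is zero.
h_after = shannon_entropy([1.0])

# Uncertainty-based information = amount of uncertainty removed by the message.
info = h_before - h_after
print(info)  # 2.0 bits
```

With unequal probabilities the same function gives the (smaller) Shannon entropy, which is why the reduction depends on the set of possible messages, not on the message itself.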
When the probability of occurrence is equal for all of the messages in a message
source, we are dealing with the Hartley information (see Eq. 4.3), while, when the
probabilities of occurrence are unequal (i.e., the p_i's in Eq. 4.2 are not all the same), we are
dealing with the Shannon information. Consider an object or a message consisting
of a string of ten deoxyribonucleotides:
TGCTTAGCCT
(4.6)
which can be represented as a string of 0's and 1's as
11 01 10 11 11 00 01 10 10 11
(4.7)
by adopting the following code (or convention),
A = 00,  C = 10,  G = 01,  T = 11.
(4.8)
Thus, the algorithmic (also called Kolmogorov-Chaitin) information content
of the ten-nucleotide message in String 4.6 is 20 bits, since the shortest program
that can characterize the message contains 20 binary digits (as evident in
Expression 4.7).
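This count can be checked mechanically. Here is a small sketch (the dictionary and function names are illustrative, not from the text) applying the code in 4.8 to String 4.6:

```python
# Binary code from Expression 4.8: A=00, C=10, G=01, T=11.
# Any fixed 2-bit code per nucleotide yields the same total length.
CODE = {"A": "00", "C": "10", "G": "01", "T": "11"}

def encode(seq):
    """Concatenate the 2-bit codeword of each nucleotide in the sequence."""
    return "".join(CODE[base] for base in seq)

bits = encode("TGCTTAGCCT")
print(bits)       # 11011011110001101011 (Expression 4.7 without spaces)
print(len(bits))  # 20, matching the 20-bit algorithmic information content
```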
The Hartley information content of the same 10-nucleotide message can be
calculated if we know the “cardinality” (i.e., the size) of the set out of which the
message was selected. The cardinality of the set involved is 6^10 = 6.0466 × 10^7,
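Given the cardinality N of the message set, the Hartley information is log2 N (Eq. 4.3). A quick sketch using the cardinality stated above (6^10; the variable names are illustrative):

```python
from math import log2

# Cardinality of the message set, as given in the text: 6^10 ≈ 6.0466e7.
cardinality = 6 ** 10

# Hartley information (Eq. 4.3): I = log2(N) bits.
hartley_bits = log2(cardinality)
print(hartley_bits)  # ≈ 25.85 bits
```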