The Khinchin Axioms and Rényi Information . In 1953, A.I. Khinchin
published a list of four reasonable-looking axioms for a measure of the informa-
tion H [ X ] associated with a random variable X (161). He then proved that the
Shannon information was the unique functional satisfying the axioms, up to an
overall multiplicative constant. (The choice of this constant is equivalent to the
choice of the base for logarithms.) The axioms were as follows.
1. The information is a functional of the probability distribution of X, and does not depend on any of its other properties. In particular, if f is any invertible function, H[X] = H[f(X)].
2. The information is maximal for the uniform distribution, in which all events are equally probable.
3. The information is unchanged by enlarging the probability space with events of zero probability.
4. If the probability space is divided into two subspaces, so that X is split into two variables Y and Z, the total information is equal to the information content of the marginal distribution of one subspace, plus the mean information of the conditional distribution of the other subspace: H[X] = H[Y] + E[H(Z|Y)].
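The fourth axiom is the chain rule for entropy. As a quick numerical illustration, the following minimal Python sketch checks H[X] = H[Y] + E[H(Z|Y)] on a small discrete example; the joint distribution and the variable names are illustrative assumptions, not taken from the text.

    import numpy as np

    # Illustrative joint distribution p(y, z); rows index Y, columns index Z.
    p_yz = np.array([[0.10, 0.20, 0.10],
                     [0.30, 0.15, 0.15]])

    def shannon(p):
        # Shannon entropy in bits, summing only over events with p > 0.
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    H_X = shannon(p_yz.ravel())            # entropy of the joint variable X = (Y, Z)
    p_y = p_yz.sum(axis=1)                 # marginal distribution of Y
    H_Y = shannon(p_y)
    # E[H(Z|Y)]: average over y of the entropy of the conditional distribution p(z|y).
    E_H_Z_given_Y = sum(p_y[i] * shannon(p_yz[i] / p_y[i]) for i in range(len(p_y)))

    print(H_X, H_Y + E_H_Z_given_Y)        # the two values agree

Replacing p_yz with any other normalized joint distribution leaves the equality intact, which is exactly what the axiom asserts.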
A similar axiomatic treatment can be given for the mutual information and the
relative entropy.
While the first three of Khinchin's axioms are all highly plausible, the fourth
is somewhat awkward. It is intuitively more plausible to merely require that, if Y
and Z are independent, then H [ Y , Z ] = H [ Y ] + H [ Z ]. If the fourth axiom is weak-
ened in this way, however, there is no longer only a single functional satisfying
the axioms. Instead, any of the infinite family of entropies introduced by Rényi
satisfies the axioms. The Rényi entropy of order α, with α any non-negative
real number, is

    H_α[X] ≡ (1 / (1 − α)) log Σ_{i : p_i > 0} p_i^α        [53]
in the discrete case, and the corresponding integral in the continuous case. The
parameter α can be thought of as gauging how strongly the entropy is biased
towards low-probability events. As α → 0, low-probability events count more,
until at α = 0 all possible events receive equal weight. (This is sometimes called
the topological entropy.) As α → ∞, only the highest-probability event contributes
to the sum. One can show that, as α → 1, H_α[X] → H[X], i.e., one recovers the
ordinary Shannon entropy.
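To make these limits concrete, here is a minimal Python sketch of Eq. [53]; the function name and the example distribution are illustrative assumptions rather than anything from the text, and the α → 1 case is handled separately so that the Shannon entropy is recovered there.

    import numpy as np

    def renyi_entropy(p, alpha):
        # Rényi entropy of order alpha, in bits, for a discrete distribution p.
        p = np.asarray(p, dtype=float)
        p = p[p > 0]                       # the sum in Eq. [53] runs over p_i > 0
        if np.isclose(alpha, 1.0):         # alpha -> 1 gives the Shannon entropy
            return -np.sum(p * np.log2(p))
        return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

    p = [0.5, 0.25, 0.125, 0.125]          # an illustrative distribution

    print(renyi_entropy(p, 0.0))           # 2.0 = log2(4): all possible events weighted equally
    print(renyi_entropy(p, 0.999))         # ~1.75: approaching the Shannon limit
    print(renyi_entropy(p, 1.0))           # 1.75: the Shannon entropy itself
    print(renyi_entropy(p, 50.0))          # ~1.02, tending to -log2(max p_i) = 1 as alpha grows

One can also check the weakened fourth axiom discussed above: for independent Y and Z (a joint distribution built, say, with np.outer), the Rényi entropy of the pair equals the sum of the individual Rényi entropies for every order α.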