the ordinary Shannon entropy in the limit. There are entropy rates corresponding
to all the Rényi entropies, defined just like the ordinary entropy rate. For dy-
namical systems, these are related to the fractal dimensions of the attractor
(162,163).
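To make the block construction behind such entropy rates concrete, here is a small sketch (not part of the text; the biased coin source, the sample size, and the name block_renyi_rate are all assumptions made for the demonstration) that computes the order-α Rényi entropy of length-L blocks and divides by L, the per-symbol quantity whose large-L behavior defines the Rényi entropy rate.

```python
# Illustrative sketch only; names and parameters are assumptions, not from the text.
import numpy as np
from collections import Counter

def block_renyi_rate(symbols, L, alpha):
    """Order-alpha Rényi entropy of length-L blocks, in bits per symbol."""
    blocks = [tuple(symbols[i:i + L]) for i in range(len(symbols) - L + 1)]
    counts = np.array(list(Counter(blocks).values()), dtype=float)
    p = counts / counts.sum()
    if np.isclose(alpha, 1.0):
        H = -np.sum(p * np.log2(p))                      # Shannon block entropy
    else:
        H = np.log2(np.sum(p ** alpha)) / (1.0 - alpha)  # Rényi block entropy
    return H / L

rng = np.random.default_rng(0)
x = (rng.random(100_000) < 0.3).astype(int)   # i.i.d. bits with P(1) = 0.3
for L in (1, 2, 4, 8):
    print(L, block_renyi_rate(x, L, alpha=2.0))
# For an i.i.d. source the per-symbol values stay approximately flat in L;
# for a correlated source they approach the Rényi entropy rate as L grows.
```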
The Rényi divergences bear the same relation to the Rényi entropies as
the Kullback-Leibler divergence does to the Shannon entropy. The defining
formula is
D_\alpha(P \| Q) \;=\; \frac{1}{\alpha - 1} \log \sum_i p_i^{\alpha} \, q_i^{1-\alpha} ,        [54]
and similarly for the continuous case. Once again, lim_{α→1} D_α(P||Q) = D(P||Q).
For all α > 0, D_α(P||Q) ≥ 0, and it is equal to zero if and only if P and Q are the
same. (If α = 0, then a vanishing Rényi divergence only means that the supports
of the two distributions are the same.) The Rényi entropy H_α[X] is nonincreasing
as α grows, whereas the Rényi divergence D_α(P||Q) is nondecreasing.
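The following sketch (an illustration added here, not part of the original text; all function and variable names are hypothetical) implements the plug-in Rényi entropy and the divergence of Eq. [54] for discrete distributions, and can be used to check the α → 1 limits and the monotonicity in α just described.

```python
# Illustrative sketch only; names and the example distributions are assumptions.
import numpy as np

def renyi_entropy(p, alpha):
    """H_alpha(P) = log2(sum_i p_i^alpha) / (1 - alpha); Shannon entropy at alpha = 1."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                        # drop zero-probability symbols
    if np.isclose(alpha, 1.0):
        return -np.sum(p * np.log2(p))  # Shannon limit
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

def renyi_divergence(p, q, alpha):
    """D_alpha(P||Q) = log2(sum_i p_i^alpha q_i^(1-alpha)) / (alpha - 1); KL at alpha = 1."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    p, q = p[mask], q[mask]
    if np.isclose(alpha, 1.0):
        return np.sum(p * np.log2(p / q))                # Kullback-Leibler limit
    return np.log2(np.sum(p ** alpha * q ** (1.0 - alpha))) / (alpha - 1.0)

P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.4, 0.4, 0.2])
for a in (0.5, 0.9, 0.999, 2.0):
    print(a, renyi_entropy(P, a), renyi_divergence(P, Q, a))
# H_alpha shrinks and D_alpha grows as alpha increases; values at alpha near 1
# approach the Shannon entropy and the Kullback-Leibler divergence.
```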
Estimation of Information-Theoretic Quantities. In applications, we will
often want to estimate information-theoretic quantities, such as the Shannon entropy or the
mutual information, from empirical or simulation data. Restricting our attention,
for the moment, to the case of discrete-valued variables, the empirical distribu-
tion will generally converge on the true distribution, and so the entropy (say) of
the empirical distribution ("sample entropy") will also converge on the true en-
tropy. However, it is not the case that the sample entropy is an unbiased estimate
of the true entropy. The Shannon (and Rényi) entropies are measures of varia-
tion, like the variance, and sampling tends to reduce variation. Just as the sample
variance is a negatively biased estimate of the true variance, sample entropy is a
negatively biased estimate of the true entropy, and so sample mutual information
is a positively biased estimate of true information. Understanding and control-
ling the bias, as well as the sampling fluctuations, can be very important.
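A small Monte Carlo sketch (the alphabet size, sample size, and number of replicates below are arbitrary choices, not values from the text) makes the negative bias visible: repeatedly draw N samples from a known distribution and average the plug-in entropies.

```python
# Illustrative sketch only; names and parameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)
k, N, reps = 8, 50, 5000
true_H = np.log2(k)                      # Shannon entropy of the uniform distribution

def plugin_entropy(samples, k):
    counts = np.bincount(samples, minlength=k)
    freqs = counts[counts > 0] / counts.sum()
    return -np.sum(freqs * np.log2(freqs))

estimates = [plugin_entropy(rng.integers(0, k, size=N), k) for _ in range(reps)]
print("true H:", true_H, " mean plug-in H:", np.mean(estimates))
# The mean plug-in estimate falls below the true entropy; the shortfall is roughly
# (k - 1) / (2 N ln 2) bits, i.e. the leading-order bias term quoted in the text.
```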
Victor (164) has given an elegant method for calculating the bias of the
sample entropy; remarkably, the leading-order term depends only on the alpha-
bet size k and the number of samples N, and is (k − 1)/(2N). Higher-order terms,
however, depend on the true distribution. Recently, Kraskov et al. (165) have
published an adaptive algorithm for estimating mutual information, which has
very good properties in terms of both bias and variance. Finally, the estimation
of entropy rates is a somewhat tricky matter. The best practices are to either use
an algorithm of the type given by (166), or to fit a properly dynamical model.
(For discrete data, variable-length Markov chains, discussed in §3.6.2 above,
generally work very well, and the entropy rate can be calculated from them very
simply.) Another popular approach is to run one's time series through a standard
compression algorithm, such as gzip, dividing the size in bits of the output by
the number of symbols in the input.
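Two of the points above can be made concrete in a short sketch (illustrative only; the helper names are hypothetical, and a plain first-order Markov chain stands in for the variable-length chains recommended in the text): adding the leading-order (k − 1)/(2N) correction back onto a plug-in entropy estimate, and reading the entropy rate off a fitted chain as h = −Σ_i π_i Σ_j T_ij log2 T_ij, with π the stationary distribution of the estimated transition matrix T.

```python
# Illustrative sketch only; function names and parameters are assumptions.
import numpy as np

def corrected_entropy_bits(samples, k):
    """Plug-in Shannon entropy plus the leading-order (k-1)/(2N) bias correction, in bits."""
    samples = np.asarray(samples)
    N = samples.size
    freqs = np.bincount(samples, minlength=k) / N
    freqs = freqs[freqs > 0]
    plugin = -np.sum(freqs * np.log2(freqs))            # biased low, on average
    return plugin + (k - 1) / (2.0 * N * np.log(2))     # (k-1)/(2N) nats, converted to bits

def markov_entropy_rate(symbols, k):
    """Entropy rate (bits/symbol) of a first-order Markov chain fitted to the sequence."""
    counts = np.zeros((k, k))
    for a, b in zip(symbols[:-1], symbols[1:]):
        counts[a, b] += 1.0
    T = counts / counts.sum(axis=1, keepdims=True)      # estimated transition matrix
    evals, evecs = np.linalg.eig(T.T)                   # stationary distribution: left
    pi = np.real(evecs[:, np.argmax(np.real(evals))])   # eigenvector for eigenvalue 1
    pi = pi / pi.sum()
    logT = np.where(T > 0, np.log2(np.where(T > 0, T, 1.0)), 0.0)
    return float(-np.sum(pi[:, None] * T * logT))

rng = np.random.default_rng(3)
print(corrected_entropy_bits(rng.integers(0, 8, size=50), k=8))   # compare with log2(8) = 3
print(markov_entropy_rate(rng.integers(0, 2, size=10_000), k=2))  # i.i.d. fair coin: about 1
```

For a genuinely correlated sequence, the fitted-chain rate falls below the single-symbol entropy, which is the point of using a dynamical model rather than treating the data as independent draws.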