H[X|Y] = H[X,Y] - H[Y]. [43]
H[X|Y] is the average uncertainty remaining in X, given a knowledge of Y.
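As a quick illustration of Eq. [43] (not from the source; the small probability table and the helper name entropy are invented for the example), the following Python sketch computes H[X,Y] and H[Y] from a discrete joint distribution and takes their difference to obtain H[X|Y].

```python
import numpy as np

def entropy(p):
    """Shannon entropy, in bits, of a probability array; zero entries are skipped."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical joint distribution Pr(X, Y): rows index values of X, columns values of Y.
joint_pxy = np.array([[0.30, 0.10],
                      [0.05, 0.55]])

H_XY = entropy(joint_pxy)             # joint entropy H[X,Y]
H_Y = entropy(joint_pxy.sum(axis=0))  # marginal entropy H[Y]
H_X_given_Y = H_XY - H_Y              # Eq. [43]: H[X|Y] = H[X,Y] - H[Y]
print(H_X_given_Y)
```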
The mutual information I[X;Y] between X and Y is

I[X;Y] = H[X] - H[X|Y]. [44]
It gives the reduction in X's uncertainty due to knowledge of Y and is symmetric in X and Y. We can also define higher-order mutual informations, such as the third-order information I[X;Y;Z],

I[X;Y;Z] = H[X] + H[Y] + H[Z] - H[X,Y,Z], [45]

and so on for higher orders. These functions reflect the joint dependence among the variables.
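The same bookkeeping extends to Eqs. [44] and [45]. The sketch below is a minimal illustration, assuming a made-up three-variable joint probability table; it uses the equivalent form I[X;Y] = H[X] + H[Y] - H[X,Y], which follows from Eqs. [43] and [44].

```python
import numpy as np

def entropy(p):
    """Shannon entropy, in bits; zero-probability cells contribute nothing."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical joint distribution Pr(X, Y, Z) over three binary variables (sums to 1).
pxyz = np.array([[[0.10, 0.05],
                  [0.05, 0.15]],
                 [[0.05, 0.10],
                  [0.20, 0.30]]])

# Marginals obtained by summing out the other variables.
px = pxyz.sum(axis=(1, 2))
py = pxyz.sum(axis=(0, 2))
pz = pxyz.sum(axis=(0, 1))
pxy = pxyz.sum(axis=2)

# Eq. [44] (rewritten): I[X;Y] = H[X] + H[Y] - H[X,Y].
I_XY = entropy(px) + entropy(py) - entropy(pxy)

# Eq. [45]: I[X;Y;Z] = H[X] + H[Y] + H[Z] - H[X,Y,Z].
I_XYZ = entropy(px) + entropy(py) + entropy(pz) - entropy(pxyz)

print(I_XY, I_XYZ)
```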
Mutual information is a special case of the relative entropy, also called the Kullback-Leibler divergence (or distance). Given two distributions (not variables), P and Q, the entropy of Q relative to P is
D(P||Q) = Σ_x P(x) log [P(x)/Q(x)]. [46]
D measures how far apart the two distributions are, since D(P||Q) ≥ 0, and D(P||Q) = 0 implies the two distributions are equal almost everywhere. The divergence can be interpreted either in terms of codes (see below), or in terms of statistical tests (159). Roughly speaking, given n samples drawn from the distribution P, the probability of our accepting the false hypothesis that the distribution is Q can go down no faster than 2^(-nD(P||Q)). The mutual information I[X;Y] is the divergence between the joint distribution Pr(X,Y) and the product of the marginal distributions, Pr(X)Pr(Y), and so measures the departure from independence.
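To connect Eq. [46] with mutual information, one possible sketch (the probability tables and function names are hypothetical) computes D(P||Q) directly and then evaluates I[X;Y] as the divergence between a joint distribution and the product of its marginals.

```python
import numpy as np

def kl_divergence(p, q):
    """D(P||Q) in bits, Eq. [46]; assumes Q(x) > 0 wherever P(x) > 0."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# Hypothetical joint distribution Pr(X, Y).
pxy = np.array([[0.25, 0.15],
                [0.10, 0.50]])
px = pxy.sum(axis=1)  # marginal Pr(X)
py = pxy.sum(axis=0)  # marginal Pr(Y)

# Mutual information as the divergence from independence:
# I[X;Y] = D( Pr(X,Y) || Pr(X)Pr(Y) ).
I_XY = kl_divergence(pxy, np.outer(px, py))
print(I_XY)
```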
Some extra information-theoretic quantities make sense for time series and stochastic processes. Supposing we have a process X = ..., X_{-2}, X_{-1}, X_0, X_1, X_2, ..., we can define its mutual information function by analogy with the autocovariance function (see §3.2),
I_X(s,t) = I[X_s; X_t], [47]
I_X(τ) = I[X_t; X_{t+τ}], [48]
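One rough way to put Eq. [48] to work on data, assuming a stationary, discrete-valued series, is a plug-in estimate: tabulate the empirical joint distribution of (X_t, X_{t+τ}) and apply the entropy identity for mutual information. The sketch below does this for a toy binary series; the series, lag values, and function names are all invented for illustration.

```python
import numpy as np

def entropy(p):
    """Shannon entropy, in bits, of a probability array."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information_function(series, lag):
    """Plug-in estimate of I[X_t; X_{t+lag}] from the empirical pair distribution."""
    series = np.asarray(series)
    a, b = series[:-lag], series[lag:]          # values lag steps apart
    symbols = np.unique(series)
    joint = np.zeros((symbols.size, symbols.size))
    for x, y in zip(a, b):                      # count co-occurrences of (X_t, X_{t+lag})
        joint[np.searchsorted(symbols, x), np.searchsorted(symbols, y)] += 1
    joint /= joint.sum()                        # normalize counts to probabilities
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    return entropy(px) + entropy(py) - entropy(joint)

# Toy binary series that flips state with small probability, giving short-range dependence.
rng = np.random.default_rng(0)
x = np.cumsum(rng.random(5000) < 0.1) % 2
print([round(mutual_information_function(x, k), 3) for k in (1, 2, 5, 10)])
```

For short series or large alphabets this plug-in estimate is biased upward, so the resulting curve should be read qualitatively rather than as an exact mutual information function.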