is the entropy of the posterior distribution. Since the entropy does not depend on $\theta$, maximizing $F[p(x|y), \theta]$ with respect to $\theta$ is equivalent to maximizing $\Theta(\theta)$ with respect to $\theta$. Namely, the maximization of the free energy with respect to the hyperparameters results in the M step of the EM algorithm.
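To make this explicit, here is a sketch of the decomposition, using the definition of the free energy that appears in Eq. (B.63) below (the underbraces, and the identification of $\Theta(\theta)$ with the posterior average of the complete-data log likelihood, follow the surrounding definitions):

$$
F[p(x|y), \theta] = \underbrace{\int p(x|y) \log p(x, y|\theta)\, dx}_{\Theta(\theta)} \;\; \underbrace{- \int p(x|y) \log p(x|y)\, dx}_{\text{entropy of the posterior}} .
$$

Only the first term varies with $\theta$, which is why the two maximizations coincide.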
Note also that the free energy after the maximization with respect to $q(x)$ is equal to the marginal likelihood, $\log p(y|\theta)$. This can be seen by rewriting Eq. (B.60) such that
$$
F[p(x|y), \theta] = \int dx\, p(x|y) \left[ \log p(x, y|\theta) - \log p(x|y) \right]
= \int dx\, p(x|y) \log p(y|\theta) = \log p(y|\theta), \quad (B.63)
$$

where the second equality uses $p(x, y|\theta) = p(x|y)\, p(y|\theta)$.
This relationship is used to derive the expressions of the marginal likelihood for the
Bayesian factor analysis in Chap. 5.
For an arbitrary probability distribution $q(x)$, the relationship between the free energy and the marginal likelihood is expressed as

$$
F[q(x), \theta] = \log p(y|\theta) - KL\left[ q(x) \,\|\, p(x|y) \right], \quad (B.64)
$$
where $KL[q(x) \,\|\, p(x|y)]$ is the Kullback-Leibler (KL) distance defined as

$$
KL\left[ q(x) \,\|\, p(x|y) \right] = \int q(x) \log \frac{q(x)}{p(x|y)}\, dx. \quad (B.65)
$$
The KL distance represents a distance between the true posterior distribution $p(x|y)$ and the arbitrary probability distribution $q(x)$. It always has a nonnegative value, and is equal to zero when the two distributions are identical. Hence, for an arbitrary $q(x)$, the inequality $F[q(x), \theta] \le \log p(y|\theta)$ holds, and the free energy forms a lower bound of the marginal likelihood.
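The sketch below checks Eqs. (B.63)-(B.65) numerically on a minimal two-state toy model; the model, the probability values, and names such as free_energy and kl are our own illustration, not from the text:

import numpy as np

# Hypothetical two-state toy model: binary latent x, one fixed observation y.
# joint[i] = p(x = i, y | theta) for some assumed hyperparameter value theta.
joint = np.array([0.3, 0.1])
p_y = joint.sum()                     # marginal likelihood p(y | theta)
posterior = joint / p_y               # true posterior p(x | y)

def free_energy(q):
    # F[q(x), theta] = sum_x q(x) [log p(x, y | theta) - log q(x)]
    return np.sum(q * (np.log(joint) - np.log(q)))

def kl(q, p):
    # Discrete form of Eq. (B.65): KL[q || p] = sum_x q(x) log(q(x) / p(x))
    return np.sum(q * np.log(q / p))

q = np.array([0.5, 0.5])              # an arbitrary distribution q(x)

# Eq. (B.64): F[q, theta] = log p(y | theta) - KL[q || p(x | y)]
assert np.isclose(free_energy(q), np.log(p_y) - kl(q, posterior))
# Lower bound, with equality at q = p(x | y) as in Eq. (B.63)
assert free_energy(q) <= np.log(p_y)
assert np.isclose(free_energy(posterior), np.log(p_y))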
B.6.2 Variational Bayesian EM Algorithm
In Bayesian inference, to estimate the unknown parameter $x$, we must first derive the posterior distribution $p(x|y)$, assuming the existence of an appropriate prior distribution $p(x|\theta)$. We then obtain an optimum estimate of the unknown $x$ based on the posterior distribution $p(x|y)$. When the hyperparameter $\theta$ is unknown, a truly Bayesian approach is to first derive the joint posterior distribution $p(x, \theta|y)$, and to estimate $x$ and $\theta$ simultaneously based on this joint posterior distribution. To derive the joint posterior $p(x, \theta|y)$, we can use Bayes' rule and obtain
$$
p(x, \theta|y) \propto p(y|x, \theta)\, p(x, \theta) = p(y|x, \theta)\, p(x|\theta)\, p(\theta). \quad (B.66)
$$
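As an illustration of Eq. (B.66), the following sketch evaluates the right-hand side on a small discrete grid and normalizes it to obtain the joint posterior; the grid sizes and probability tables are invented for the example:

import numpy as np

# Hypothetical discrete toy grid: x takes 3 values, theta takes 2 values,
# and y is a single fixed observation.
lik = np.array([[0.6, 0.2],           # p(y | x, theta), rows index x
                [0.3, 0.5],
                [0.1, 0.3]])
prior_x = np.array([[0.5, 0.2],       # p(x | theta), each column sums to 1
                    [0.3, 0.5],
                    [0.2, 0.3]])
prior_t = np.array([0.7, 0.3])        # p(theta)

# Right-hand side of Eq. (B.66): p(y | x, theta) p(x | theta) p(theta)
unnorm = lik * prior_x * prior_t
p_y = unnorm.sum()                    # normalizer: the marginal likelihood p(y)
joint_posterior = unnorm / p_y        # p(x, theta | y), sums to 1 over (x, theta)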