is the entropy of the posterior distribution. Since the entropy does not depend on $\theta$, maximizing $F[p(x|y), \theta]$ with respect to $\theta$ is equivalent to maximizing $\Phi(\theta)$ with respect to $\theta$. Namely, the maximization of the free energy with respect to the hyperparameters results in the M step of the EM algorithm.
Note also that the free energy after the maximization with respect to $q(x)$ is equal to the marginal likelihood, $\log p(y|\theta)$. This can be seen by rewriting Eq. (B.60) such that
$$
F[p(x|y), \theta] = \int p(x|y)\left[\log p(x, y|\theta) - \log p(x|y)\right] dx
= \int p(x|y) \log p(y|\theta)\, dx = \log p(y|\theta). \tag{B.63}
$$
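Eq. (B.63) is easy to verify numerically. The sketch below uses a hypothetical two-state model (latent $x \in \{0,1\}$, Bernoulli($\theta$) prior, fixed likelihood $p(y=1|x)$; all names and values are illustrative assumptions): with $q(x)$ set to the exact posterior, the free energy equals $\log p(y|\theta)$.

```python
import numpy as np

# Hypothetical toy model (assumption, not from the text):
# latent x in {0, 1}, prior p(x|theta) = Bernoulli(theta),
# fixed likelihood p(y=1|x) for the single observation y = 1.
p_y_given_x = np.array([0.2, 0.8])
theta = 0.3

joint = np.array([1.0 - theta, theta]) * p_y_given_x  # p(x, y=1 | theta)
evidence = joint.sum()                                # p(y=1 | theta)
post = joint / evidence                               # p(x | y=1, theta)

# Eq. (B.63): F[p(x|y), theta] = sum_x p(x|y) [log p(x,y|theta) - log p(x|y)]
free_energy = np.sum(post * (np.log(joint) - np.log(post)))
assert np.isclose(free_energy, np.log(evidence))
```

The key step is that $\log p(x,y|\theta) - \log p(x|y) = \log p(y|\theta)$ is constant in $x$, so the expectation under the posterior collapses to $\log p(y|\theta)$.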
This relationship is used to derive the expressions of the marginal likelihood for the
For an arbitrary probability distribution $q(x)$, the relationship between the free energy and the marginal likelihood is expressed as
$$
F[q(x), \theta] = \log p(y|\theta) - KL\left[q(x)\,\|\,p(x|y)\right], \tag{B.64}
$$
where $KL[q(x)\,\|\,p(x|y)]$ is the Kullback-Leibler (KL) distance defined as
$$
KL\left[q(x)\,\|\,p(x|y)\right] = \int q(x) \log \frac{q(x)}{p(x|y)}\, dx. \tag{B.65}
$$
The KL distance represents a distance between the true posterior distribution $p(x|y)$ and the arbitrary probability distribution $q(x)$. It always has a nonnegative value, and is equal to zero when the two distributions are identical. Hence, for an arbitrary $q(x)$, the inequality $F[q(x), \theta] \le \log p(y|\theta)$ holds, and the free energy forms a lower bound of the marginal likelihood.
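The decomposition in Eq. (B.64) and the resulting lower bound can be checked directly. The sketch below again uses a hypothetical two-state model (Bernoulli($\theta$) prior over $x \in \{0,1\}$, fixed likelihood; all values are assumptions for illustration), sweeping several choices of $q(x)$:

```python
import numpy as np

# Hypothetical two-state model (assumption, not from the text).
p_y_given_x = np.array([0.2, 0.8])                    # p(y=1 | x)
theta = 0.3
joint = np.array([1.0 - theta, theta]) * p_y_given_x  # p(x, y | theta)
evidence = joint.sum()                                # p(y | theta)
post = joint / evidence                               # true posterior p(x | y)

def free_energy(q):
    return np.sum(q * (np.log(joint) - np.log(q)))

def kl(q, p):
    return np.sum(q * np.log(q / p))

for q1 in [0.1, 0.5, 0.9, post[1]]:
    q = np.array([1.0 - q1, q1])
    # Eq. (B.64): F[q, theta] = log p(y|theta) - KL[q || p(x|y)]
    assert np.isclose(free_energy(q), np.log(evidence) - kl(q, post))
    # KL >= 0, so F is a lower bound of the marginal likelihood.
    assert free_energy(q) <= np.log(evidence) + 1e-12

# Equality holds when q equals the true posterior.
assert np.isclose(free_energy(post), np.log(evidence))
```

The gap between the bound and $\log p(y|\theta)$ is exactly the KL distance, which is why maximizing the free energy over $q$ drives $q$ toward the true posterior.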
B.6.2
Variational Bayesian EM Algorithm
In Bayesian inference, to estimate the unknown parameter $x$, we must first derive the posterior distribution $p(x|y)$, assuming the existence of an appropriate prior distribution $p(x)$. We then obtain an optimum estimate of the unknown $x$ based on the posterior distribution $p(x|y)$. When the hyperparameter $\theta$ is unknown, a truly Bayesian approach is to first derive the joint posterior distribution $p(x, \theta|y)$, and to estimate $x$ and $\theta$ simultaneously based on this joint posterior distribution. To derive the joint posterior $p(x, \theta|y)$, we can use Bayes' rule, and obtain
$$
p(x, \theta|y) \propto p(y|x, \theta)\, p(x, \theta) = p(y|x, \theta)\, p(x|\theta)\, p(\theta). \tag{B.66}
$$
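When both $x$ and $\theta$ take values on a discrete grid, Eq. (B.66) can be evaluated termwise and normalized to give the joint posterior. The sketch below uses a hypothetical discrete model (two-state $x$, a grid of $\theta$ values with a uniform prior $p(\theta)$, and a likelihood that happens not to depend on $\theta$; all of these are illustrative assumptions, not the text's model):

```python
import numpy as np

# Hypothetical discrete model (assumptions for illustration):
# x in {0, 1}; theta discretized on a grid with a uniform prior p(theta);
# the likelihood here depends only on x, so p(y|x, theta) = p(y|x).
p_y_given_x = np.array([0.2, 0.8])                  # p(y=1 | x)
thetas = np.linspace(0.01, 0.99, 99)
p_theta = np.full(thetas.size, 1.0 / thetas.size)   # uniform p(theta)

p_x_given_theta = np.stack([1.0 - thetas, thetas])  # p(x|theta), shape (2, 99)

# Eq. (B.66): p(x, theta | y) proportional to p(y|x, theta) p(x|theta) p(theta)
unnorm = p_y_given_x[:, None] * p_x_given_theta * p_theta[None, :]
joint_post = unnorm / unnorm.sum()                  # normalize over (x, theta)
assert np.isclose(joint_post.sum(), 1.0)
```

Marginalizing `joint_post` over $x$ (summing axis 0) yields the posterior over $\theta$, and marginalizing over $\theta$ (summing axis 1) yields the posterior over $x$, which is the sense in which both unknowns are estimated simultaneously from one joint distribution.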