Biomedical Engineering Reference
In-Depth Information
maximizing the marginal likelihood, we describe an algorithm that maximizes a
quantity called the average data likelihood to obtain estimates for the hyperparame-
ters. This algorithm is called the expectation maximization (EM) algorithm, and is
described in the following.
B.5.2 Average Data Likelihood
The EM algorithm computes the quantity called the average data likelihood. Computing the average data likelihood is much easier than computing the marginal likelihood. To define the average data likelihood, let us first define the complete data likelihood, such that
log p(y, x | Φ, Λ) = log p(y | x, Λ) + log p(x | Φ).   (B.33)
If we observed not only y but also x, we could have estimated Φ and Λ by maximizing log p(y, x | Φ, Λ) with respect to these hyperparameters. However, since we do not observe x, we must substitute some “reasonable” value for the unknown x in log p(y, x | Φ, Λ).
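As a concrete illustration, consider a toy scalar version of the model (an assumption for illustration only; the book's model in Sect. B.3 is the multivariate analogue, and the function name `complete_data_ml` is hypothetical): if x were observed alongside y, maximizing the complete data likelihood of Eq. (B.33) over the two precisions has a simple closed form. A minimal Python sketch:

```python
# Toy scalar version of the Gaussian model (an illustrative assumption;
# the model of Sect. B.3 is the multivariate analogue):
#   prior:      p(x_i | phi)      = N(x_i; 0, 1/phi)   (phi: prior precision)
#   likelihood: p(y_i | x_i, lam) = N(y_i; x_i, 1/lam) (lam: noise precision)
# If x were observed, log p(y, x | phi, lam) would be maximized in closed
# form: each precision is the reciprocal of a sample mean square.

def complete_data_ml(y, x):
    """Estimates of (phi, lam) maximizing the complete data likelihood."""
    n = len(y)
    phi = n / sum(xi ** 2 for xi in x)                     # prior precision
    lam = n / sum((yi - xi) ** 2 for yi, xi in zip(y, x))  # noise precision
    return phi, lam

y = [1.2, -0.7, 0.3, 2.1]
x = [1.0, -0.5, 0.1, 1.8]
phi, lam = complete_data_ml(y, x)
print(phi, lam)
```

Because x is in fact unobserved, these closed-form estimates are unavailable, which motivates the averaging that follows.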
Having observed y, we actually know which values of x are reasonable, and our best knowledge of the unknown x is represented by the posterior distribution p(x | y). Thus, the “reasonable” value would be the one that maximizes the posterior probability, and one solution would be to use the MAP estimate of x in log p(y, x | Φ, Λ). A better solution would be to use all possible values of x in the complete data likelihood and average over them with the posterior probability. This results in the average data likelihood, Θ(Φ, Λ):

Θ(Φ, Λ) = ∫ p(x | y) log p(y, x | Φ, Λ) dx
        = E[log p(y, x | Φ, Λ)]
        = E[log p(y | x, Λ)] + E[log p(x | Φ)],   (B.34)
where the expectation E[·] is taken with respect to the posterior probability p(x | y).
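For a toy scalar Gaussian model (an illustrative assumption: prior N(0, 1/Φ), likelihood N(x, 1/Λ), so the posterior is Gaussian with precision Φ + Λ; the name `theta` is hypothetical), the expectations in Eq. (B.34) can be evaluated in closed form and checked against a Monte Carlo average over posterior draws. A sketch:

```python
import math
import random

# Average data likelihood Theta(phi, lam) of Eq. (B.34) for a toy scalar
# model (an illustrative assumption): p(x | phi) = N(0, 1/phi) and
# p(y | x, lam) = N(x, 1/lam), so the posterior p(x | y) is N(m, 1/g)
# with g = phi + lam and m = lam * y / g.

def theta(y, phi, lam):
    g = phi + lam      # posterior precision
    m = lam * y / g    # posterior mean
    # E[log p(y | x, lam)]: expectation of a quadratic under the posterior
    e_lik = 0.5 * math.log(lam / (2 * math.pi)) - 0.5 * lam * ((y - m) ** 2 + 1 / g)
    # E[log p(x | phi)]
    e_pri = 0.5 * math.log(phi / (2 * math.pi)) - 0.5 * phi * (m ** 2 + 1 / g)
    return e_lik + e_pri

def theta_mc(y, phi, lam, n=200_000, seed=0):
    """Monte Carlo check: average log p(y, x | phi, lam) over posterior draws."""
    g = phi + lam
    m = lam * y / g
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(m, 1 / math.sqrt(g))
        total += (0.5 * math.log(lam / (2 * math.pi)) - 0.5 * lam * (y - x) ** 2
                  + 0.5 * math.log(phi / (2 * math.pi)) - 0.5 * phi * x ** 2)
    return total / n

print(theta(1.5, 2.0, 5.0), theta_mc(1.5, 2.0, 5.0))
```

The closed form simply replaces the quadratic terms (y − x)² and x² by their posterior expectations, each picking up the posterior variance 1/g.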
The estimates of the hyperparameters Φ and Λ are obtained using

Λ = argmax_Λ Θ(Φ, Λ),   (B.35)
Φ = argmax_Φ Θ(Φ, Λ).   (B.36)
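A minimal sketch of the resulting EM iteration for a toy scalar Gaussian model (an illustrative assumption, not the book's multivariate model; in this degenerate scalar case only the total variance 1/Φ + 1/Λ is identifiable from y, but EM's guaranteed monotone increase of the marginal likelihood still holds):

```python
import math
import random

# EM for the toy scalar model (an illustrative assumption):
#   x_i ~ N(0, 1/phi),  y_i = x_i + noise,  noise ~ N(0, 1/lam).
# E-step: posterior moments of each x_i under the current hyperparameters.
# M-step: the argmax updates of Eqs. (B.35)-(B.36), closed form here.

def marginal_loglik(y, phi, lam):
    s2 = 1 / phi + 1 / lam  # marginal variance of each y_i
    return sum(-0.5 * math.log(2 * math.pi * s2) - yi ** 2 / (2 * s2) for yi in y)

def em(y, phi=1.0, lam=1.0, iters=30):
    n = len(y)
    history = [marginal_loglik(y, phi, lam)]
    for _ in range(iters):
        # E-step: posterior precision, variance, and means of the x_i
        g = phi + lam
        var = 1 / g
        means = [lam * yi / g for yi in y]
        # M-step: maximize Theta(phi, lam) over both hyperparameters
        phi = n / sum(m ** 2 + var for m in means)
        lam = n / sum((yi - m) ** 2 + var for yi, m in zip(y, means))
        history.append(marginal_loglik(y, phi, lam))
    return phi, lam, history

rng = random.Random(1)
xs = [rng.gauss(0, 1.0) for _ in range(500)]   # true phi = 1
y = [xv + rng.gauss(0, 0.5) for xv in xs]      # true lam = 4
phi, lam, history = em(y)
print(phi, lam)
```

Each pass through the loop replaces the unobserved quadratic terms by their posterior expectations (E-step) and then applies Eqs. (B.35) and (B.36) (M-step); the marginal likelihood never decreases from one iteration to the next.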
In the Gaussian model discussed in Sect. B.3, p(x | Φ) and p(y | x, Λ) are expressed in Eqs. (B.17) and (B.15), respectively. Substituting Eqs. (B.17) and (B.15) into (B.33), the complete data likelihood is expressed as²

² The constant terms containing 2π are ignored here.
Custom Search