Biomedical Engineering Reference
$p(s \mid y)$ with s-MAP, as well as the hyperparameter posterior
$p(\gamma \mid y)$, which is maximized via $\gamma$-MAP, are quite complex and
can only be expressed up to some unknown scaling factor (the integration
required for normalization is intractable). The joint posterior
$p(s, \gamma \mid y)$ over all unknowns is likewise intractable and complex. VB
attempts to simplify this situation by finding an approximate joint posterior that
factorizes as
$$p(s, \gamma \mid y) \approx \hat{p}(s, \gamma \mid y) = \hat{p}(s \mid y)\, \hat{p}(\gamma \mid y), \tag{6.29}$$
where $\hat{p}(s \mid y)$ and $\hat{p}(\gamma \mid y)$ are amenable to closed-form
computation of posterior quantities such as means and variances (unlike the full
posteriors upon which our model is built). This is possible because the enforced
factorization, often called the mean-field approximation, reflecting its origins in
statistical physics, simplifies things significantly. The cost function optimized to
find this approximate distribution is
$$\hat{p}(s \mid y),\ \hat{p}(\gamma \mid y) = \operatorname*{argmin}_{q(s),\, q(\gamma)} \mathrm{KL}\left[ q(s)\, q(\gamma) \,\|\, p(s, \gamma \mid y) \right], \tag{6.30}$$
where $q(s)$ and $q(\gamma)$ are arbitrary probability distributions and
$\mathrm{KL}[\,\cdot \,\|\, \cdot\,]$ indicates the Kullback-Leibler divergence measure.
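The minimization in Eq. (6.30) can be illustrated on a small discrete toy problem. The sketch below is not the source-imaging algorithm itself: it assumes that $s$ and $\gamma$ each take only a few discrete values, that the posterior table `post` is a made-up example, and that the function names are hypothetical. It alternates the standard mean-field updates, in which each factor is set proportional to the exponential of the expected log posterior under the other factor, and records the KL divergence after every sweep.

```python
import numpy as np

def kl_factorized(q_s, q_g, post):
    """KL[q(s) q(gamma) || p(s, gamma | y)] for discrete distributions."""
    q_joint = np.outer(q_s, q_g)
    return float(np.sum(q_joint * (np.log(q_joint) - np.log(post))))

def mean_field(post, n_iter=50):
    """Coordinate descent on the KL cost for a discrete posterior table.

    post[i, j] stands in for p(s=i, gamma=j | y); it must be strictly
    positive and sum to one. Returns q(s), q(gamma), and the KL history.
    """
    n_s, n_g = post.shape
    q_s = np.full(n_s, 1.0 / n_s)   # start from uniform factors
    q_g = np.full(n_g, 1.0 / n_g)
    log_post = np.log(post)
    history = [kl_factorized(q_s, q_g, post)]
    for _ in range(n_iter):
        # q(s) proportional to exp(E_{q(gamma)}[log p(s, gamma | y)])
        log_q_s = log_post @ q_g
        q_s = np.exp(log_q_s - log_q_s.max())
        q_s /= q_s.sum()
        # q(gamma) proportional to exp(E_{q(s)}[log p(s, gamma | y)])
        log_q_g = q_s @ log_post
        q_g = np.exp(log_q_g - log_q_g.max())
        q_g /= q_g.sum()
        history.append(kl_factorized(q_s, q_g, post))
    return q_s, q_g, history

# Hypothetical 2 x 3 posterior table (illustrative numbers only).
post = np.array([[0.30, 0.10, 0.05],
                 [0.05, 0.20, 0.30]])
q_s, q_g, history = mean_field(post)
print("q(s)     =", q_s)
print("q(gamma) =", q_g)
print("KL before and after:", history[0], history[-1])
```

Each update minimizes the KL divergence exactly with respect to one factor while the other is held fixed, so the recorded values never increase. The updates depend on the posterior only through its logarithm up to an additive constant, so an unnormalized joint $p(y, s, \gamma)$ could be substituted for `post` inside the loop without changing the result.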
Recall that $\gamma$-MAP iterations effectively compute an approximate distribution
for $s$ (E-step) and then a point estimate for $\gamma$ (M-step); s-MAP does the
exact opposite. In contrast, here an approximating distribution is required for both
the parameters $s$ and the hyperparameters $\gamma$. While it is often claimed that
conjugate hyperpriors must be employed for Eq. (6.30) to be solvable, in fact this
problem can be solved by coordinate descent over $q(s)$ and $q(\gamma)$ for virtually
any hyperprior. It can be shown that $\gamma$-MAP and mean-field approximations can
be equivalent in terms of the cost function being optimized and the source activity
estimates obtained, given certain choices of hyperpriors [10]. Nevertheless, VB
methods offer a whole class of algorithms within this framework, differing in the
covariance component sets used, the hyperpriors selected, and how the optimization
is performed. The main advantage of VB is that strict lower bounds on $\log p(y)$
automatically fall out of the VB framework, given by:
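$$\log p(y) \geq F = \log p(y) - \mathrm{KL}\left[ q(s)\, q(\gamma) \,\|\, p(s, \gamma \mid y) \right] = \int \hat{p}(s \mid y)\, \hat{p}(\gamma \mid y) \log \frac{p(y, s, \gamma)}{\hat{p}(s \mid y)\, \hat{p}(\gamma \mid y)} \, ds \, d\gamma, \tag{6.31}$$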
where $q(s)$ and $q(\gamma)$ are set to the minimizers $\hat{p}(s \mid y)$ and
$\hat{p}(\gamma \mid y)$ of Eq. (6.30), and the inequality follows from the
non-negativity of the Kullback-Leibler divergence. The quantity $F$ is sometimes
referred to as the variational free energy. Evaluation of $F$ requires the full
distribution $\hat{p}(\gamma \mid y)$ and therefore necessitates using conjugate
priors or further approximations.
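The decomposition behind the free-energy bound can be checked numerically on a toy problem. The following sketch assumes discrete $s$ and $\gamma$, so that sums replace the integrals, and uses a made-up table `joint` standing in for $p(y, s, \gamma)$ at the observed $y$. The helper names are hypothetical. It evaluates $F$ and the KL divergence for a few factorized distributions and confirms that they add up to $\log p(y)$.

```python
import numpy as np

# Hypothetical table of p(y, s, gamma) at the observed y (illustrative only).
joint = np.array([[0.06, 0.02, 0.01],
                  [0.01, 0.04, 0.06]])

log_evidence = float(np.log(joint.sum()))   # log p(y), by marginalization
post = joint / joint.sum()                  # p(s, gamma | y)

def free_energy(q_s, q_g):
    """F = sum over s, gamma of q * log(p(y, s, gamma) / q), q = q(s) q(gamma)."""
    q = np.outer(q_s, q_g)
    return float(np.sum(q * (np.log(joint) - np.log(q))))

def kl_to_posterior(q_s, q_g):
    """KL[q(s) q(gamma) || p(s, gamma | y)]."""
    q = np.outer(q_s, q_g)
    return float(np.sum(q * (np.log(q) - np.log(post))))

# A few arbitrary factorized distributions (strictly positive entries).
candidates = [
    (np.array([0.5, 0.5]), np.array([1 / 3, 1 / 3, 1 / 3])),
    (np.array([0.2, 0.8]), np.array([0.1, 0.3, 0.6])),
    (np.array([0.9, 0.1]), np.array([0.7, 0.2, 0.1])),
]

results = []
for q_s, q_g in candidates:
    F = free_energy(q_s, q_g)
    kl = kl_to_posterior(q_s, q_g)
    results.append((F, kl))
    print(f"F = {F:.4f}, KL = {kl:.4f}, F + KL = {F + kl:.4f}, "
          f"log p(y) = {log_evidence:.4f}")
```

In this toy setting $\log p(y)$ is available by direct summation, which is what makes the check possible. In the intractable case only $F$ can be evaluated, which is why it serves as a stand-in for the log evidence.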
6.3.4.2 Laplace Approximation
The Laplace approximation has been advocated for finding a tractable posterior
distribution on the hyperparameters and then using this $\hat{p}(\gamma \mid y)$ to
find approximations