6.3.4 Variational Bayesian Approximation
From the perspective of a Bayesian purist, however, the pursuit of MAP estimates for unknown quantities of interest, whether parameters s or hyperparameters γ, can be misleading since these estimates discount uncertainty and may not reflect regions of significant probability mass, unlike (for example) the posterior mean. Variational Bayesian methods, which have been applied successfully to a wide variety of hierarchical Bayesian models in the machine learning literature, offer an alternative to s-MAP and γ-MAP. Therefore, a third possibility involves finding formal approximations to p(s|y), as well as to the marginal p(y), using an intermediary approximation for p(γ|y). However, because of the intractable integrations involved in obtaining either distribution, practical implementation requires additional assumptions, leading to different types of approximation strategies. The principal idea here is that all unknown quantities should either be marginalized (integrated out) when possible or approximated with tractable distributions that reflect the underlying uncertainty and have computable posterior moments. Practically, we would like to account for ambiguity regarding γ when estimating p(s|y), and potentially we would also like a good approximation to p(y), or a bound on the model evidence log p(y), for application to model selection. The only meaningful difference between VB and γ-MAP, at least in the context of the proposed generative model, involves the approximations to the model evidence log p(y), with VB and γ-MAP giving different estimates.
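To make the marginalization principle concrete, the relevant quantities can be written as integrals over the hyperparameters. The display below simply restates these standard identities for the hierarchical model, together with the role played by an intermediary approximation p̂(γ|y); it introduces no chapter-specific assumptions:

\begin{align}
  p(s \mid y) &= \int p(s \mid y, \gamma)\, p(\gamma \mid y)\, d\gamma ,\\
  p(y) &= \int\!\!\int p(y \mid s, \gamma)\, p(s \mid \gamma)\, p(\gamma)\, ds\, d\gamma ,\\
  p(s \mid y) &\approx \int p(s \mid y, \gamma)\, \hat{p}(\gamma \mid y)\, d\gamma .
\end{align}

Because neither integral is available in closed form for the models of interest, the two strategies discussed below differ mainly in how the approximation to p(γ|y) is formed.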
In this section, we discuss two types of variational approximations germane to the source localization problem: the mean-field approximation (VB-MF) and a fixed-form Laplace approximation (VB-LA). It turns out that both are related to γ-MAP but with important distinctions. A mean-field approximation makes the simplifying assumption that the joint distribution over the unknowns s and γ factorizes, meaning p(s, γ|y) ≈ p̂(s|y) p̂(γ|y), where p̂(s|y) and p̂(γ|y) are chosen to minimize the Kullback-Leibler divergence between the factorized and full posterior. This is accomplished via an iterative process akin to EM, effectively using two E-steps (one for s and one for γ). It also produces a rigorous lower bound on log p(y), similar to γ-MAP.
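To spell out what the factorized scheme optimizes, the display below gives the generic mean-field free-energy bound and the corresponding coordinate updates; F denotes the variational free energy and H the differential entropy, and the expressions are a standard statement of mean-field VB under the factorization above rather than the chapter's specific update equations:

\begin{align}
  \log p(y) &\;\ge\; \mathcal{F}\bigl[\hat{p}(s \mid y), \hat{p}(\gamma \mid y)\bigr]
    = \mathbb{E}_{\hat{p}(s \mid y)\,\hat{p}(\gamma \mid y)}\bigl[\log p(y, s, \gamma)\bigr]
    + H\bigl[\hat{p}(s \mid y)\bigr] + H\bigl[\hat{p}(\gamma \mid y)\bigr],\\
  \hat{p}(s \mid y) &\;\propto\; \exp\Bigl(\mathbb{E}_{\hat{p}(\gamma \mid y)}\bigl[\log p(y, s, \gamma)\bigr]\Bigr),
  \qquad
  \hat{p}(\gamma \mid y) \;\propto\; \exp\Bigl(\mathbb{E}_{\hat{p}(s \mid y)}\bigl[\log p(y, s, \gamma)\bigr]\Bigr).
\end{align}

The gap between log p(y) and F equals the Kullback-Leibler divergence between the factorized and full posteriors, so maximizing the bound and minimizing the divergence are the same operation, and alternating the two updates is the EM-like pair of E-steps mentioned above.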
A second possibility applies a second-order Laplace approximation to the posterior on the hyperparameters γ (after marginalizing over the sources s), which is then iteratively matched to the true posterior; the result can then be used to approximate p(s|y) and log p(y).
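For reference, a generic second-order (Laplace) expansion of the log posterior over the hyperparameters about a mode γ̂ takes the following form; the symbols H and d (the Hessian and the number of hyperparameters) are introduced here only for illustration, and the exact quantities matched in VB-LA may differ in detail:

\begin{align}
  \log p(\gamma \mid y) &\approx \log p(\hat{\gamma} \mid y)
    - \tfrac{1}{2}\,(\gamma - \hat{\gamma})^{\top} H\, (\gamma - \hat{\gamma}),
  \qquad
  H = -\,\nabla^{2}_{\gamma} \log p(y, \gamma)\big|_{\gamma = \hat{\gamma}},\\
  \hat{p}(\gamma \mid y) &= \mathcal{N}\bigl(\hat{\gamma},\, H^{-1}\bigr),
  \qquad
  \log p(y) \approx \log p(y, \hat{\gamma}) + \tfrac{d}{2}\log 2\pi - \tfrac{1}{2}\log\lvert H\rvert .
\end{align}

The Gaussian p̂(γ|y) then plays the same intermediary role as the mean-field factor described above.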
Both of these VB methods lead to posterior approximations p̂(s|y) = p(s|y, γ = γ̂), where γ̂ is equivalently computed via γ-MAP. Consequently, VB has the same level of component pruning as γ-MAP.
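As a concrete illustration of how γ̂ can be obtained and then substituted into p(s|y, γ = γ̂), the following Python sketch runs EM-style hyperparameter updates for a deliberately simplified model y = Ls + ε with ε ∼ N(0, λI) and s ∼ N(0, diag(γ)). The lead field L, the fixed noise level lam, the diagonal hyperprior structure, and the update rule are illustrative assumptions, not the chapter's exact VB-MF or VB-LA algorithm.

import numpy as np

def em_gamma_map(L, Y, lam=1.0, n_iter=100, tol=1e-8):
    """EM-style updates for gamma in y = L s + noise, s ~ N(0, diag(gamma)).
    A simplified sketch; not the chapter's exact VB-MF/VB-LA procedure."""
    n_sensors, n_sources = L.shape
    gamma = np.ones(n_sources)
    for _ in range(n_iter):
        G = np.diag(gamma)
        Sigma_y = lam * np.eye(n_sensors) + L @ G @ L.T   # marginal covariance of y
        K = G @ L.T @ np.linalg.inv(Sigma_y)              # posterior gain matrix
        M = K @ Y                                         # posterior mean of s
        Sigma_s = G - K @ L @ G                           # posterior covariance of s
        # Re-estimate each gamma_i from the posterior second moment of s_i
        gamma_new = (M ** 2).mean(axis=1) + np.diag(Sigma_s)
        if np.max(np.abs(gamma_new - gamma)) < tol:
            gamma = gamma_new
            break
        gamma = gamma_new
    return gamma, M, Sigma_s

# Tiny usage example with synthetic data (hypothetical dimensions)
rng = np.random.default_rng(0)
L = rng.standard_normal((10, 30))             # toy "lead field"
s_true = np.zeros((30, 5)); s_true[3] = 2.0   # one active source over 5 samples
Y = L @ s_true + 0.1 * rng.standard_normal((10, 5))
gamma_hat, M, Sigma_s = em_gamma_map(L, Y, lam=0.01)

Because each γ_i is re-estimated from the posterior second moment of s_i, components whose posterior mass concentrates near zero keep shrinking, which is the pruning behavior referred to above; the resulting Gaussian p(s|y, γ = γ̂) has mean M and covariance Sigma_s.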
6.3.4.1 Mean-Field Approximation
The basic strategy here is to replace intractable posterior distributions with approxi-
mate ones that, while greatly simplified and amenable to simple inference procedures,
still retain important characteristics of the full model. In the context of our presumed
model structure, both the posterior source distribution p(s|y), which is maximized
 