expression. Hence, we will make use of an approximation technique known as
variational Bayesian inference [119, 19] that provides us with such a closed-form
expression.
Alternatively, sampling techniques, such as Markov Chain Monte Carlo
(MCMC) methods, could be utilised to get an accurate posterior and model
evidence. However, the model structure search is expensive and requires a quick
evaluation of the model evidence for any given model structure; the
computational burden of sampling techniques therefore makes approximating the
model evidence by variational methods the better choice.
For the remainder of this chapter, all distributions are treated as being im-
plicitly conditional on X and M, to keep the notation simple. Additionally, the
ranges of sums and products will not always be specified explicitly, as they are
usually obvious from their context.
7.3.1 Variational Bayesian Inference
The aim of Bayesian inference and model selection is, on one hand, to find
a variational distribution q(U) that approximates the true posterior p(U|Y)
and, on the other hand, to get the model evidence p(Y). Variational Bayesian
inference is based on the decomposition [19, 118]

    ln p(Y) = L(q) + KL(q‖p),    (7.20)

    L(q) = ∫ q(U) ln ( p(U, Y) / q(U) ) dU,    (7.21)

    KL(q‖p) = − ∫ q(U) ln ( p(U|Y) / q(U) ) dU,    (7.22)
which holds for any choice of q. As the Kullback-Leibler divergence KL(q‖p) is
always non-negative, and zero if and only if p(U|Y) = q(U) [232], the variational
bound L(q) is a lower bound on ln p(Y) and equal to the latter only if
q(U) is the true posterior p(U|Y). Hence, the posterior can be approximated
by maximising the lower bound L(q), which brings the variational distribution
closer to the true posterior and at the same time yields an approximation of the
model evidence by L(q) ≤ ln p(Y).
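The decomposition (7.20) can be checked numerically on a toy model with a single binary hidden variable, where the integrals reduce to sums. The joint probabilities and the variational distribution below are arbitrary illustrative numbers, not taken from the text:

```python
import math

# Joint p(U, Y) for a binary hidden variable U, with Y already observed.
# The numbers are an arbitrary toy example.
p_joint = {0: 0.3, 1: 0.1}                         # p(U = u, Y)
p_Y = sum(p_joint.values())                        # model evidence p(Y)
p_post = {u: p_joint[u] / p_Y for u in p_joint}    # true posterior p(U | Y)

# An arbitrary variational distribution q(U), not equal to the posterior
q = {0: 0.6, 1: 0.4}

# Variational bound L(q), Eq. (7.21), with the integral becoming a sum
L = sum(q[u] * math.log(p_joint[u] / q[u]) for u in q)

# Kullback-Leibler divergence KL(q || p), Eq. (7.22)
KL = sum(q[u] * math.log(q[u] / p_post[u]) for u in q)

print(L + KL, math.log(p_Y))   # the two sides of Eq. (7.20) agree
print(L <= math.log(p_Y))      # L(q) is indeed a lower bound on ln p(Y)
```

Setting q equal to p_post makes KL vanish and the bound tight, in line with the equality condition stated above.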
Factorial Distributions
To make this approach tractable, we need to choose a family of distributions
q(U) that gives an analytical solution. A frequently used approach (for example,
[20, 227]) that is sufficiently flexible to give a good approximation to the true
posterior is to use the set of distributions that factorises with respect to disjoint
groups U_i of variables

    q(U) = ∏_i q_i(U_i),    (7.23)

which allows maximising L(q) with respect to each group of hidden variables
separately while keeping the others fixed. This results in

    ln q_i(U_i) = E_{i≠j}(ln p(U, Y)) + const.,    (7.24)
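The update (7.24) can be sketched as coordinate ascent on a discrete toy model with two binary hidden variables, where the expectation over all other groups is a weighted sum and the additive constant is absorbed by normalisation. The joint table below is arbitrary and only for illustration:

```python
import math

# Toy joint p(U1, U2, Y) with Y observed: an arbitrary 2x2 table of
# positive values, chosen only for illustration.
p = {(0, 0): 0.20, (0, 1): 0.05, (1, 0): 0.10, (1, 1): 0.30}

def bound(q1, q2):
    """Variational bound L(q) for the factorised q(U) = q1(U1) q2(U2)."""
    return sum(q1[a] * q2[b] * math.log(p[a, b] / (q1[a] * q2[b]))
               for a, b in p)

# Start from uniform factors
q1 = {0: 0.5, 1: 0.5}
q2 = {0: 0.5, 1: 0.5}

for _ in range(20):
    # Eq. (7.24): ln q1(U1) = E_{q2}(ln p(U, Y)) + const., then normalise
    log_q1 = {a: sum(q2[b] * math.log(p[a, b]) for b in (0, 1))
              for a in (0, 1)}
    z = sum(math.exp(v) for v in log_q1.values())
    q1 = {a: math.exp(v) / z for a, v in log_q1.items()}
    # The same update for q2, holding q1 fixed
    log_q2 = {b: sum(q1[a] * math.log(p[a, b]) for a in (0, 1))
              for b in (0, 1)}
    z = sum(math.exp(v) for v in log_q2.values())
    q2 = {b: math.exp(v) / z for b, v in log_q2.items()}

print(bound(q1, q2) <= math.log(sum(p.values())))  # still a lower bound
```

Each update can only increase L(q), so alternating them converges to a local maximum of the bound; because the factorised family cannot represent every posterior, the converged bound generally stays strictly below ln p(Y).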