is the entropy of the posterior distribution. Since the entropy does not depend on $\theta$, maximizing $F[p(x|y), \theta]$ with respect to $\theta$ is equivalent to maximizing $\Phi(\theta)$ with respect to $\theta$. Namely, the maximization of the free energy with respect to the hyperparameters results in the M step of the EM algorithm.
Note also that the free energy after the maximization with respect to $q(x)$ is equal to the marginal likelihood, $\log p(y|\theta)$. This can be seen by rewriting Eq. (B.60) such that
$$
F[p(x|y), \theta] = \int p(x|y)\left[\log p(x, y|\theta) - \log p(x|y)\right] dx
= \int p(x|y) \log p(y|\theta)\, dx = \log p(y|\theta). \tag{B.63}
$$
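Eq. (B.63) is easy to verify numerically. The sketch below uses a hypothetical two-state model (latent $x \in \{0,1\}$, Bernoulli($\theta$) prior, fixed likelihood $p(y=1|x)$; all names and values are illustrative assumptions): with $q(x)$ set to the exact posterior, the free energy equals $\log p(y|\theta)$.

```python
import numpy as np

# Hypothetical toy model (assumption, not from the text):
# latent x in {0, 1}, prior p(x|theta) = Bernoulli(theta),
# fixed likelihood p(y=1|x) for the single observation y = 1.
p_y_given_x = np.array([0.2, 0.8])
theta = 0.3

joint = np.array([1.0 - theta, theta]) * p_y_given_x  # p(x, y=1 | theta)
evidence = joint.sum()                                # p(y=1 | theta)
post = joint / evidence                               # p(x | y=1, theta)

# Eq. (B.63): F[p(x|y), theta] = sum_x p(x|y) [log p(x,y|theta) - log p(x|y)]
free_energy = np.sum(post * (np.log(joint) - np.log(post)))
assert np.isclose(free_energy, np.log(evidence))
```

The key step is that $\log p(x,y|\theta) - \log p(x|y) = \log p(y|\theta)$ is constant in $x$, so the expectation under the posterior collapses to $\log p(y|\theta)$.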
This relationship is used to derive the expressions of the marginal likelihood for the
For an arbitrary probability distribution $q(x)$, the relationship between the free energy and the marginal likelihood is expressed as
$$
F[q(x), \theta] = \log p(y|\theta) - KL\left[q(x)\,\|\,p(x|y)\right], \tag{B.64}
$$
where $KL[q(x)\,\|\,p(x|y)]$ is the Kullback-Leibler (KL) distance defined as
$$
KL\left[q(x)\,\|\,p(x|y)\right] = \int q(x) \log \frac{q(x)}{p(x|y)}\, dx. \tag{B.65}
$$
The KL distance represents a distance between the true posterior distribution $p(x|y)$ and the arbitrary probability distribution $q(x)$. It always has a nonnegative value, and is equal to zero when the two distributions are identical. Hence, for an arbitrary $q(x)$, the inequality $F[q(x), \theta] \le \log p(y|\theta)$ holds, and the free energy forms a lower bound of the marginal likelihood.
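The decomposition in Eq. (B.64) and the resulting lower bound can be checked directly. The sketch below again uses a hypothetical two-state model (Bernoulli($\theta$) prior over $x \in \{0,1\}$, fixed likelihood; all values are assumptions for illustration), sweeping several choices of $q(x)$:

```python
import numpy as np

# Hypothetical two-state model (assumption, not from the text).
p_y_given_x = np.array([0.2, 0.8])                    # p(y=1 | x)
theta = 0.3
joint = np.array([1.0 - theta, theta]) * p_y_given_x  # p(x, y | theta)
evidence = joint.sum()                                # p(y | theta)
post = joint / evidence                               # true posterior p(x | y)

def free_energy(q):
    return np.sum(q * (np.log(joint) - np.log(q)))

def kl(q, p):
    return np.sum(q * np.log(q / p))

for q1 in [0.1, 0.5, 0.9, post[1]]:
    q = np.array([1.0 - q1, q1])
    # Eq. (B.64): F[q, theta] = log p(y|theta) - KL[q || p(x|y)]
    assert np.isclose(free_energy(q), np.log(evidence) - kl(q, post))
    # KL >= 0, so F is a lower bound of the marginal likelihood.
    assert free_energy(q) <= np.log(evidence) + 1e-12

# Equality holds when q equals the true posterior.
assert np.isclose(free_energy(post), np.log(evidence))
```

The gap between the bound and $\log p(y|\theta)$ is exactly the KL distance, which is why maximizing the free energy over $q$ drives $q$ toward the true posterior.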
B.6.2
Variational Bayesian EM Algorithm
In Bayesian inference, to estimate the unknown parameter $x$, we must first derive the posterior distribution $p(x|y)$, assuming the existence of an appropriate prior distribution $p(x)$. We then obtain an optimum estimate of the unknown $x$ based on the posterior distribution $p(x|y)$. When the hyperparameter $\theta$ is unknown, a truly Bayesian approach is to first derive the joint posterior distribution $p(x, \theta|y)$, and to estimate $x$ and $\theta$ simultaneously based on this joint posterior distribution. To derive the joint posterior $p(x, \theta|y)$, we can use Bayes' rule, and obtain
$$
p(x, \theta|y) \propto p(y|x, \theta)\, p(x, \theta) = p(y|x, \theta)\, p(x|\theta)\, p(\theta). \tag{B.66}
$$
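When both $x$ and $\theta$ take values on a discrete grid, Eq. (B.66) can be evaluated termwise and normalized to give the joint posterior. The sketch below uses a hypothetical discrete model (two-state $x$, a grid of $\theta$ values with a uniform prior $p(\theta)$, and a likelihood that happens not to depend on $\theta$; all of these are illustrative assumptions, not the text's model):

```python
import numpy as np

# Hypothetical discrete model (assumptions for illustration):
# x in {0, 1}; theta discretized on a grid with a uniform prior p(theta);
# the likelihood here depends only on x, so p(y|x, theta) = p(y|x).
p_y_given_x = np.array([0.2, 0.8])                  # p(y=1 | x)
thetas = np.linspace(0.01, 0.99, 99)
p_theta = np.full(thetas.size, 1.0 / thetas.size)   # uniform p(theta)

p_x_given_theta = np.stack([1.0 - thetas, thetas])  # p(x|theta), shape (2, 99)

# Eq. (B.66): p(x, theta | y) proportional to p(y|x, theta) p(x|theta) p(theta)
unnorm = p_y_given_x[:, None] * p_x_given_theta * p_theta[None, :]
joint_post = unnorm / unnorm.sum()                  # normalize over (x, theta)
assert np.isclose(joint_post.sum(), 1.0)
```

Marginalizing `joint_post` over $x$ (summing axis 0) yields the posterior over $\theta$, and marginalizing over $\theta$ (summing axis 1) yields the posterior over $x$, which is the sense in which both unknowns are estimated simultaneously from one joint distribution.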